This might be a good time to talk about different ways “it all adds up to normality” is interpreted.
I sometimes hear people use it in a stronger sense, to mean not just that the new theory must make the same successful predictions but also that the policy implications are mostly the same. E.g. “Many worlds has to add up to normality, so one way or another it still makes sense for us to worry about death, try to prevent suffering, etc.” Correct me if I’m wrong, but this sort of thing isn’t entailed by your proof, right?
There’s also the issue of what counts as the data that the new theory needs to correctly predict. Some people think that “This is a table, damn it! Not a simulated table!” is part of their data that theories need to account for. What do you say to them?
Good questions.
On policy implications, I see two different types of claim there. One is something like “if the best way to achieve X was Y, then that should still hold”. In terms of abstraction: if both the goal X and the possible actions Y are defined at a high level of abstraction, and action Y is optimal in the high-level model, then any low-level model which abstracts into the high-level model should also predict that Y (or the low-level thing corresponding to Y) is optimal. Something roughly-like-that could be implied by a proof roughly-like-this, depending on exactly how we’re “learning” behavior under interventions/counterfactuals.
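Here’s a minimal toy sketch of that first type of claim, with made-up names and numbers (an illustration of the intuition, not the proof itself):

```python
# Toy sketch: a high-level model over abstract actions, a low-level model over
# concrete actions, and an abstraction map from concrete to abstract actions.
# All names and numbers are hypothetical.

# High-level model: predicted value of goal X under each high-level action Y.
high_model = {"water_daily": 0.9, "water_weekly": 0.4}

# Low-level model: predicted value of X under each low-level action, plus the
# abstraction map saying which high-level action each one corresponds to.
low_model = {"open_tap_each_morning": 0.9, "open_tap_on_sundays": 0.4}
abstraction = {"open_tap_each_morning": "water_daily",
               "open_tap_on_sundays": "water_weekly"}

def best(model):
    """Return the action the model predicts is optimal."""
    return max(model, key=model.get)

# "Abstracts into": the low-level model reproduces the high-level model's
# prediction for every low-level action, via the abstraction map...
assert all(low_model[a] == high_model[abstraction[a]] for a in low_model)

# ...and therefore the low-level optimum maps onto the high-level optimum.
assert abstraction[best(low_model)] == best(high_model)
```

The load-bearing line is the consistency check in the middle; whether something like it holds for interventions/counterfactuals, rather than a lookup table, is exactly the “how we’re learning behavior under interventions” caveat.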
The other type of claim is something like “if X was morally right before, then X should still be morally right with the new world-model”. Whether this is “entailed” by this sort of proof depends on what assumptions about morality we’re bringing to the table. As a general rule, I try to avoid directly thinking about morality at all. I think about what I want (in the “what I wish the world were like” sense, not in the “I want to eat right now” sense). If I’m building an AI (or helping build one, or causing one to be built, etc) then “what do I want the world to look like?” is the relevant question to ask, and “morality”—however it’s defined—is relevant only insofar as it influences that question. So that mostly brings us back to the previous type of claim.
As for this:
Some people think that “This is a table, damn it! Not a simulated table!” is part of their data that theories need to account for. What do you say to them?
I mostly ignore them. It is not an objection which needs to be addressed in order to build an AI, or model biological or economic systems, or any of the other things I actually care about doing with the theory.
On policy implications: I think that the new theory almost always generates at least some policy implications. For example, relativity vs. Newton changes how we design rockets and satellites. Closer to home, multiverse theory opens up the possibility of (some kinds of) acausal trade. I think “it all adds up to normality” is something that shouldn’t be used to convince yourself that a new theory probably has the same implications; rather, it’s something that should be used to convince yourself that the new theory is incorrect, if it seems to add up to something extremely far from normal, like paralysis or fanaticism. If it adds up to something non-normal but not that non-normal, then it’s fine.
I brought up those people as an example of someone you probably disagree with. My purpose was to highlight that choices need to be made about what your data is, and different people make them differently. (For an example closer to home, Solomonoff induction makes them differently than you do, I predict.) This seems to me like the sort of thing one should think about when designing an AI one hopes to align. Obviously if you are just going for capabilities rather than alignment you can probably get away with not thinking hard about this question.
I basically agree with what you’re saying about policy implications. What I want to say is more like “if we actually tried high-level interventions X and Y, and empirically X worked better for high-level success metric Z, then that should still be true under the new model, with a lower-level grounding of X, Y and Z”. It’s still possible that an old model incorrectly predicts which of X and Y work better empirically, which would mean that the old model has worse predictive performance. Similarly: if the old model predicts that X is the optimal action, then the new model should still predict that, to the extent that the old model successfully predicts the world. If the new model is making different policy recommendations, then those should be tied to some place where the old model had inferior predictive power.
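To make that last sentence concrete, here’s a tiny illustration with hypothetical numbers: if two models agree on every prediction, they rank the interventions the same, so a flipped recommendation has to trace back to some prediction where they disagree.

```python
# Hypothetical predicted values of success metric Z under interventions X and Y.
old_model = {"X": 0.7, "Y": 0.5}   # old model: X looks better
new_model = {"X": 0.7, "Y": 0.8}   # new model: agrees about X, disagrees about Y

def recommendation(model):
    """The intervention the model predicts maximizes Z."""
    return max(model, key=model.get)

if recommendation(old_model) != recommendation(new_model):
    # A changed recommendation can only come from a changed prediction somewhere.
    disagreements = [a for a in old_model if old_model[a] != new_model[a]]
    assert disagreements            # here: ["Y"]
    print("check the old model's predictive track record on:", disagreements)
```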
This seems to me like the sort of thing one should think about when designing an AI one hopes to align.
This is not obvious to me. Can you explain the reasoning and/or try to convey the intuition?
OK, sounds good.
I’m not sure either, but it seems true to me. Here goes an intuition-conveying attempt… First, the question of what counts as your data seems like a parameter that must be pinned down one way or another, and as you mention there are clearly wrong ways to do it, and meanwhile it’s an open philosophical controversy, so on those grounds alone it seems plausibly relevant to building an aligned AI, at least if we are doing it in a principled way rather than through prosaic methods (i.e. we do an automated search for it). Second, your views on what sorts of theories fit the data depend on what you think your data is. Disputes about consciousness often come down to this, I think. If you want your AI to be physicalist rather than idealist or Cartesian dualist, you need to give it the corresponding notion of data. And what kind of physicalist? Etc. Or you might want it to be uncertain and engage in philosophical reasoning about what counts as its data… which also sounds like something one has to think about; it doesn’t come for free when building an AI. (It does come for free if you are searching for an AI.)