johnswentworth comments on A Correspondence Theorem in the Maximum Entropy Framework

johnswentworth 13 Nov 2020 18:10 UTC
I basically agree with what you’re saying about policy implications. What I want to say is more like “if we actually tried high-level interventions X and Y, and empirically X worked better for high-level success metric Z, then that should still be true under the new model, with a lower-level grounding of X, Y and Z”. It’s still possible that an old model incorrectly predicts which of X and Y works better empirically, which would mean that the old model has worse predictive performance. Similarly: if the old model predicts that X is the optimal action, then the new model should still predict that, to the extent that the old model successfully predicts the world. If the new model makes different policy recommendations, then those differences should be tied to some place where the old model had inferior predictive power.
This seems to me like the sort of thing one should think about when designing an AI one hopes to align.
This is not obvious to me. Can you explain the reasoning and/or try to convey the intuition?
- Daniel Kokotajlo 14 Nov 2020 7:31 UTC
  OK, sounds good.
  I’m not sure either, but it seems true to me. Here goes an intuition-conveying attempt… First, the question of what counts as your data seems like a parameter that must be pinned down one way or another; as you mention, there are clearly wrong ways to do it, and meanwhile it’s an open philosophical controversy. On those grounds alone it seems plausibly relevant to building an aligned AI, at least if we are doing it in a principled way rather than through prosaic methods (i.e. doing an automated search for it). Second, your views on what sorts of theories fit the data depend on what you think your data is. Disputes about consciousness often come down to this, I think. If you want your AI to be physicalist rather than idealist or Cartesian dualist, you need to give it the corresponding notion of data. And what kind of physicalist? Etc. Or you might want it to be uncertain and engage in philosophical reasoning about what counts as its data… which also sounds like something one has to think about; it doesn’t come for free when building an AI. (It does come for free if you are searching for an AI.)