Signer comments on The Speed + Simplicity Prior is probably anti-deceptive

Signer 28 Apr 2022 19:40 UTC
2 points
Oh, yes, I actually missed that this was not supposed to solve misaligned mesaoptimizers, because of the “well-aligned” in “Fast Honest Mesaoptimizer: The AI wakes up in a new environment, and proceeds to optimize a proxy that is well-aligned with the environment’s current objective.”. Rechecking with new context… Well, not sure if it’s new context, but I now see that an optimizer with a check that derives what humans want should be the same as the honest one if the check is never satisfied, and so it would have at least the same complexity. I neglected this because I didn’t think through what “it proceeds to optimize the objective it’s supposed to” means in detail. So you’re right, it’s either slower or more complex.
- Yonadav Shavit 28 Apr 2022 21:04 UTC
  2 points
  Edited! Thanks for this discussion.