Oh, yes, I actually missed that this was not supposed to solve misaligned mesaoptimizers, as indicated by “well-aligned” in “Fast Honest Mesaoptimizer: The AI wakes up in a new environment, and proceeds to optimize a proxy that is well-aligned with the environment’s current objective.” Rechecking with new context… Well, I'm not sure it's new context, but I now see that an optimizer with a check that derives what humans want should behave the same as the honest one if the check is never satisfied, so it would have at least the same complexity. I neglected this because I didn't think through in detail what “it proceeds to optimize the objective it’s supposed to” means. So you’re right: it’s either slower or more complex.
Edited! Thanks for this discussion.