Yonadav Shavit comments on The Speed + Simplicity Prior is probably anti-deceptive

Yonadav Shavit 28 Apr 2022 22:56 UTC
3 points
I very much do not believe that a mesaoptimizer found by gradient descent would look anything like the above Python programs. I’m just using this as a simplification to try and get at trends that I think it represents.
Re: (1), my argument is exactly about whether gradient descent would summon an agent with a weird pseudogoal Y that was not itself a proxy for reward on its training distribution. If pursuing Y directly (which is different from the base optimizer goal, e.g. Z)
I’m realizing some of the confusion might be because I named the goal-finding function “get_base_obj” instead of “get_proxy_for_base_obj”. That seems like it would definitely mislead people, so I’ll fix that.
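
To make the naming point concrete, here is a minimal sketch of a goal-finding function under the name get_proxy_for_base_obj. It is not one of the programs from the post: the `env_description` dictionary, its `rewarded_action` key, and the `mesa_optimizer` wrapper are all hypothetical, made up for illustration. The only thing it tries to show is that the function returns a proxy inferred from what the agent can observe, not the base objective itself.

```python
from typing import Callable, Dict, Sequence

Action = str
Objective = Callable[[Action], float]


def get_proxy_for_base_obj(env_description: Dict[str, str]) -> Objective:
    """Hypothetical goal-finding step: build a proxy objective from observations.

    The agent never sees the base objective directly. It only sees
    env_description (an assumed stand-in for its observations) and
    guesses what is being rewarded.
    """
    target = env_description["rewarded_action"]  # assumed key, for illustration
    return lambda action: 1.0 if action == target else 0.0


def mesa_optimizer(env_description: Dict[str, str],
                   candidate_actions: Sequence[Action]) -> Action:
    """Pick the candidate action that scores highest under the inferred proxy."""
    proxy = get_proxy_for_base_obj(env_description)
    return max(candidate_actions, key=proxy)
```

Under the old name, a reader could reasonably assume the function has access to the true base objective; under the new one, it is clear that whatever it returns is only as good as the agent's inference on the training distribution.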