BTW with regard to “studying mesa-optimization in the context of such systems”, I just published this post: Why GPT wants to mesa-optimize & how we might change this.
I’m still thinking about the point you made in the other subthread about MAML. It seems very plausible to me that GPT is doing MAML-type stuff. What I haven’t worked out yet is if/how that could result in dangerous mesa-optimization.