Sounds like a job for quantum machine learning :P (See e.g. figure 1 here)
But actually, I’m mostly just skeptical that this really works in high-dimensional space with no access to the actual parameters of the model. The intuitive arguments you make seem to rely on low-dimensional intuitions about how easy it is to create local minima rather than saddle points. E.g. in sufficiently high dimensions we might simply find that there’s some direction in parameter-space that directly changes what the agent thinks is the Schelling point in mesa-objective space, thus collapsing the entire class of attempts at building the bumpy landscape.
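To gesture at what I mean by low-dimensional intuitions: here's a rough sketch (my framing, not anything from the post) of the standard random-matrix argument. Treating the Hessian at a critical point as a random symmetric matrix, a critical point is a local minimum only if every eigenvalue is positive, and the fraction of random symmetric matrices that are positive-definite collapses as dimension grows, so generic high-dimensional critical points are overwhelmingly saddles:

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_positive_definite(dim, trials=2000):
    """Fraction of random symmetric 'Hessians' with all-positive eigenvalues."""
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((dim, dim))
        h = (a + a.T) / 2  # symmetrize: a valid Hessian is symmetric
        if np.all(np.linalg.eigvalsh(h) > 0):
            count += 1
    return count / trials

# The fraction of 'local minima' among random critical points
# shrinks rapidly with dimension; the rest are saddles.
for d in (1, 2, 3, 5, 8):
    print(d, frac_positive_definite(d))
```

Of course a trained network's Hessian isn't an i.i.d. Gaussian matrix, so this is only a heuristic, but it's the heuristic that makes me distrust pictures of "bumps" drawn in 1D or 2D.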