Some Nuance on Learned Optimisation in the Real World
I think mesa-optimisers should not be thought of as learned optimisers per se, but as systems that employ optimisation/search as part of their inference process.
The simplest case for this is that pure optimisation during inference is computationally intractable in rich environments (e.g. the real world), so systems operating in the real world (e.g. humans) do not perform inference solely by directly optimising over outputs.
Rather, optimisation is sometimes employed as one part of their inference strategy. That is, systems only optimise their outputs some of the time; the rest of the time (perhaps most of it) they execute learned heuristics[1].
Furthermore, learned optimisation in the real world seems to be more “local”/task-specific: I make plans to achieve particular, local objectives (e.g. planning a trip from London to Edinburgh); I have no global objective that I am consistently optimising for over the duration of my lifetime.
I think this is basically true for any feasible real-world intelligent system[2]. So learned optimisation in the real world is:
Partial[3]
Local
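To make “partial” and “local” a bit more concrete, here is a toy sketch (purely illustrative; the scenario, names, and numbers are made up, and nothing here is meant as a claim about how any particular system is implemented):

```python
def learned_heuristic(observation):
    # Default behaviour most of the time: a cheap, pattern-matched response
    # shaped by past experience (a toy lookup standing in for learned reflexes).
    responses = {"hungry": "eat", "inbox_full": "triage_email"}
    return responses.get(observation, "carry_on")

def local_search(options, objective):
    # Explicit optimisation, but only over a handful of task-specific options,
    # against an objective local to the current task rather than a single
    # objective spanning the agent's whole lifetime.
    return min(options, key=objective)

def act(observation):
    if observation == "need_to_get_to_edinburgh":
        # "Partial": search is only invoked when the task seems to call for it.
        # "Local": the objective ("minimise journey time for this trip") is
        # specific to this task.
        journey_hours = {"train": 4.5, "coach": 9.0, "fly": 3.5}
        return local_search(journey_hours, objective=journey_hours.get)
    return learned_heuristic(observation)

print(act("need_to_get_to_edinburgh"))  # -> "fly"
print(act("hungry"))                    # -> "eat"
```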
Do these nuances of real-world mesa-optimisers change the nature of risks from learned optimisation?
Cc: @evhub, @beren, @TurnTrout, @Quintin Pope.
[1] Though optimisation (e.g. planning) might sometimes be employed to figure out which heuristic to deploy at a particular time.
[2] For roughly the reasons why I think fixed, immutable terminal goals are antinatural; see e.g. “Is “Strong Coherence” Anti-Natural?”
Alternatively, I believe that real-world systems learn contextual heuristics (downstream of historical selection) that influence decision making (“values”), not fixed/immutable terminal “goals”. See also: “why assume AGIs will optimize for fixed goals?”
[3] This seems equivalent to Beren’s concept of “hybrid optimisation”; I mostly use “partial optimisation”, because it feels closer to the ontology of the Risks From Learned Optimisation paper. As they define optimisation, I think learned algorithms operating in the real world just will not be consistently optimising for any global objective.
One can always reparameterize any given input/output mapping as a search for the minima of some internal energy function, without changing the mapping at all.
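To spell out that construction (a minimal example of my own, not taken from the post): for any fixed mapping $f$, define an energy whose unique minimiser recovers $f$,

$$E_x(y) \;=\; \lVert y - f(x) \rVert^2, \qquad \operatorname*{arg\,min}_y E_x(y) \;=\; f(x),$$

so redescribing the system as “searching for the minimum of $E$” leaves its input/output behaviour completely unchanged.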
The main criterion to think about is whether an agent will use creative, original strategies to maximize inner objectives (e.g. “gather more computational resources so I can find a high maximum”), strategies that are more easily predicted by assuming the agent is “deliberately” looking for extremes of the inner objectives than by extrapolating from the agent’s past actions.
Given that the optimisation performed by intelligent systems in the real world is local/task-specific, I’m wondering if it would be more sensible to model the learned algorithm as containing (multiple) mesa-optimisers rather than as being a single mesa-optimiser.
My main reservation is that I think this may promote a different kind of confused thinking: it’s not that the learned optimisers are constantly competing for influence, with their aggregate behaviour determining the overall behaviour of the learned algorithm. Rather, the learned algorithm employs optimisation towards different local/task-specific objectives at different times.
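As a toy way of picturing that distinction (again purely illustrative, with made-up objectives and actions): rather than several optimisers competing and aggregating their influence, a single learned algorithm dispatches to search with whichever local objective the current context supplies, and often to no search at all:

```python
def plan(state, objective, candidate_actions):
    # Local search: pick whichever candidate scores best on *this* objective.
    return max(candidate_actions, key=lambda a: objective(state, a))

def learned_algorithm(state, context):
    # One policy that dispatches: which objective gets searched against
    # depends on the task at hand, and most contexts involve no search at all.
    if context == "chess":
        return plan(state, lambda s, a: s["move_evals"][a], list(state["move_evals"]))
    if context == "errand":
        return plan(state, lambda s, a: -s["route_minutes"][a], list(state["route_minutes"]))
    return "habitual_response"  # no search: just a learned reflex

state = {"move_evals": {"e4": 0.30, "d4": 0.28, "c4": 0.25},
         "route_minutes": {"via_high_st": 25, "via_park": 18}}
print(learned_algorithm(state, "chess"))   # -> "e4"
print(learned_algorithm(state, "errand"))  # -> "via_park"
print(learned_algorithm(state, "chat"))    # -> "habitual_response"
```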