Consequentialism is in the Stars not Ourselves?
Still thinking about consequentialism and optimisation. I’ve argued that global optimisation for an objective function is so computationally intractable as to be prohibited by the laws of physics of our universe. Yet it’s clearly the case that e.g. evolution is globally optimising for inclusive genetic fitness (or perhaps patterns that more successfully propagate themselves if you’re taking a broader view). I think examining why evolution is able to successfully globally optimise for its objective function would be enlightening.
Using the learned optimisation ontology, we have an outer selection process (evolution, stochastic gradient descent, etc.) that selects intelligent systems according to their performance on a given metric (inclusive genetic fitness and loss respectively).
Local vs Global Optimisation
Optimisation here refers to “direct” optimisation, a mechanistic procedure for internally searching through an appropriate space for elements that maximise or minimise the value of some objective function defined on that space.
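For concreteness, a minimal sketch of direct optimisation in this sense; the space, the objective, and the exhaustive search are illustrative placeholders, not a claim about how real systems implement it:

```python
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")

def direct_optimise(space: Iterable[X], objective: Callable[[X], float]) -> X:
    """Mechanistically search a space for the element maximising an objective."""
    best, best_value = None, float("-inf")
    for candidate in space:
        value = objective(candidate)  # evaluate the objective on this element
        if value > best_value:
            best, best_value = candidate, value
    return best

# Local usage (illustrative, hypothetical names): pick the best legal chess
# move under some scoring function.
# best_move = direct_optimise(legal_moves, move_score)
```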
Local Optimisation
Involves deploying optimisation (search, planning, etc.) to accomplish specific tasks (e.g., making a good move in chess, winning a chess game, planning a trip, solving a puzzle).
The choice of local tasks is not determined as part of this framework; local tasks could be subproblems of another optimisation problem (e.g., picking a good next move as part of winning a chess game), generated via learned heuristics, etc.
Global Optimisation
Entails consistently employing optimisation throughout a system’s active lifetime to achieve fixed terminal goals.
All actions flow from their expected consequences on realising the terminal goals (e.g., if a terminal goal is to maximise the number of lives saved, every activity—eating, sleeping, playing, working—is performed because it is the most tractable way to maximise the expected number of future lives saved at that point in time).
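A sketch of what global optimisation demands of an agent; `expected_lives_saved` is a hypothetical consequence model, and (as argued below) computing it is generally intractable:

```python
from typing import Callable, Iterable

def act_globally(
    available_actions: Iterable[str],
    expected_lives_saved: Callable[[str], float],
) -> str:
    """Every action, at every moment, is chosen by its expected effect on the
    single fixed terminal goal, never by local, task-specific criteria."""
    # 'eat', 'sleep', 'play', 'work', ... all compete on the same one metric.
    return max(available_actions, key=expected_lives_saved)
```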
Outer Optimisation Processes as Global Optimisers
As best I can tell, there are some distinctive features of outer optimisation processes that facilitate global optimisation:
Access to more compute power
ML models are trained with significantly (often orders of magnitude) more compute than is used for running inference
Economic incentives favour this: centralising training allows models to be trained on bespoke hardware in massive data centres, but the resulting models need to be cheap enough to run profitably
Optimising inference costs has led to “overtraining” smaller models
In some cases trained models are intended to be run on consumer hardware or edge computing devices
Evolutionary processes have access to the cumulative compute power of the entire population under selection, and they play out across many generations of the population
This (much) greater compute allows outer optimisation processes to apply (many?) more bits of selection towards their objective functions
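A rough back-of-the-envelope for the training/inference compute gap, using the common approximations of ~6ND FLOPs to train a dense transformer with N parameters on D tokens and ~2N FLOPs per generated token at inference (all numbers illustrative, Chinchilla-scale):

```python
N = 70e9    # parameters (illustrative)
D = 1.4e12  # training tokens (illustrative)

train_flops = 6 * N * D        # ~5.9e23 FLOPs for the whole training run
infer_flops_per_token = 2 * N  # ~1.4e11 FLOPs per generated token

# Tokens of inference needed to spend as much compute as training did:
print(train_flops / infer_flops_per_token)  # = 3 * D = 4.2e12 tokens
```

Under these approximations, any single forward pass uses a tiny fraction of the selection compute applied during training.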
Relaxation of time constraints
Real-time inference imposes a strict bound on how much computation can be performed in a single time step
Robotics, self-driving cars, game AIs, etc. must select actions within fractions of a second (see the sketch below)
Sometimes hundreds of actions per second
User-facing cognitive models (e.g., LLMs) are also subject to latency constraints
Though people may be more willing to wait longer for responses if the models’ outputs are sufficiently better
In contrast, the outer selection process just has a lot more time to perform optimisation
ML training runs already last several months, and the only bound on length of training runs seems to be hardware obsolescence
For sufficiently long training runs, it becomes better to wait for the next hardware generation before starting training
Training runs exceeding a year seem possible eventually, especially if loss keeps going down with scale
Evolution occurs over timescales of hundreds to thousands of generations of an organism
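To make the per-step constraint concrete, here is a minimal sketch of a deadline-bounded (anytime) decision loop of the kind real-time systems are forced into; `improve` is a placeholder for, e.g., one more iteration of deepening search:

```python
import time
from typing import Callable, TypeVar

Plan = TypeVar("Plan")

def act_within_budget(
    initial_plan: Plan,
    improve: Callable[[Plan], Plan],
    budget_seconds: float = 0.05,
) -> Plan:
    """Iteratively refine a plan, but return whatever we have when the hard
    per-step deadline (tens of milliseconds for a game AI) expires."""
    deadline = time.monotonic() + budget_seconds
    plan = initial_plan
    while time.monotonic() < deadline:
        plan = improve(plan)  # e.g. deepen the search by one more level
    return plan
```

The outer selection process faces no analogous per-step deadline: it can spend months (ML training) or generations (evolution) on the same optimisation problem.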
Solving a (much) simpler optimisation problem
Outer optimisation processes evaluate the objective function using actual consequences along single trajectories, as opposed to modelling expected consequences across multiple future trajectories and searching for trajectories with better expected consequences.
Evaluating future consequences of actions is difficult (e.g., what is the expected value of writing this LessWrong shortform on the number of future lives saved?)
Chaos sharply limits how far into the future we can meaningfully predict (regardless of the computational resources available), which is not an issue when using actual consequences for selection
In a sense, outer optimisation processes get the “evaluate the consequences of this trajectory on the objective” computation for free, and that’s just a very difficult (and in some cases outright intractable) computational problem
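The contrast in a sketch; `run_in_environment` and `world_model.sample_return` are hypothetical stand-ins for the real environment and for a learned or explicit world model respectively:

```python
def selection_score(policy, run_in_environment) -> float:
    """Outer selection: run the policy once and read off the realised
    consequences along that single trajectory. No world model needed."""
    return run_in_environment(policy)

def planning_score(policy, world_model, n_samples: int = 1000) -> float:
    """Global/inner optimisation: estimate expected consequences by modelling
    many possible futures, which requires a world model that is hard (or,
    under chaos, impossible) to make accurate far into the future."""
    returns = [world_model.sample_return(policy) for _ in range(n_samples)]
    return sum(returns) / n_samples
```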
The usage of actual consequences applies over longer time horizons
Evolution has a potentially indefinite/unbounded horizon
And has been optimising for far longer than any ML training run
Current ML training generally operates with fixed-length horizons but uses actual/exact consequences of trajectories over said horizons.
Outer optimisation processes select for a policy that performs well according to the objective function on the training distribution, rather than selecting actions that directly optimise an objective function in deployment.
This approach amortises the cost of optimisation across many future inferences of the selected policy.
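A sketch of this select-then-amortise structure; `policy_space` and `training_return` are placeholders, not real APIs:

```python
def outer_selection(policy_space, training_return):
    """Expensive and done once: search policy space for high return on the
    training distribution (cf. months-long training runs, evolutionary time)."""
    return max(policy_space, key=training_return)

def deploy(policy, observation):
    """Cheap and done many times: the selected policy just acts; no further
    optimisation against the objective happens at inference time."""
    return policy(observation)
```

The one-time selection cost is spread over every subsequent call to `deploy`, which is what makes enormous training-time optimisation economically viable.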
Summary
Outer optimisation processes are more capable of global optimisation due to their access to more compute power, their relaxed time constraints, and the much simpler optimisation problem they face (evaluations of exact consequences provided for free, and over longer time horizons; optimisation costs amortised across deployment; etc.).
These factors enable outer optimisation processes to globally optimise for their selection metric in a way that is infeasible for the intelligent systems they select for.
Cc: @beren, @tailcalled, @Chris_Leong, @JustisMills.