Consequentialism is in the Stars not Ourselves?
Still thinking about consequentialism and optimisation. I’ve argued that global optimisation for an objective function is so computationally intractable as to be prohibited by the laws of physics of our universe. Yet it’s clearly the case that e.g. evolution is globally optimising for inclusive genetic fitness (or perhaps patterns that more successfully propagate themselves if you’re taking a broader view). I think examining why evolution is able to successfully globally optimise for its objective function would be enlightening.
Using the learned optimisation ontology, we have an outer selection process (evolution, stochastic gradient descent, etc.) that selects intelligent systems according to their performance on a given metric (inclusive genetic fitness and loss respectively).
Local vs Global Optimisation
Optimisation here refers to “direct” optimisation, a mechanistic procedure for internally searching through an appropriate space for elements that maximise or minimise the value of some objective function defined on that space.
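For concreteness, a minimal sketch of direct optimisation in this sense; the space, the objective, and the exhaustive search are illustrative placeholders, not a claim about how real systems implement it:

```python
from typing import Callable, Iterable, TypeVar

X = TypeVar("X")

def direct_optimise(space: Iterable[X], objective: Callable[[X], float]) -> X:
    """Mechanistically search a space for the element maximising an objective."""
    best, best_value = None, float("-inf")
    for candidate in space:
        value = objective(candidate)  # evaluate the objective on this element
        if value > best_value:
            best, best_value = candidate, value
    return best

# Local usage (illustrative, hypothetical names): pick the best legal chess
# move under some scoring function.
# best_move = direct_optimise(legal_moves, move_score)
```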
Local Optimisation
Involves deploying optimisation (search, planning, etc.) to accomplish specific tasks (e.g., making a good move in chess, winning a chess game, planning a trip, solving a puzzle).
The choice of local tasks is not determined as part of this framework; local tasks could be subproblems of another optimisation problem (e.g., picking a good next move as part of winning a chess game), generated via learned heuristics, etc.
Global Optimisation
Entails consistently employing optimisation throughout a system’s active lifetime to achieve fixed terminal goals.
All actions flow from their expected consequences on realising the terminal goals (e.g., if a terminal goal is to maximise the number of lives saved, every activity—eating, sleeping, playing, working—is performed because it is the most tractable way to maximise the expected number of future lives saved at that point in time).
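A sketch of what global optimisation demands of an agent; `expected_lives_saved` is a hypothetical consequence model, and (as argued below) computing it is generally intractable:

```python
from typing import Callable, Iterable

def act_globally(
    available_actions: Iterable[str],
    expected_lives_saved: Callable[[str], float],
) -> str:
    """Every action, at every moment, is chosen by its expected effect on the
    single fixed terminal goal, never by local, task-specific criteria."""
    # 'eat', 'sleep', 'play', 'work', ... all compete on the same one metric.
    return max(available_actions, key=expected_lives_saved)
```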
Outer Optimisation Processes as Global Optimisers
As best I can tell, there are some distinctive features of outer optimisation processes that facilitate global optimisation:
Access to more compute power
ML models are trained with significantly (often orders of magnitude) more compute than is used for running inference
Economic incentives favour this: centralising training allows models to be trained on bespoke hardware in massive data centres, but the resulting models need to be cheap enough to run profitably
Optimising inference costs has led to “overtraining” smaller models
In some cases trained models are intended to be run on consumer hardware or edge computing devices
Evolutionary processes have access to the cumulative compute power of the entire population under selection, and they play out across many generations of the population
This (much) greater compute allows outer optimisation processes to apply (many?) more bits of selection towards their objective functions
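A rough back-of-the-envelope for the training/inference compute gap, using the common approximations of ~6ND FLOPs to train a dense transformer with N parameters on D tokens and ~2N FLOPs per generated token at inference (all numbers illustrative, Chinchilla-scale):

```python
N = 70e9    # parameters (illustrative)
D = 1.4e12  # training tokens (illustrative)

train_flops = 6 * N * D        # ~5.9e23 FLOPs for the whole training run
infer_flops_per_token = 2 * N  # ~1.4e11 FLOPs per generated token

# Tokens of inference needed to spend as much compute as training did:
print(train_flops / infer_flops_per_token)  # = 3 * D = 4.2e12 tokens
```

Under these approximations, any single forward pass uses a tiny fraction of the selection compute applied during training.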
Relaxation of time constraints
Real-time inference imposes a strict bound on how much computation can be performed in a single time step
Robotics, self-driving cars, game AIs, etc. must select actions within fractions of a second (see the sketch below)
Sometimes hundreds of actions per second
User-facing cognitive models (e.g., LLMs) are also subject to latency constraints
Though people may be more willing to wait longer for responses if the models’ outputs are sufficiently better
In contrast, the outer selection process just has a lot more time to perform optimisation
ML training runs already last several months, and the only bound on length of training runs seems to be hardware obsolescence
For sufficiently long training runs, it becomes better to wait for the next hardware generation before starting training
Training runs exceeding a year seem possible eventually, especially if loss keeps going down with scale
Evolution occurs over timescales of hundreds to thousands of generations of an organism
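To make the per-step constraint concrete, here is a minimal sketch of a deadline-bounded (anytime) decision loop of the kind real-time systems are forced into; `improve` is a placeholder for, e.g., one more iteration of deepening search:

```python
import time
from typing import Callable, TypeVar

Plan = TypeVar("Plan")

def act_within_budget(
    initial_plan: Plan,
    improve: Callable[[Plan], Plan],
    budget_seconds: float = 0.05,
) -> Plan:
    """Iteratively refine a plan, but return whatever we have when the hard
    per-step deadline (tens of milliseconds for a game AI) expires."""
    deadline = time.monotonic() + budget_seconds
    plan = initial_plan
    while time.monotonic() < deadline:
        plan = improve(plan)  # e.g. deepen the search by one more level
    return plan
```

The outer selection process faces no analogous per-step deadline: it can spend months (ML training) or generations (evolution) on the same optimisation problem.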
Solving a (much) simpler optimisation problem
Outer optimisation processes evaluate the objective function using actual consequences along single trajectories, as opposed to modelling expected consequences across multiple future trajectories and searching for trajectories with better expected consequences.
Evaluating future consequences of actions is difficult (e.g., what is the expected value of writing this LessWrong shortform on the number of future lives saved?)
Chaos sharply limits how far into the future we can meaningfully predict (regardless of the computational resources available), which is not an issue when using actual consequences for selection
In a sense, outer optimisation processes get the “evaluate the consequences of this trajectory on the objective” computation for free, and that’s just a very difficult (and in some cases outright intractable) computational problem
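The contrast in a sketch; `run_in_environment` and `world_model.sample_return` are hypothetical stand-ins for the real environment and for a learned or explicit world model respectively:

```python
def selection_score(policy, run_in_environment) -> float:
    """Outer selection: run the policy once and read off the realised
    consequences along that single trajectory. No world model needed."""
    return run_in_environment(policy)

def planning_score(policy, world_model, n_samples: int = 1000) -> float:
    """Global/inner optimisation: estimate expected consequences by modelling
    many possible futures, which requires a world model that is hard (or,
    under chaos, impossible) to make accurate far into the future."""
    returns = [world_model.sample_return(policy) for _ in range(n_samples)]
    return sum(returns) / n_samples
```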
The usage of actual consequences applies over longer time horizons
Evolution has a potentially indefinite/unbounded horizon
And has been optimising for far longer than any ML training run
Current ML training generally operates with fixed-length horizons but uses actual/exact consequences of trajectories over said horizons.
Outer optimisation processes select for a policy that performs well according to the objective function on the training distribution, rather than selecting actions that directly optimise an objective function in deployment.
This approach amortises the cost of optimisation across many future inferences of the selected policy.
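A sketch of this select-then-amortise structure; `policy_space` and `training_return` are placeholders, not real APIs:

```python
def outer_selection(policy_space, training_return):
    """Expensive and done once: search policy space for high return on the
    training distribution (cf. months-long training runs, evolutionary time)."""
    return max(policy_space, key=training_return)

def deploy(policy, observation):
    """Cheap and done many times: the selected policy just acts; no further
    optimisation against the objective happens at inference time."""
    return policy(observation)
```

The one-time selection cost is spread over every subsequent call to `deploy`, which is what makes enormous training-time optimisation economically viable.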
Summary
Outer optimisation processes are more capable of global optimisation due to their access to more compute power, their relaxed time constraints, and the much simpler optimisation problem they face (evaluations of exact consequences provided for free, and over longer time horizons; optimisation costs amortised across deployment; etc.).
These factors enable outer optimisation processes to globally optimise for their selection metric in a way that is infeasible for the intelligent systems they select for.
Cc: @beren, @tailcalled, @Chris_Leong, @JustisMills.