Consequentialism is in the Stars, not Ourselves
Polished from my shortform.
Epistemic Status
Thinking out loud.
Introduction
I’ve argued that system-wide/total optimisation for an objective function in the real world is so computationally intractable as to be prohibited by the laws of physics of our universe[1]. Yet it’s clearly the case that, e.g., evolution is optimising for inclusive genetic fitness (or perhaps for patterns that more successfully propagate themselves, if you’re taking a broader view) in such a totalising manner. I think examining why evolution is able to successfully totally optimise for its objective function would be enlightening.
Using the learned optimisation ontology, we have an outer selection process (evolution, stochastic gradient descent, etc.) that selects intelligent systems according to their performance on a given metric (inclusive genetic fitness and loss respectively).
Optimisation
Behavioural (Descriptive) Optimisation
I think of behavioural optimisation as something along the general lines of:
Navigating through a state space to improbable regions that are extremal values of some compactly specifiable (non-trivial) objective function[2].
Mechanistic (Prescriptive) Optimisation
I think of mechanistic optimisation as something along the general lines of:
a procedure that internally searches through an appropriate space for elements that maximise or minimise the value of some objective function defined on that space.
“Direct” optimisation in the ontology introduced by @beren.
Notably, the procedure must actually evaluate[3] the objective function (or the expected value thereof) on elements of the search space.
Mechanistic optimisation is implementing an optimisation algorithm.
For the rest of this post — unless otherwise stated — I’ll be using “optimisation”/”optimising” to refer to “mechanistic optimisation”.
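To make the distinction concrete, here is a minimal sketch of mechanistic optimisation (function and variable names are my own, purely illustrative): the defining feature is that the procedure explicitly evaluates the objective function on elements of a search space.

```python
def mechanistic_optimise(objective, search_space):
    """Return the element of search_space that maximises objective.

    The defining feature of mechanistic optimisation: the objective
    is actually evaluated on candidate elements, rather than merely
    being a function the system's behaviour happens to extremise.
    """
    best, best_value = None, float("-inf")
    for candidate in search_space:
        value = objective(candidate)  # explicit evaluation step
        if value > best_value:
            best, best_value = candidate, value
    return best


# Toy usage: maximise -(x - 3)^2 over a small discrete space.
result = mechanistic_optimise(lambda x: -(x - 3) ** 2, range(10))  # → 3
```

A system could navigate to the same extremal point without ever running a loop like this (via reflexes or learned heuristics); that would be behavioural but not mechanistic optimisation.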
“Scope” of Optimisation[4]
I want to distinguish optimising systems according to the “scope” of the optimisation procedure(s) in the system’s policy[5].
“Partial” (Task Specific) Optimisation
Involves deploying optimisation (search, planning, etc.) to accomplish specific tasks (e.g., making a good next move in chess, winning a chess game, planning a trip, solving a puzzle).
The choice of particular tasks is not determined as part of this framework; tasks could be subproblems of another optimisation problem (e.g., picking a good next move as part of winning a chess game), generated via heuristics, etc.
The system’s policy is not a coherent optimiser, but contains optimisation subprocedures that are applied to specific tasks the system encounters
Optimisation is but one tool in the system’s toolbox
“Total” Optimisation
Entails consistently employing optimisation throughout a system’s active lifetime to achieve fixed terminal goals.
All actions/outputs flow from their expected consequences on realising the terminal goals (e.g., if a terminal goal is to maximise the number of lives saved, every activity—eating, sleeping, playing, working—is performed because it is the most tractable way to maximise the expected number of future lives saved at that point in time).
The system’s entire policy is in effect an optimisation algorithm with a set of objectives it is coherently optimising for.
Outer Optimisation Processes as Total Optimisers
As best as I can tell, there are some distinctive features of outer optimisation processes that facilitate total optimisation:
Access to more compute power
ML algorithms are trained with significantly (often orders of magnitude) more compute than is used for running inference due in part to economic incentives
Centralisation of ML training allows training ML models on bespoke hardware in massive data centres, but the models need to be cheap enough to run profitably
Optimising inference costs has led to “overtraining” (per the Chinchilla scaling laws) smaller models (e.g. LLaMA)
In some cases, trained models are intended to be run on consumer hardware or edge computing devices, so there is a many orders of magnitude gap between the computing available for inference and the computing available for training
Evolutionary processes have access to the cumulative compute power of the entire population under selection, and they play out across many generations of the population
This (much) greater compute allows outer optimisation processes to apply (many?) more bits of selection towards their objective functions
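The “overtraining” point above can be roughly quantified. As a sketch (the ~20-tokens-per-parameter rule of thumb from the Chinchilla analysis and LLaMA 7B’s ~1T training tokens are approximate outside figures, not from this post):

```python
# Rough rule of thumb from the Chinchilla scaling analysis:
# compute-optimal training uses ~20 tokens per parameter.
# "Overtrained" models deliberately exceed this to cut inference cost.
def chinchilla_optimal_tokens(params, tokens_per_param=20):
    return params * tokens_per_param


optimal = chinchilla_optimal_tokens(7e9)          # ~1.4e11 tokens for a 7B model
llama_7b_tokens = 1e12                            # LLaMA 7B: ~1T training tokens
overtraining_factor = llama_7b_tokens / optimal   # ~7x past compute-optimal
```

The asymmetry this illustrates: the one-off training run can absorb ~7x the compute-optimal budget precisely because inference, not training, dominates the model’s lifetime cost.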
Relaxation of time constraints
Real-time inference imposes a strict bound on how much computation can be performed in a single time step
Robotics, self-driving cars, game AIs, etc. must make actions within fractions of a second
Sometimes hundreds of actions in a second
User-facing cognitive models (e.g., LLMs) are also subject to latency constraints
Though people may be more willing to wait longer for responses if the models’ outputs are sufficiently better
In contrast, the outer selection process just has a lot more time to perform optimisation
ML training runs already last several months, and the only bound on the length of training runs seems to be hardware obsolescence
For sufficiently long training runs, it becomes better to wait for the next hardware generation before starting training
Training runs exceeding a year seem possible eventually, especially if loss keeps going down with scale
Evolution occurs over timescales of hundreds to thousands of generations of an organism
Solving a (much) simpler optimisation problem
Outer optimisation processes evaluate the objective function by using actual consequences along particular state-action trajectories for selection, as opposed to modeling expected consequences across multiple future trajectories and searching for trajectories with better expected consequences.
Evaluating future consequences of actions is difficult
E.g., what is the expected value of writing this LessWrong shortform on the number of future lives saved?
Alternatively, if humans were totally optimising for inclusive genetic fitness, children would need to model the consequences of stubbing their toes on their future reproductive viability and avoid it for that reason, rather than avoiding stubbing their toes simply because it’s painful.
Chaos sharply limits how far into the future we can meaningfully predict (regardless of how much computational resources one has), which is not an issue when using actual consequences for selection
In a sense, outer optimisation processes get the “evaluate the consequences of this trajectory on the objective” computation for free, and that’s just a very difficult (and in some cases outright intractable) computational problem
It’s hard to effectively search across future trajectories for the best next action if you can’t readily/accurately model the consequences of said trajectories
Or are sharply limited in the length of trajectories you can consider
The usage of actual consequences applies over longer time horizons
Evolution has a potentially indefinite/unbounded horizon
And has been optimising for much longer than the lifespan of any organism
Current ML training generally operates with fixed-length horizons but uses actual/exact consequences of trajectories over said horizons.
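A sketch of the contrast (illustrative code, my own naming): the outer process can score a realised trajectory by its exact return, with no forward model of consequences required, whereas the selected system would have to predict consequences of hypothetical trajectories before acting.

```python
def actual_return(rewards, discount=1.0):
    """Exact discounted return of a realised fixed-horizon trajectory.

    The outer selection process observes the rewards that actually
    occurred, so the objective evaluation is "free": no world model,
    no search over future trajectories.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        total += (discount ** t) * r
    return total


# Selection signal for one rolled-out trajectory: 1 + 0.5*0 + 0.25*2 = 1.5
score = actual_return([1.0, 0.0, 2.0], discount=0.5)
```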
Outer optimisation processes select for a policy that performs well according to the objective function on the training distribution, rather than selecting actions that optimise an objective function directly in deployment.
Access to vastly more data further facilitates learning a suitable policy
Evolution for example has access to many orders of magnitude more data points with which to select a suitable policy than any given organism observes in its lifetime
Current ML models mostly learn in a largely offline fashion, but to the extent that in context learning for LLMs is analogous to within lifetime learning for animals, LLMs were pretrained with again many orders of magnitude more tokens than they see in a given context window (trillions of tokens for pretraining vs a few thousand tokens for the context window)
Summary
Outer optimisation processes are more capable of total optimisation due to their access to more compute power, relaxed time constraints, and a generally much simpler optimisation problem (evaluations of exact consequences are provided for free [and over longer time horizons], amortisation of optimisation costs, etc.).
These factors enable outer optimisation processes to totally optimise for an objective function (their selection metric) in a way that is infeasible for the intelligent systems they select for.
Tentative Conclusions
I’m updating towards “powerful optimisation process” being the wrong way to think about intelligent systems.
While it is true that intelligent systems do some direct optimisation as part of their cognition, reasoning about them as purely direct optimisers seems likely to lead to (importantly) wrong inferences about their out-of-distribution/generalisation behaviour and what they “converge” to as they are amplified or subjected to further selection pressure.
Most human cognition isn’t mechanistically consequentialist, and that isn’t coincidence or happenstance; mechanistic consequentialism is just an incredibly expensive way to do inference, and the time/compute constraints prohibit it in most circumstances. In particular, it’s just much easier to delegate the work of evaluating long term consequences of trajectories to the outer selection process (e.g. stubbed toes lower future reproductive potential), which can select contextual heuristics that are performant in the environment a system was adapted to (e.g. avoid things that are painful).
A lot of the “work” done[6] by (mechanistic) consequentialism happens in the outer selection process that produced a system, not in the system so selected.
And it can’t really be any other way[7].
Cc: @beren, @tailcalled, @Chris_Leong, @JustisMills.
- ^
Note that total optimisation in simple environments (e.g. tic-tac-toe, chess, go) is more computationally tractable (albeit to varying degrees)
- ^
For a compactly specifiable nontrivial objective function.
“Compactly specifiable”: any system is behaviourally optimising for the utility function that assigns positive utility to whatever action the system takes at each time step and negative utility to every other action.
“Nontrivial”: likewise any system is behaviourally optimising for the compactly specifiable objective function that assigns equal utility to every state.
Motivation: if your definition of (behavioural) optimisation considers a rock an optimising system, then it’s useless/unhelpful.
Speculation: all (behavioural) optimisers are either mechanistic optimisers (including partial optimisers) or otherwise products of an optimisation process (e.g. tabular Q-learning).
- ^
Or approximate, though I’m not sure whether I prefer to consider evaluating approximations of a particular objective function as mechanistically optimising for the approximator instead of the “true” function. Suggestions welcome!
Model-free RL algorithms that determine their policy by argmaxing over actions wrt their value function seem best considered as mechanistically optimising their internal value function and not the return (discounted cumulative future reward), even though the learned value function is an approximation of the return.
Reasoning about a system as optimising for the return seems liable to lead to wrong inferences about its out of distribution/generalisation behaviour (e.g. hypothesising that it will wirehead because wireheading attains maximal return in the new environment).
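A minimal sketch of this point (hypothetical values and naming): the policy argmaxes over the learned value estimates, so mechanistically it optimises its internal approximation, not the return itself.

```python
def greedy_action(q_table, state, actions):
    """Pick the action maximising the *learned* action-value estimate.

    Mechanistically, this policy optimises q_table -- the internal
    approximation -- not the true return the estimates approximate.
    """
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


# Toy usage with hand-filled (hypothetical) value estimates.
q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
a = greedy_action(q, "s0", ["left", "right"])  # → "right"
```

Out of distribution, the learned estimates and the true return can diverge, which is exactly why predictions made by “it optimises the return” can come apart from the system’s actual behaviour.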
- ^
Suggestions for alternative names/terminology welcome.
- ^
In an RL setting, the system’s policy is a mapping from agent state to (a probability distribution over) actions. As far as I’m aware, any learning task can be recast in an RL setting.
But you could consider a generalised policy a mapping from “inputs” (e.g. “the prompt” for an autoregressive language model) to a probability distribution over outputs.
- ^
Especially re: modelling long-term consequences of trajectories and selecting heuristics that are performant on the selection metric in a given environment.
- ^
This is less true of humans given that we’ve moved out of our environment of evolutionary adaptedness.
But in a sense, the cultural/memetic selection happening on our species/civilisation is still itself an outer selection process.
Regardless, I’m not convinced the statement is sufficiently true of humans/our civilisation to try and generalise it to arbitrary intelligent systems.
I think it’s an interesting framing of the systems that exist today, although I expect the key crux between your views and the more traditional MIRI views is that MIRI-folks seem to expect recursive self-improvement at some point when the system suddenly replaces its learned heuristics with the ability to optimise in real-time.
Update: After discussing with Dragongod, I agree with his point that explicit thought is slow and so powerful AIs would likely want to use heuristics for most decisions, but not in novel situations, where explicit thought would work best. I would also emphasise the importance of explicit thought for updating heuristics.
I’m suspicious of your premise that evolution or anything is doing true global optimization. If the frame is the whole universe, all optimization is local optimization because of things like the speed of light limiting how fast information can propagate. Even if you restrict yourself to a Hubble volume this would still be the case. In essence, I’d argue all optimization is local optimization.
The “global” here means that all actions/outputs are optimising towards the same fixed goal(s):
This doesn’t seem especially “global” to me then. Maybe another term would be better? Maybe this is a proximate/ultimate distinction?
Currently using “task specific”/”total”.
Hmm, the etymology was that I was using “local optimisation” to refer to the kind of task specific optimisation humans do.
And global was the natural term to refer to the kind of optimisation I was claiming humans don’t do but which an expected utility maximiser does.
In the context of optimization, the meaning of “local” vs “global” is very well established; local means taking steps in the right direction based on a neighborhood, like hillclimbing, while global means trying to find the actual optimal point.
Yeah, I’m aware.
I would edit the post once I have better naming/terminology for the distinction I was trying to draw.
It happened as something like “humans optimise for local objectives/specific tasks” which eventually collapsed to “local optimisation”.
[Do please suggest better adjectives!]
I think “work done” is the wrong metric for many practical purposes (rewards wastefulness) and one should instead focus on “optimization achieved” or something.
I could change that. I was thinking of work done in terms of bits of selection.
Though I don’t think that statement is true of humans unless you also include cultural memetic evolution (which I think you should).
I might be wrong but I think evolution only does a smallish number of bits worth of selection per generation? Whereas I think I could easily do orders of magnitude more in a day.
I’m not sure exactly what the claim is, but some things you might mean would be “generally correct, but far from necessary”. I’m not totally clear what “bits” means here, but I think there can be very strong selection pressure fast. The question is how disparate can be reproductive success. E.g. if only 1/2^n organisms reach sexual maturity, that’s n bits / generation right there. If you include gametes being selected (e.g. which sperm swim the fastest), that could be a bunch more bits. Then there’s how many offspring an organism has. If every man is either a loser or Genghis Khan, that’s another 10-ish bits. In artificial breeding, there can be even more bits / generation.
I think the child mortality rate used to be something like 50%, which is 1 bit/generation.
I think an ejaculation involves ~100,000,000 sperm, which corresponds to 26.5 bits, assuming the very “best” sperm is selected (which itself seems unlikely, I feel like surely there’s a ton of noise).
I would be curious to know how you calculated this, but sure.
This gives us 37.5 bits so far. Presumably there’s also a bunch of other bits due to e.g. not all pregnancies making it, etc.. Let’s be conservative and say that we’ve only got half the bits in this count, so the total would be 75 bits.
That’s… not very much? Like I think I can easily write 1000 lines of code in a day, where each LOC would probably contain more than 75 bits worth of information. So I could easily 1000x exceed the selection power of evolution, in a single day worth of programming.
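For concreteness, the back-of-envelope arithmetic in this thread can be reproduced (same assumed figures as above: 50% child mortality, ~10^8 sperm per ejaculation, ~10 bits from extreme reproductive skew):

```python
import math

# Reproducing the thread's back-of-envelope bits-of-selection estimate.
mortality_bits = math.log2(1 / 0.5)   # 50% child mortality: ~1 bit/generation
sperm_bits = math.log2(1e8)           # ~26.6 bits, IF the single "best"
                                      # sperm were selected (noisy in reality)
genghis_bits = 10                     # 1/2^10 men fathering 2^10 offspring each

subtotal = mortality_bits + sperm_bits + genghis_bits  # ~37.6 bits
generous_total = 2 * subtotal         # doubled, as in the comment: ~75 bits
```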
Somewhat more, but not hugely more I think? And regardless, humanity doesn’t originate from artificial breeding.
I just meant that if Genghis has 2^10 offspring, and likewise 1/2^10 men, while other men have 0 offspring, that’s 9 or 10 bits. Of course that’s not what actually happens, but what happens is some intermediate version that’s like 1-3 bits or something.
I think it’s surely less than that. I think when people say evolution is slow, they mean something much stronger, like that evolution is only 1-5 bits / generation.
I think you might also be discounting what’s being selected on. You wrote:
You can apply orders of magnitude more optimization power to something on some criterion. But evolution’s evaluation function is much higher quality than yours. It evaluates the success of a complex organism in a complex environment, which is very complex to evaluate and is relevant to deep things (such as discovering intelligence). In a day, you are not able to do 75 bits of selection on cognitive architectures being good for producing intelligence.
I agree that this is an important distinction and didn’t mean to imply that my selection is on criteria that are as difficult as evolution’s.
Yeah for humans in particular, I think the statement is not true of solely biological evolution.
But also, I’m not sure you’re looking at it on the right level. Any animal presumably does many bits worth of selection in a given day, but the durable/macroscale effects are better explained by evolutionary forces acting on the population than by the actions of individual animals within their lifetimes.
Or maybe this is just a confused way to think/talk about it.
Can you list some examples of durable/macroscale effects you have in mind?
I’m gestating on this post. I suggest part of my original framing was confused, and so I’ll just let the ideas ferment some more.
Took this to drafts for a few days with the intention of refining it and polishing the ontology behind the post.
I ended up not doing that as much, because the improvements I was making to the underlying ontology felt better presented as a standalone post, so I mostly factored them out of this one.
I’m not satisfied with this post as is, but there’s some kernel of insight here that I think is valuable, and I’d want to be able to refer to the basic thrust of this post/some arguments made in it elsewhere.
I may make further edits to it in future.