Epistemic status: mostly speculation and simplification, but I stand by the rough outline of ‘self-unaware learners → self-aware consequentialists struggling with multipolarity → static rule-following not-thinking-too-hard non-learners’. The two most important transitions are “learning” and then, once you’ve learned enough, “committing/self-modifying (away from learning)”.
Setup
I briefly sketch three phases that I guess ‘agents’ go through, and consider how two different metrics change during this progression. This is a highly speculative just-so story that currently sounds intuitively correct to me, though I’m not very confident in much of what I’ve written, and I leaned too much into the ‘fun’ heuristic at times.
The transition from the first stage to the second stage is learning to become more consequentialist. The transition from the second stage to the third is self-modifying away from consequentialism.
In each of the three stages I consider the predictability of both (a) the agent’s decisions and (b) the agent’s environment when one has either (I) full empirical facts about the agent and environment or (II) partial empirical facts. I don’t think these two properties are the most important or relevant ones to track, but they helped to guide my intuitions in writing this life-cycle.
Phase 1: the transition from self-unaware and dumb to self-aware and smart
Agents in this stage are characterised by learning, but not yet self-modifying: they have not learned enough to do this yet! They have been set in motion (possibly by selection pressure), and are on the right track towards becoming more consequentialist / VNM rational / maximise-y.
They’re generally relatively self-centred and don’t model other agents in much detail, if at all. They begin to have some self-awareness. There’s little sense in which they consider different actions: the process of deciding between actions is relatively ‘unconscious’, and the ability to consider the value of modifying oneself is beyond the agent for a while. They stumble into the next stage by gaining this ability.
These agents are updating on everything and thus ‘winning’ more in their world. The ability to move into stage two requires some minimum amount of ‘winning’ (due to selection pressures).
| | Agent’s decisions | Agent’s environment |
| --- | --- | --- |
| Full empirical facts | High. Computationally the agent is not doing anything advanced and so one can easily simulate them. | Low-medium. The environment is relatively unaffected by the agent since they are not very good at achieving their goals. One might expect to see some change towards the satisfaction of their preferences. This is more true in ‘easy mode’, i.e. worlds where there is little to no competition. |
| Partial empirical facts | Medium. The agent’s behaviour, since it is poorly optimised, could fit any number of internal states. Further, there may be significant randomness involved in the decision making. This could be deliberate, e.g. for exploration. It could also arise from low error correction in their decision-making module, which lets physical features of the world influence their decisions. | Slightly lower than above. Their goals and preferences are not necessarily obvious from their environment. (Again, the less competition there is in the environment, the easier it is for them to achieve their goal, or the cruder their goal is, the easier this prediction becomes.) |
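One way to make the ‘deliberate randomness’ point concrete, as a toy sketch rather than a claim about real agents: assume the agent explores epsilon-greedily, and ask how often an observer who always guesses the agent’s best-looking action would be right. The epsilon-greedy rule, the function name `guess_accuracy` and the parameters `epsilon` and `n_actions` are all assumptions introduced here for illustration.

```python
def guess_accuracy(epsilon: float, n_actions: int) -> float:
    """Chance that an observer who always guesses the greedy action is right.

    Assumes an epsilon-greedy agent (an illustrative assumption): with
    probability 1 - epsilon it takes its best-looking action, otherwise it
    picks uniformly among all n_actions, which can still land on the greedy one.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if n_actions < 1:
        raise ValueError("n_actions must be at least 1")
    return (1.0 - epsilon) + epsilon / n_actions


if __name__ == "__main__":
    # More exploration means the observer's best guess is right less often.
    for eps in (0.0, 0.1, 0.5, 1.0):
        print(f"epsilon={eps:.1f}: accuracy={guess_accuracy(eps, 4):.3f}")
```

Under these assumptions, more exploration directly lowers how well even a well-informed observer can predict individual decisions, which is one reason the rating here is only ‘medium’.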
Phase 2: self-aware, maximise-y and beginning to model other agents
Agents in this stage are consequentialists. In moving from stage one to stage two, they have become able to reason about their own decision process and to consider actions that modify their action-choosing process. They also remain updateful and have the capacity to reason about other agents (not limited to their future selves, who may be very different). These three features make stage two agents unstable: they quickly self-modify away.
At the end of this stage, the agents are thinking in great detail about other agents. They can ‘win’ in some interactions by outthinking other agents. The interactions are not necessarily restricted to nearby agents. The acausal landscape is massively multipolar and the stakes (depending on preferences) may be much higher than in the local spacetime environment.
| | Agent’s decisions | Agent’s environment |
| --- | --- | --- |
| Full empirical facts | Low. Agents are doing many logical steps to work out what other agents are thinking. | High. They are beginning to build computronium and converge on optimal designs for their environment (e.g. Dyson-sphere-like technology). |
| Partial empirical facts | Low-medium. The environment still gives lots of clues about the agent’s preferences and beliefs, and the agent is following an algorithm that is relatively simple to write down. Further, the agent is already optimising for error correction, preserving its existing values and improving its cognitive abilities, and so their mind is relatively orderly. However, the process by which they move from stage 2 to 3 (which is what happens immediately upon reaching stage 2) may be highly noisy. This commitment race may be a function of the agent’s prior beliefs about facts they have little evidence for, and this prior may be relatively arbitrary. | High. The exact contents of some of the computronium may be hard to predict (which is pretty much predicting their decisions) but some will be easy (e.g. their utilitronium). |
Phase 3: galaxy-brained, set in their ways and ‘at one’ with many other agents
Agents in this stage are in it for the long haul (trillions of years). Between phases 2 and 3 the agent makes irreversible commitments, making themselves more predictable to other agents and settling into game-theoretic equilibria. Phase 3 agents act in ways very correlated with other agents (potentially in a coalition of many agents all running the same algorithm).
Phase 3 agents have maxed out their lightcone with physical stuff and reached the end of their tech tree. They have nothing left to learn and are most likely updateless (or similar, e.g. a patchwork of many commitments constraining their actions). There’s not much thinking left for the agent to do; everything was decided a long time ago (though maybe this thinking, i.e. the transition from phase 2 to 3, took a while). The agent mostly sticks around just to maintain their optimised utility (potentially using something like a compromise utility function following acausal trade).
The universe expands into many causally disconnected regions and the agent is ‘split’ into multiple copies. Whether these are still meaningfully agents is not clear: I would guess they are best imagined as non-human animals, but with overpowered instincts and abilities to protect themselves and their stuff, like a sleeping dragon guarding its gold.
| | Agent’s decisions | Agent’s environment |
| --- | --- | --- |
| Full empirical facts | High. There are not many decisions left to make. They are pretty much lobotomised versions of their “must think about the consequences of everything” former selves. They follow simple rules and live in a relatively static world. | High. Massive stability (after all the stars have been rearranged into the most efficient arrangement). The world is relatively static. |
| Partial empirical facts | High. They have very robust error-correcting mechanisms, and also mechanisms to prevent the emergence of any consequentialist (sub-)agents with any (bargaining) power within their causal control. | High. There’s a lot of redundancy in the environment from which to figure out what’s going on. Not much changes. |
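The ratings given for the three phases can be collected into one small lookup table, which makes it easier to see how each kind of predictability moves over the life-cycle. This is only a restatement of the ratings as data: the names `Phase`, `PREDICTABILITY` and `trajectory` are illustrative labels, and the string values are copied from the text rather than measured.

```python
from enum import Enum


class Phase(Enum):
    LEARNER = 1           # self-unaware learner becoming more consequentialist
    CONSEQUENTIALIST = 2  # self-aware, updateful, modelling other agents
    COMMITTED = 3         # static rule-follower after irreversible commitments


# (phase, observer's knowledge, thing being predicted) -> rating from the text.
PREDICTABILITY = {
    (Phase.LEARNER, "full", "decisions"): "high",
    (Phase.LEARNER, "full", "environment"): "low-medium",
    (Phase.LEARNER, "partial", "decisions"): "medium",
    (Phase.LEARNER, "partial", "environment"): "slightly below low-medium",
    (Phase.CONSEQUENTIALIST, "full", "decisions"): "low",
    (Phase.CONSEQUENTIALIST, "full", "environment"): "high",
    (Phase.CONSEQUENTIALIST, "partial", "decisions"): "low-medium",
    (Phase.CONSEQUENTIALIST, "partial", "environment"): "high",
    (Phase.COMMITTED, "full", "decisions"): "high",
    (Phase.COMMITTED, "full", "environment"): "high",
    (Phase.COMMITTED, "partial", "decisions"): "high",
    (Phase.COMMITTED, "partial", "environment"): "high",
}


def trajectory(knowledge: str, target: str) -> list[str]:
    """Ratings for one (knowledge, target) pair, in phase order 1, 2, 3."""
    return [PREDICTABILITY[(phase, knowledge, target)] for phase in Phase]


if __name__ == "__main__":
    for knowledge in ("full", "partial"):
        for target in ("decisions", "environment"):
            print(knowledge, target, trajectory(knowledge, target))
```

Read this way, the predictability of the agent’s decisions dips in the middle phase and recovers in the last one, while the predictability of the environment rises and then stays high.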
The lifecycle of ‘agents’