Confusions About Optimisation and Agency
Something I’m still not clear on is how to think about effective agents in the real world.
I think viewing idealised agency as an actor that evaluates argmax wrt (the expected value of) a simple utility function over agent states is just wrong.
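Concretely, the picture I have in mind is something like the following (my own notation, just to pin down what I mean by “argmax wrt a utility function”):

$$a^{*} = \operatorname{argmax}_{a \in \mathcal{A}} \; \mathbb{E}\left[\, U(s') \mid s, a \,\right]$$

where $U$ is some simple utility function over agent states and the argmax ranges over all available actions (or action sequences).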
Evaluating argmax is very computationally expensive, so most agents, most of the time, will not be directly optimising over their actions but will instead be executing learned heuristics that historically correlated with better performance according to the metric the agent was selected for (e.g. reward).
That is, even if an agent somehow fully internalised the selection metric, directly optimising it over all of its actions is computationally intractable in “rich” environments (complex, high-dimensional, continuous, partially observable/imperfect-information, stochastic, with large state/action spaces, etc.). So even a system inner-aligned to the selection metric would still perform most of its cognition in a mostly amortised manner, provided it operates under bounded compute.
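To make the compute gap concrete, here’s a minimal toy sketch (my own, not from any of the linked posts) contrasting the two modes of cognition. Everything in it (`expected_utility`, `LEARNED_POLICY`, the toy one-dimensional environment) is made up for illustration: the point is just that the direct optimiser pays a cost that grows exponentially with planning horizon, while the amortised agent executes a cached heuristic in constant time.

```python
from itertools import product

ACTIONS = ["left", "right", "stay"]

def expected_utility(state, plan):
    """Toy stand-in for E[U(s') | s, plan]; any scoring function would do."""
    deltas = {"left": -1, "right": 1, "stay": 0}
    return -abs(state + sum(deltas[a] for a in plan))

def direct_optimiser(state, horizon):
    """Idealised agency: argmax over every action sequence of length `horizon`.
    Cost grows as len(ACTIONS) ** horizon, which is what makes this intractable
    in rich environments."""
    best_plan = max(product(ACTIONS, repeat=horizon),
                    key=lambda plan: expected_utility(state, plan))
    return best_plan[0]

# Amortised cognition: a cheap lookup distilled from whatever selection/training
# the agent went through; no optimisation happens at decision time.
LEARNED_POLICY = {-1: "right", 0: "stay", 1: "left"}

def amortised_agent(state):
    """Executes a heuristic that historically correlated with high utility."""
    return LEARNED_POLICY[max(-1, min(1, state))]

print(direct_optimiser(state=3, horizon=4))  # pays the exponential planning cost
print(amortised_agent(state=3))              # constant-time reflex, same answer here
```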
Furthermore, in the real world, learning agents don’t generally become inner-aligned to the selection metric; instead they learn cognitive heuristics that historically correlated with performance on the selection metric.
[Is this because direct optimisation for the outer selection metric is too computationally expensive? I don’t think so. The sense I get is that selection just doesn’t work that way. Selection can’t get the system to internalise the selection metric itself, because selection for a given metric produces equally strong selection for all of the metric’s necessary and sufficient conditions[1] (the “invariants of selection”):
In some sense, the true metric is underspecified.
Note, however, that this underspecification is more pernicious than just failing to distinguish between the invariants of selection. The inductive biases of the selection process also matter: proxies that are merely correlated with the selection metric (“imperfect proxies”) may be internalised instead of the selection metric itself if they are more accessible/reachable/learnable by the intelligent system than the actual metric.
Analogy: humans couldn’t have internalised evolution’s selection metric of inclusive genetic fitness because humans had no concept of inclusive genetic fitness.]
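Here’s a second toy sketch (again my own construction; names like `metric_feature` and `correlate` are made up) of why selection pressure can’t tell the “true” metric apart from things that co-vary with it during selection. The fitness metric below only “cares about” one feature, but a second feature is perfectly correlated with it in the selection environment, so weight on the proxy gets selected for just as strongly as weight on the true feature, and whichever is easier to learn is what ends up steering behaviour off-distribution.

```python
import random

random.seed(0)
POP, GENS = 200, 60

def behaviour(policy, obs):
    """A policy is just a linear weighting over observed features."""
    return policy["true"] * obs["metric_feature"] + policy["proxy"] * obs["correlate"]

def training_obs():
    # During selection the two features always co-occur...
    x = random.random()
    return {"metric_feature": x, "correlate": x}

def fitness(policy):
    # ...so a metric that "really" only cares about metric_feature rewards
    # weight on the correlate just as strongly.
    return behaviour(policy, training_obs())

population = [{"true": random.random(), "proxy": random.random()} for _ in range(POP)]
for _ in range(GENS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP // 5]                        # truncation selection
    population = [{k: v + random.gauss(0, 0.05) for k, v in p.items()}
                  for p in random.choices(parents, k=POP)]  # mutate copies of parents

avg_true = sum(p["true"] for p in population) / POP
avg_proxy = sum(p["proxy"] for p in population) / POP
print(f"mean weight on the metric's own feature: {avg_true:.2f}")
print(f"mean weight on the correlated proxy:     {avg_proxy:.2f}")  # climbs just as fast
```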
So there are at least two dimensions on which real world intelligent systems diverge from the traditional idealisations of an agent:
Real world systems do not perform most of their cognition by directly optimising an appropriate objective function, but by executing learned cognitive adaptations.
Note that I said “most”; humans are capable of performing direct optimisation (e.g. “planning”) when needed, but such explicit reasoning is a minority of our cognition.
Real world systems don’t internalise the metric on which they were selected for, but instead learn various contextual heuristics that correlated with high performance on that metric.
I see this as the core claim of shard theory, and the cause of “complexity of value”.
I think this establishes some sort of baseline for what real world intelligent systems are like. However, I do not know what such systems “converge” to as they are scaled up (in training/inference compute/data or model parameters).
I am not very sure how online learning affects this either.
I am sceptical that it converges towards anything like “embedded AIXI”. I just do not think embedded AIXI represents any sort of limit or idealisation of intelligent systems in the real world.
Alas, I have no better ideas. Speculation on this is welcome.
Cc: @Quintin Pope, @cfoster0, @beren, @TurnTrout
TurnTrout talks about reinforcement learning in the linked post, but I think the argument generalises very straightforwardly to any selection process and its selection metric.