It seems like your core claim is that we can reinterpret expected-utility maximizers as expected-number-of-bits-needed-to-describe-the-world-using-M2 minimizers, for some appropriately chosen model of the world M2.
If so, then it seems like something weird is happening, because typical utility functions (e.g. “pleasure minus pain” or “paperclips”) are unbounded above and below, whereas bits are bounded below, meaning a bit-minimizer is like a utility function that’s bounded above: there’s a best possible state the world could be in according to that bit-minimizer.
Or are we using a version of expected utility theory that says utility must be bounded above and below? (In that case, I might still ask, isn’t that in conflict with how number-of-bits is unbounded above?)
The core conceptual argument is: the higher your utility function can go, the bigger the world must be, and so the more bits it must take to describe it in its unoptimized state under M2, and so the more room there is to reduce the number of bits.
If you could only ever build 10 paperclips, then maybe it takes 100 bits to specify the unoptimized world, and 1 bit to specify the optimized world.
If you could build 10^100 paperclips, then the world must be humongous and it takes 10^101 bits to specify the unoptimized world, but still just 1 bit to specify the perfectly optimized world.
If you could build ∞ paperclips, then the world must be infinite, and it takes ∞ bits to specify the unoptimized world. Infinities are technically challenging, and John’s comment goes into more detail about how you deal with this sort of case.
For more intuition, notice that exp(x) is a bijective function from (-∞, ∞) to (0, ∞), so it goes from something unbounded on both sides to something unbounded on one side. That’s exactly what’s happening here, where utility is unbounded on both sides and gets mapped to something that is unbounded only on one side.
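To make the exp(x) remark concrete, here is a sketch under one convention (I’m assuming the model M2 assigns probability proportional to exp of utility; that is my reading of the correspondence being discussed, not something stated in the comment above): for a finite set of world states, let

$$P_{M_2}(x) = \frac{e^{u(x)}}{Z}, \qquad Z = \sum_{x'} e^{u(x')}, \qquad \ell(x) = -\log_2 P_{M_2}(x) = \frac{\ln Z - u(x)}{\ln 2}.$$

Since $Z \ge e^{u(x)}$ for every $x$, we get $\ell(x) \ge 0$: description length is bounded below even though $u$ itself can be unbounded below, and if $u$ is unbounded above then $Z$ diverges and you are forced into the infinite-world case discussed above.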
What are the best arguments that expected utility maximisers are adequate (descriptive if not mechanistic) models of powerful AI systems?
[I want to address them in my piece arguing the contrary position.]
If you’re not vNM-coherent you will get Dutch-booked if there are Dutch-bookers around.
This especially applies to multipolar scenarios with AI systems in competition.
I have an intuition that this also applies in degrees: if you are more vNM-coherent than I am (which I think I can define), then I’d guess that you can Dutch-book me pretty easily.
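As a minimal sketch of what being Dutch-booked looks like for one kind of vNM-incoherence (intransitive preferences; the options and the fee are invented): an agent that will pay a small fee for every swap to something it prefers, with a preference cycle A ≻ B ≻ C ≻ A, can be pumped around the cycle indefinitely.

```python
# Toy money pump: an agent with cyclic preferences A > B > C > A
# pays a small fee (in cents) for every swap to something it prefers.

FEE = 1  # cents per accepted swap

# prefers[x] is the option the agent will pay to swap x away for,
# encoding the cycle A > B, B > C, C > A.
prefers = {"B": "A", "C": "B", "A": "C"}

def run_pump(start: str, rounds: int) -> int:
    """Total cents extracted by a Dutch-booker after `rounds` offered swaps."""
    holding, paid = start, 0
    for _ in range(rounds):
        # Offer the item the agent prefers to its current holding, for a fee;
        # by its own lights the agent accepts every time.
        holding = prefers[holding]
        paid += FEE
    return paid

print(run_pump("A", 3))    # 3   -- back to holding A, three cents poorer
print(run_pump("A", 300))  # 300 -- the loss grows without bound
```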
My contention is that the preconditions don’t hold.
Agents don’t fail to be VNM-coherent by having incoherent preferences given the VNM axioms; they fail to be VNM-coherent by violating the axioms themselves.
Completeness is wrong for humans, and with incomplete preferences you can be non-exploitable even without admitting a single fixed utility function over world states.
I notice I am confused. How do you violate an axiom (completeness) without behaving in a way that violates completeness? I don’t think you need an internal representation.
Elaborating more, I am not sure how you would even display a behavior that violates completeness. If you’re given a choice between only universe-histories a and b, and your preferences are incomplete over them, what do you do? As soon as you reliably act to choose one over the other, for any such pair, you have algorithmically-revealed complete preferences.
If you don’t reliably choose one over the other, what do you do then?
Choose randomly? But then I’d guess you are again Dutch-bookable. And according to which distribution?
Your choice is undefined? That seems both kinda bad and also Dutch-bookable to me, tbh. Also, I don’t see the difference between this and random choice (short of going up in flames, which would constitute a third, hitherto unassumed option).
Go away/refuse the trade &c.? But this is denying the premise! You only have universe-histories a and b to choose between! I think what happens with humans is that they are often incomplete over very low-ranking worlds and are instead searching for policies to find high-ranking worlds while not choosing. I think incompleteness might be fine if there are two options you can guarantee to avoid, but with adversarial dynamics that becomes more and more difficult.
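To illustrate the “algorithmically-revealed complete preferences” point, here is a minimal sketch (the histories and the choice rule are made up): any deterministic pairwise choice behaviour reveals a relation that is complete by construction, regardless of what is or isn’t represented internally.

```python
from itertools import combinations

# Hypothetical universe-histories, named only for illustration.
histories = ["h1", "h2", "h3", "h4"]

def choose(a: str, b: str) -> str:
    """Stand-in for what the agent does when offered exactly {a, b}.
    Any deterministic rule works; here, alphabetical order."""
    return min(a, b)

# Revealed relation: record a >= b whenever the agent picks a from {a, b}.
revealed = set()
for a, b in combinations(histories, 2):
    winner = choose(a, b)
    loser = b if winner == a else a
    revealed.add((winner, loser))

# Completeness check: every pair is ranked in at least one direction.
complete = all(
    (a, b) in revealed or (b, a) in revealed
    for a, b in combinations(histories, 2)
)
print(complete)  # True: reliable pairwise choice reveals a complete relation
```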
If you define your utility function over histories, then every behaviour is maximising an expected utility function, no?
Even behaviour that is money-pumped?
I mean, you can’t money-pump any preference over histories anyway without time travel.
The Dutch-book arguments apply when your utility function is defined over your current state with respect to some resource?
I feel like once you define the utility function over histories, you lose the force of the coherence arguments?
What would it look like to not behave as if maximising an expected utility function, for a utility function defined over histories?
Agree.
There are three stages:

1. Selection for inexploitability.
2. Selection for completeness. The interesting part is how systems/pre-agents/egregores/whatever become complete. If the system already satisfies the other VNM axioms, we can analyse the situation as follows. Recall that an inexploitable but incomplete VNM agent acts like a vetocracy of complete VNM subagents. The exact decomposition is underspecified by the preference order alone and is another piece of data (hidden state). However, given sure-gain offers from the environment, there is selection pressure for the internal complete VNM subagents to make trade agreements and obtain a Pareto improvement. Analysed this way, it looks like a simple prisoner’s-dilemma-type case, which can be handled with the usual game theory; for instance, in repeated offers with an uncertain horizon the subagents may be able to cooperate.
3. Selection for the remaining axioms. Once the system is (approximately) complete, it will be under selection pressure to satisfy the other axioms. You could say this is the beginning of the ‘emergence of expected utility maximizers’.

As you can see, the key here is that we really should be talking about Selection Theorems, not the highly simplified Coherence Theorems. Coherence theorems are about ideal agents; selection theorems are about how more and more coherent and goal-directed agents may emerge.
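A toy version of the trade-agreement dynamic in stage 2 (my own sketch; the payoffs and the veto rule are made up for illustration): each subagent can veto an offer, so individually mixed offers are refused, yet the same offers bundled together are a sure gain for both, which is the Pareto improvement that selection pressure pushes the subagents to capture.

```python
# Two internal subagents; the composite (incomplete) agent accepts an offer
# only if neither subagent is made worse off -- a "vetocracy".
# Offers are (delta_u1, delta_u2) pairs; the numbers are invented.

offers = [(+2, -1), (-1, +2)]            # each offer is vetoed on its own ...
bundle = tuple(map(sum, zip(*offers)))   # ... but together they sum to (+1, +1)

def veto_accepts(offer):
    """Accept only if no subagent loses utility (the veto rule)."""
    return all(delta >= 0 for delta in offer)

# Evaluating offers one at a time, the vetocracy forgoes everything:
print([o for o in offers if veto_accepts(o)])  # [] -- both offers vetoed

# A trade agreement between the subagents evaluates the bundle instead,
# capturing the sure gain that the per-offer veto leaves on the table:
print(veto_accepts(bundle))  # True -- bundle is (1, 1), a Pareto improvement
```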
The boring technical answer is that any policy can be described as a utility maximiser given a contrived enough utility function.
The counter-argument to that is that if the utility function is as complicated as the policy, then this is not a useful description.
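A minimal sketch of that ‘boring’ construction (hypothetical policy and actions, deterministic setting for simplicity): assign utility 1 to exactly the history the policy would produce and 0 to everything else, and the policy maximises expected utility by construction.

```python
from typing import Callable, List

# Deterministic toy setting: a history is just the list of actions taken.
def rollout(policy: Callable[[int], str], horizon: int) -> List[str]:
    """The unique history produced by following `policy` for `horizon` steps."""
    return [policy(t) for t in range(horizon)]

def contrived_utility(policy: Callable[[int], str], horizon: int):
    """Build a utility function over histories that `policy` maximises by construction."""
    target = rollout(policy, horizon)
    return lambda history: 1.0 if history == target else 0.0

# Any policy at all -- even an arbitrary one -- comes out as a utility maximiser:
silly_policy = lambda t: "left" if t % 2 == 0 else "right"
U = contrived_utility(silly_policy, horizon=4)
print(U(rollout(silly_policy, 4)))            # 1.0 -- the policy's own history
print(U(["right", "right", "left", "left"]))  # 0.0 -- any other history
```

Note that the contrived utility function has to memorise the policy’s entire rollout, which is exactly the sense in which it is ‘as complicated as the policy’ and therefore not a useful description.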
I like Utility Maximization = Description Length Minimization.
Make sure you also read the comments underneath; there are a few good discussions going on clearing up various confusions, like this one:
I don’t know of any formal arguments that predict that all or most future AI systems are purely expected utility maximizers. I suspect most don’t believe that to be the case in any simple way.
I do know of a very powerful argument (a proof, in fact) that if an agent’s goal structure is complete, transitively consistent, continuous, and independent of irrelevant alternatives, then it will be consistent with an expected-utility-maximizing model. See https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem
The open question remains, since humans do not meet these criteria, whether more powerful forms of intelligence are more likely to do so.
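For reference, the conclusion of that theorem in compressed form (see the linked article for the precise axioms): if a preference relation $\succeq$ over lotteries satisfies completeness, transitivity, continuity, and independence, then there exists a utility function $u$ over outcomes such that

$$L \succeq M \iff \mathbb{E}_{L}[u] \ge \mathbb{E}_{M}[u],$$

with $u$ unique up to positive affine transformations $u \mapsto a u + b$, $a > 0$.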
Yeah, I think the preconditions of VNM straightforwardly just don’t apply to generally intelligent systems.
As I say, open question. We have only one example of a generally intelligent system, and that’s not even very intelligent. We have no clue how to extend or compare that to other types.
It does seem like VNM-rational agents will be better than non-rational agents at achieving their goals. It’s unclear whether that’s a nudge that moves agents toward VNM-rationality as they get more capable, or a filter that advantages VNM-rational agents in the competition for power. Or a non-causal observation, because goals are orthogonal to power.