(I promise I’m not being intentionally contrarian.)
I agree with everything after this paragraph:
It can be more interesting when “outcomes” refers to world states instead (that is, snapshots of what the world looks like at a particular time), but utility functions over states/snapshots can’t capture everything we’re interested in, and there’s no reason to take as an assumption that an AI system will have a utility function over states/snapshots.
In particular, I don’t doubt that, if we allow utility functions over histories, then saying that something is an expected utility maximizer doesn’t tell us much about its behavior.
However, I intuitively think that we should expect AI to have a utility function over world states,[1] and everything I’ve read (that I recall) suggests that this is what people are talking about. The Coherent decisions imply consistent utilities Arbital page is all about world states.
In fact, if we’re talking about histories, then all of the examples of circular utilities stop being examples of circular utilities. An agent that keeps paying you to go around in circles via A→B→C→A→⋯ can be perfectly rational, since it corresponds to a utility function for which U(B, t) > U(A, t), U(C, t) for t ≡ 1 (mod 3), U(C, t) > U(A, t), U(B, t) for t ≡ 2 (mod 3), and U(A, t) > U(B, t), U(C, t) for t ≡ 0 (mod 3).
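To make this concrete, here is a minimal sketch (my own toy illustration; the 0.1 fee and the greedy one-step choice rule are assumptions of mine, not anything from the post) of an agent maximizing exactly this kind of time-indexed utility function while paying forever to go around the circle:

```python
# A toy sketch (my own illustration): a greedy agent maximizing a time-indexed
# utility function happily pays a small fee at every step to cycle A -> B -> C -> A.

STATES = ["A", "B", "C"]
FEE = 0.1  # assumed cost of moving between states; staying put is free

def utility(state: str, t: int) -> float:
    """U(state, t): at time t the agent most prefers the 'next' state in the cycle.
    t % 3 == 1 -> B, t % 3 == 2 -> C, t % 3 == 0 -> A, matching the text above."""
    return 1.0 if state == STATES[t % 3] else 0.0

def choose(state: str, t: int) -> str:
    """Greedy choice: pick the state with the highest utility net of the moving fee."""
    return max(STATES, key=lambda s: utility(s, t) - (FEE if s != state else 0.0))

state, total_fees = "A", 0.0
for t in range(1, 10):
    nxt = choose(state, t)
    if nxt != state:
        total_fees += FEE
    state = nxt
    print(f"t={t}: in state {state}, fees paid so far {total_fees:.1f}")
# The agent cycles A -> B -> C -> A -> ... and keeps paying, even though it
# maximizes this (time-dependent) utility function at every single step.
```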
So it feels like you’re arguing against something that was never intended. Clearly, EY wasn’t thinking about utility functions that are allowed to depend on arbitrary histories when he wrote the Arbital post (or during his “AI Alignment: Why It’s Hard, and Where to Start” talk, which makes the same points).[2]
I may be making a stupid point here since I haven’t read the VNM theorem. The problem is that I don’t know what theorem exactly you’re talking about, and neither you nor Eliezer linked to it, and I got the impression that there are actually a bunch of theorems saying similar things, and it’s kind of a mess. I don’t understand why you can’t just look at the theorem and see whether it talks about world states or histories, but I guess the formalism is too abstract or something?
I’m also surprised that no-one else has made a similar point before. Has Eliezer ever responded to this post?
Or rather, a utility function that is allowed to depend on time but has to do so linearly. I can imagine it caring about history, but not that it prefers [t time in state A over t time in state B] as well as [r⋅t time in state B over r⋅t time in state A for any r > 0]. It seems like utility functions that are restricted in this way would capture everything we care about, and I assume that they would rescue the argument. Certainly, the construction with U(a, h) ∈ {0, 1} to justify arbitrary behavior doesn’t work anymore. (I spell this restriction out a bit more formally below, after footnote [2].)
I guess one way of framing this would be that you’re debunking what was said, and if EY meant to restrict utility functions in this way, it would have been on him to say so explicitly. Which is fine, but then I’m still wondering whether [coherence arguments phrased differently to restrict how utility functions can depend on histories] imply goal-directed behavior.
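Spelling out the restriction from footnote [1] a bit more formally (this is my own formalization, not something from the post): if utility accrues in time at a constant, state-dependent rate, then rescaling a duration can never flip a preference between two states.

```latex
% My own formalization of the "linear in time" restriction from footnote [1].
% Read "linear in time" as: spending duration \tau in state s is worth
% \tau \cdot u(s) for some fixed per-state rate u(s).
% Then, for any duration t > 0 and any scaling r > 0,
\[
  t\,u(A) > t\,u(B)
  \;\Longleftrightarrow\;
  u(A) > u(B)
  \;\Longleftrightarrow\;
  r\,t\,u(A) > r\,t\,u(B),
\]
% so a preference between two states cannot flip when the duration is rescaled,
% and in particular the arbitrary U(a, h) \in \{0, 1\} construction is ruled out.
```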
However, I intuitively think that we should expect AI to have a utility function over world states
My main point is that it relies on some sort of intuition like this rather than being determined by math. As an aside, I doubt “world states” is enough to rescue the argument, unless you have very coarse world states that only look at the features that humans care about.
In fact, if we’re talking about histories, then all of the examples of circular utilities stop being examples of circular utilities.
Yup, exactly.
I don’t understand why you can’t just look at the theorem and see whether it talks about world states or histories, but I guess the formalism is too abstract or something?
The theorem can apply to world states or histories. The VNM theorem assumes that there is some set of “outcomes” that the agent has preferences over; you can use either world states or histories for that set of outcomes. Using only world states would be a stronger assumption.
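For reference, here is roughly the standard statement (paraphrased, so treat the details as a sketch); the point is that the outcome set is an arbitrary abstract set:

```latex
% A rough statement of the von Neumann--Morgenstern theorem. The outcome set
% \mathcal{O} below is an arbitrary set: world states, histories, or anything else.
\textbf{VNM (informal).}
Let $\mathcal{O}$ be a set of outcomes and $\Delta(\mathcal{O})$ the finite-support
lotteries over $\mathcal{O}$. If a preference relation $\succeq$ on
$\Delta(\mathcal{O})$ is complete, transitive, continuous (Archimedean), and
satisfies independence, then there exists $u : \mathcal{O} \to \mathbb{R}$ such that
\[
  p \succeq q
  \quad\Longleftrightarrow\quad
  \mathbb{E}_{x \sim p}[\,u(x)\,] \;\ge\; \mathbb{E}_{x \sim q}[\,u(x)\,].
\]
% Choosing \mathcal{O} to be (coarse) world states rather than histories is an
% additional modeling assumption, not part of the theorem.
```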
So it feels like you’re arguing against something that was never intended.
Yup, that’s right. I am merely pointing out that the intended argument depends on intuition, and is not a straightforward consequence of math / the VNM theorem.
Clearly, EY wasn’t thinking about utility functions that are allowed to depend on arbitrary histories when he wrote the Arbital post (or during his “AI Alignment: Why It’s Hard, and Where to Start” talk, which makes the same points).
Sure, but there’s this implication that “since this is a theory of rationality, any intelligent AI system will be well modeled like this”, without acknowledging that this depends on the assumption that the relevant outcome set is that of (coarsely modeled) world states (or some other assumption). That’s the issue I want to correct.
I’m also surprised that no-one else has made a similar point before. Has Eliezer ever responded to this post?
Not that I know of.

Which is fine, but then I’m still wondering whether [coherence arguments phrased differently to restrict how utility functions can depend on histories] imply goal-directed behavior.

Richard Ngo did consider this line of argument; see Coherent behaviour in the real world is an incoherent concept.

I see. That all makes sense. Kind of interesting; there’s not really much we appear to disagree on; I just took your post as making a stronger claim than you’re really making.

Thanks, that gets at exactly what I was talking about. If I have more to say, I’ll do it under that post.