However, I intuitively think that we should expect AI to have a utility function over world states.
My main point is that it relies on some sort of intuition like this rather than being determined by math. As an aside, I doubt “world states” is enough to rescue the argument, unless you have very coarse world states that only look at the features that humans care about.
In fact, if we’re talking about histories, then all of the examples of circular utilities stop being examples of circular utilities.
Yup, exactly.
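For instance (a sketch in my own notation, not from the original post): an agent that pays to trade A for B, then B for C, then C for A looks like it has cyclic preferences over world states, but over histories each comparison involves a different object, so no cycle appears:

\[
\text{States: } A \prec B, \quad B \prec C, \quad C \prec A \qquad \text{(a cycle; no } u : \{A, B, C\} \to \mathbb{R} \text{ represents it)}
\]
\[
\text{Histories: } (A) \prec (A, B) \prec (A, B, C) \prec (A, B, C, A) \qquad \text{(a chain; e.g. } u(h) = \mathrm{length}(h) \text{ represents it)}
\]

So the usual money-pump story only demonstrates incoherence if the outcome set is taken to be world states.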
I don’t understand why you can’t just look at the theorem and see whether it talks about world states or histories, but I guess the formalism is too abstract or something?
The theorem can apply to world states or histories. The VNM theorem assumes that there is some set of “outcomes” that the agent has preferences over; you can use either world states or histories for that set of outcomes. Using only world states would be a stronger assumption.
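Concretely, the standard statement (my paraphrase, with the outcome set left abstract): given a set of outcomes O and a preference relation over lotteries on O satisfying completeness, transitivity, continuity, and independence, there is a utility function u : O → R such that

\[
L_1 \succeq L_2 \iff \mathbb{E}_{o \sim L_1}[u(o)] \ge \mathbb{E}_{o \sim L_2}[u(o)].
\]

Nothing in that statement pins down what O is; O = world states and O = histories are both admissible instantiations, and that choice is what the informal argument leans on.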
So it feels like you’re arguing against something that was never intended.
Yup, that’s right. I am merely pointing out that the intended argument depends on intuition, and is not a straightforward consequence of math / the VNM theorem.
Clearly, EY wasn’t thinking about utility functions that are allowed to depend on arbitrary histories when he wrote the Arbital post (or during his “AI alignment, why it’s hard & where to start” talk, which makes the same points).
Sure, but there’s this implication that “since this is a theory of rationality, any intelligent AI system will be well modeled like this”, without acknowledging that this depends on the assumption that the relevant outcome set is that of (coarsely modeled) world states (or some other assumption). That’s the issue I want to correct.
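To spell out why that assumption carries the weight (a standard construction, written in my own notation): if outcomes are allowed to be entire histories, then any deterministic policy π is trivially VNM-rational, since

\[
u(h) = \begin{cases} 1 & \text{if } h \text{ is consistent with } \pi \\ 0 & \text{otherwise} \end{cases}
\]

is a utility function over histories that π maximizes in expectation. The theorem only starts to constrain behavior once the outcome set is restricted (e.g. to coarse world states) or additional assumptions are made.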
I’m also surprised that no-one else has made a similar point before. Has Eliezer ever responded to this post?
I see. That all makes sense. Kind of interesting: there’s not really much we appear to disagree on; I just took your post as making a stronger claim than you’re really making.
Richard Ngo did consider this line of argument; see Coherent behaviour in the real world is an incoherent concept.
Not that I know of (regarding whether Eliezer has ever responded to this post).
Thanks—that gets at exactly what I was talking about. If I have more to say, I’ll do it under that post.