Well sure, you can model anything as a utility maximiser technically, but the resource w.r.t. which it’s being optimal/the way its preferences carve up state-space will be incredibly awkward/garbled/unnatural (in the extreme, it could just be utility-maximizing over entire universe-histories). But those representations are trivial. If we add constraints on the kinds of resources it cares about/the kinds of outcomes it can have preferences over, we constrain the set of possible utility-maximisers a lot. And if we constrain it to something like the set of resources that we think in terms of, the resulting set of possible utility-maximisers does look scary.
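(To spell out the trivial extreme with a quick sketch, just for illustration: for any deterministic system with policy $\pi$, define a utility over entire universe-histories by

$$ u_\pi(h) = \begin{cases} 1 & \text{if } h \text{ is the history the system actually produces under } \pi, \\ 0 & \text{otherwise,} \end{cases} $$

and the system is vacuously “maximizing” $u_\pi$. The utility function just memorizes the behaviour; it carves up state-space in a way nobody would naturally use.)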
I would guess that response is memetically largely downstream of my own old take. It’s not wrong, and it’s pretty easy to argue that future systems will in fact behave efficiently with respect to the resources we care about: we’ll design/train the system to behave efficiently with respect to those resources precisely because we care about those resources and resource-usage is very legible/measurable. But over the past year or so I’ve moved away from that frame, and part of the point of this post is to emphasize the frame I usually use now instead.
In that new frame, here’s what I would say instead: “Well sure, you can model anything as a utility maximizer technically, but usually any utility function compatible with the system’s behavior is very myopic—it mostly just cares about some details of the world “close to” (in time/space) the system itself, and doesn’t involve much optimization pressure against most of the world. If a system is to apply much optimization pressure to parts of the world far away from itself—like e.g. make & execute long-term plans—then the system must be a(n approximate) utility maximizer in a much less trivial sense. It must behave like it’s maximizing a utility function specifically over stuff far away.”
(… actually that’s not a thing I’d say, because right from the start I would have said that I’m using utility maximization mainly because it makes it easy to illustrate various problems. Those problems usually remain even when we don’t assume utility maximization, they’re just a lot less legible without a mathematical framework. But, y’know, for purposes of this discussion...)
Also, on the actual theorem you outline here—it looks right, but isn’t assuming that there are utilities assigned to outcomes, such that the agent is trying to maximise over them, kind of begging most of the question that coherence theorems are after?
In my head, an important complement to this post is Utility Maximization = Description Length Minimization, which basically argues that “optimization” in the usual Flint/Yudkowsky sense is synonymous with optimizing some utility function over the part of the world being optimized. However, that post doesn’t involve an optimizer; it just talks about stuff “being optimized” in a way which may or may not involve a separate thing which “does the optimization”.
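One compact way to see that correspondence (my own rendering here, not the post’s exact statement): if outcomes are encoded using a distribution $p(x) \propto 2^{u(x)}$, then

$$ -\log_2 p(x) = -u(x) + \log_2 Z, \qquad Z = \sum_{x'} 2^{u(x')}, $$

so minimizing expected description length under that code is the same as maximizing expected utility, up to the constant $\log_2 Z$.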
This post adds the optimizer to that picture. We start from utility maximization over some “far away” stuff, in order to express optimization occurring over that far away stuff. Then we can ask “but what’s being adjusted to do that optimization?”, i.e. in the problem $\max_x u(x)$, what’s $x$? And if $x$ is the “policy” of some system, such that the whole setup is an MDP, then we find that there’s a nontrivial sense in which the system can be or not be a (long-range) utility maximizer—i.e. an optimizer.
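A minimal toy sketch of that setup in code (my own illustration, not anything from the post; the states, horizon, and utility are all made up):

```python
# Toy sketch of the MDP framing (purely illustrative): the thing being
# adjusted, the x in max_x u(x), is the policy, while the utility u only
# looks at the state a few steps later, i.e. at stuff "far away" in time
# from each individual decision.
from itertools import product

STATES = range(4)      # positions 0..3 on a line; the system starts at 0
ACTIONS = (-1, +1)     # step left or right
HORIZON = 3            # steps taken before we evaluate the far-away state

def rollout(policy):
    """Deterministic dynamics; policy maps (state, time) -> action."""
    s = 0
    for t in range(HORIZON):
        s = min(max(s + policy[(s, t)], 0), len(STATES) - 1)
    return s

def u(final_state):
    """Utility over the far-away stuff only: being at the right end."""
    return float(final_state == len(STATES) - 1)

# Brute-force search over all policies (feasible only because everything is tiny).
keys = list(product(STATES, range(HORIZON)))
best_value = max(u(rollout(dict(zip(keys, choice))))
                 for choice in product(ACTIONS, repeat=len(keys)))
print("best achievable utility over the far-away state:", best_value)  # 1.0
```

The point is just that the optimization variable is the policy, while the utility is evaluated on the terminal state, so “being a long-range utility maximizer” is a fact about how the policy relates to far-away outcomes.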
Thanks, I feel like I understand your perspective a bit better now.
Re: your “old” frame: I agree that the fact we’re training an AI to be useful from our perspective will certainly constrain its preferences a lot, such that it’ll look like it has preferences over resources we think in terms of/won’t just be representable as a maximally random utility function. I think there’s a huge step from that, though, to “it’s an optimizer with respect to those resources”, i.e. there are a lot of partial orderings you can put over states where it broadly has preference orderings we like w.r.t. resources without looking like a maximizer over those resources, and I don’t think that’s necessarily scary. I think some of this disagreement may be downstream of how much you think a superintelligence will “iron out wrinkles” like preference gaps internally, though that’s another can of worms.
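To make the kind of partial ordering I have in mind concrete (a toy example, nothing more): over two resources $(m, t)$, take the Pareto ordering

$$ (m_1, t_1) \succeq (m_2, t_2) \iff m_1 \ge m_2 \ \text{and} \ t_1 \ge t_2. $$

It’s monotone in each resource we care about, but it’s incomplete (there’s a preference gap between, e.g., $(5,3)$ and $(3,5)$), so no single utility function over those resources represents it, and acting consistently with it needn’t look like hardcore maximization of either resource.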
Re: your new frame: I think I agree that looking like a long-term/distance planner is much scarier. Obviously implicitly assuming we’re restricting to some interesting set of resources, because otherwise we can reframe any myopic maximizer as long-term and vice-versa. But this is going round in circles a bit; typing this out, I think the main crux for me is what I said in the previous point: there’s too much of a leap from “looks like it has preferences over this resource and long-term plans” to “is a hardcore optimizer of said resource”. Maybe this is just a separate issue though; not sure I have any local disagreements here.
Re: your last point, thanks—I don’t think I have a problem with this; I think I was just misunderstanding the intended scope of the post.
Obviously implicitly assuming we’re restricting to some interesting set of resources, because otherwise we can reframe any myopic maximizer as long-term and vice-versa.
This part I think is false. The theorem in this post does not need any notion of resources, and neither does Utility Maximization = Description Length Minimization. We do need a notion of spacetime (in order to talk about stuff far away in space/time), but that’s a much weaker ontological assumption.
I think what I’m getting at is more general than specifically talking about resources. I’m more getting at the degree of freedom in the problem description that lets you frame anything as technically optimizing something at a distance, i.e. in ‘Utility Maximization = Description Length Minimization’ you can take any system, find its long-term and long-distance effects on some other region of space-time, and find a coding scheme where those particular states have the shortest descriptions. The description length of the universe will by construction get minimized. Obviously this just corresponds to one of those (to us) very unnatural-looking “utility functions” over universe-histories or whatever.
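Spelling that out with a toy sketch (purely illustrative; the states, probabilities, and function names are made up):

```python
# Toy illustration of the "rigged coding scheme" point: whatever far-away
# state a system actually produces, we can always pick a prior under which
# that exact state has the shortest code.
import math

FAR_AWAY_STATES = ["A", "B", "C", "D"]

def rigged_prior(observed_state, states, mass_on_observed=0.97):
    """A prior built *after* seeing the outcome, concentrating mass on it."""
    leftover = (1.0 - mass_on_observed) / (len(states) - 1)
    return {s: (mass_on_observed if s == observed_state else leftover)
            for s in states}

def description_length(state, prior):
    """Shannon code length in bits: -log2 p(state)."""
    return -math.log2(prior[state])

# Pretend the system's long-range effect on the far-away region was state "C".
observed = "C"
prior = rigged_prior(observed, FAR_AWAY_STATES)
for s in FAR_AWAY_STATES:
    print(s, round(description_length(s, prior), 2))
# By construction, "C" gets the shortest description length, so the system
# "minimizes description length" under this scheme no matter what it did.
```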
If we’re first fixing the coding scheme, then this seems to me to be equivalent to constraining the kinds of properties we’re allowing as viable targets of optimization.
I guess one way of looking at it is that I don’t think it makes sense to talk about a system as being an optimizer/not an optimizer intrinsically. It’s a property of a system relative to a coding scheme/set of interesting properties/resources; everything is an optimizer relative to some encoding scheme. And all of the actual, empirical scariness of AI comes from how close the encoding scheme that by definition makes it an optimizer is to our native encoding scheme—as you point out they’ll probably have some overlap, but I don’t think that itself is scary.
All possible encoding schemes / universal priors differ from each other by at most a finite prefix. You might think this doesn’t achieve much, since the length of the prefix can in principle be unbounded; but in practice, the length of the prefix (or rather, the prior itself) is constrained by a system’s physical implementation. There are some encoding schemes which neither you nor any other physical entity will ever be able to implement, and so for the purposes of description length minimization these are off the table. And of the encoding schemes that remain on the table, virtually all of them will behave identically with respect to the description lengths they assign to “natural” versus “unnatural” optimization criteria.
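For reference, the formal fact this is leaning on (as I understand it) is the invariance theorem: for any two universal prefix machines $U$ and $V$ there is a constant $c_{U,V}$, depending only on the machines and not on the string being described, with

$$ |K_U(x) - K_V(x)| \le c_{U,V} \quad \text{for all } x, $$

so switching encoding schemes can only shift description lengths by a bounded amount.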