This definition has the important feature of restricting “Friendly AI” to designs that have a utility function.
That doesn’t seem important—for the reason described here—where it says:
Utility maximisation is a general framework which is powerful enough to model the actions of any computable agent. The actions of any computable agent—including humans—can be expressed using a utility function.
The actions of any computable agent—including humans—can be expressed using a utility function.
This is a highly questionable statement concerning humans, and the paper linked from that page doesn’t appear to prove it.
Edit: ah, this includes “functions” that anyone else would call a “stupidly complicated state machine” and which may not actually be feasible to calculate.
The term “function”—as used on the page—is a technical term with a clearly-established meaning.
Yes indeed, and the only way to fit that function to the human state machine is to include a “t” term, over the life of the human in question. Which is pretty much infeasible to calculate unless you invoke “and then a miracle occurs”.
Utility-based models are no more “infeasible to calculate” than any other model. Indeed you can convert any model of an agent into a utility-based model by an I/O-based “wrapper” of it—as described here. The idea that utility-based models of humans are more computationally intractable than other models is just wrong.
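For concreteness, here is a minimal sketch of the kind of I/O wrapper being described, in Python. All of the names are illustrative assumptions of mine, not anything from the linked page: the wrapper assigns utility 1 to whatever action the wrapped model would output and 0 to everything else, and then chooses by maximising that function.

# Sketch of the claimed I/O wrapper (illustrative names, not from the linked page).
def make_utility_based(agent_model):
    # agent_model: any function from an observation history to an action.
    def utility(history, action):
        # Utility 1 for the action the wrapped model would take, 0 otherwise.
        return 1 if action == agent_model(history) else 0
    def choose(history, possible_actions):
        # Maximise the induced utility; this reproduces agent_model exactly.
        return max(possible_actions, key=lambda a: utility(history, a))
    return utility, choose

# Example: a non-utility-based model that just echoes its last observation.
echo_model = lambda history: history[-1]
utility, choose = make_utility_based(echo_model)
assert choose(["a", "b"], ["a", "b", "c"]) == "b"

Whether this construction yields a genuinely utility-based model, or merely relabels an existing one, is exactly what the rest of this exchange disputes.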
Indeed you can convert any model of an agent into a utility-based model by an I/O-based “wrapper” of it—as described here.
You keep repeating this Texas Sharpshooter Utility Function fallacy (earlier appearances in the link you gave, and here and here) of observing what the agent does, and retrospectively labelling that with utility 1 and everything else with utility 0. And as often as you do that, I will point out it’s a fallacy. Something that can only be computed after the action is known cannot be used before the fact to choose the action.
I was talking about wrapping a model of a human—thus converting a non-utility-based model into a utility-based one. That operation is, of course, not circular. If you think the argument is circular, you haven’t grasped the intended purpose of it.
It doesn’t give you a utility-based model. A model is a structure whose parts correspond to parts of the thing modelled, and which interact in the same way as in the thing modelled. This post-hoc utility function does not correspond to anything.
What next? Label with 1 everything that happens and 0 everything that doesn’t and call that a utility-based model of the universe?
Here, I made it pretty clear from the beginning that I was starting with an existing model—and then modifying it. A model with a few bits strapped onto it is still a model.
If I stick a hamburger on my car, the car is still a car—but the hamburger plays no part in what makes it a car.
AFAICS, I never made the corresponding claim—that the utility function was part of what made the model a model.
How else can I understand your words “utility-based models”? This is no more a utility-based model than a hamburger on a car is a hamburger-based car.
Well, I would say “utilitarian”, but that word seems to be taken. I mean that the model calculates utilities associated with its possible actions—and then picks the action with the highest utility.
But that is exactly what this wrapping in a post-hoc utility function doesn’t do. The model first picks an action in whatever way it does, then labels that with utility 1.
The trouble, as usual, being that most of these descriptive utility functions are very complicated relative to the storage space we have available—they start out in the format of “one number for every possible history of the universe,” and don’t get compressed much from there.
That is not a problem. A compact utility-based description of an agent’s behaviour is only ever slightly longer than the shortest description of it available. It’s easy to show that by considering a utility-based “wrapper” around the shortest description.
That’s a good way to get effective expected utilities. But expected utilities aren’t utility functions. Hm, there may be a way to fix that that I haven’t noticed, though. But maybe not.
Your comment doesn’t seem very clear to me. Are you thinking that a “utility function” needs to have a specific domain which is not simply sensory contents and internal state? If so, do you have a reference for that notion?
I am at least claiming that in the context of designing a good AI, “utility function” should be taken to be a function of some external world, yes.
Otherwise you may run into problems. For example, you could offer to change a robot’s sensory contents and internal state to something with higher utility than its current state—and if the agent refuses, you will reset it. If we were using a “utility wrapper” model, all modeled agents would say yes. But the trivial example of an agent that always says “I would prefer not to” (BartlebyBot) demonstrates that not all agents make choices that maximize some function of their internal state.
So: the only information available to any agent is in the form of its internal state and its sensory channels. Any function it computes must have that domain (or some subset of it). Confining the agent to that domain isn’t any kind of restriction. All utility functions calculated over the state of the world necessarily correspond to other utility functions calculated over the domain of internal state and sensory input.
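One way to illustrate that last correspondence (this is my own toy rendering, not something taken from the thread's links) is to fold the agent's beliefs about the world into the function, so that a world-level utility induces a function over internal state and sensory input:

# Toy illustration (hypothetical names): a utility over world states induces a
# function over (internal state, percept) by averaging over the agent's beliefs.
def induced_utility(internal_state, percept, world_utility, belief):
    # belief(internal_state, percept) returns a dict: world state -> probability.
    posterior = belief(internal_state, percept)
    return sum(p * world_utility(w) for w, p in posterior.items())

As noted elsewhere in the thread, what this construction yields is an expected utility over the agent's percepts and memory, which is a related but not identical object to a utility function over world states.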
Your example seems wrong to me. The problem is with:
For example, you could offer to change a robot’s sensory contents and internal state to something with higher utility than its current state—and if the agent refuses, you will reset it. If we were using a “utility wrapper” model, all modeled agents would say yes.
That’s not correct. For one thing, the agent may not believe what you say.
the only information available to any agent is in the form of its internal state and its sensory channels. Any function it computes must have that domain (or some subset of it).
Good point. So any function it computes has to be some function of its internal state. However, not all choices correspond to maximizing such a function—any time choices go in a circle, for instance, you’re not maximizing a function. We could imagine a very simple machine with a 3-state memory. It wants to go from A to B, and from B to C, and from C to A. Its choices are always a function of its internal state. But its choices don’t maximize a function of its internal state.
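The cycle example can be made concrete with a short exhaustive check (the check is mine, the example is from the comment above): the machine's choice is always a function of its state, yet no single state-independent utility assignment over A, B and C makes every one of those choices a maximisation.

from itertools import permutations

states = ["A", "B", "C"]
wants = {"A": "B", "B": "C", "C": "A"}  # the cyclic choices from the comment

def reproduces_cycle(utility):
    # A fixed utility function always has the same argmax, so it cannot
    # recommend B from A, C from B and A from C all at once.
    return all(max(states, key=utility.get) == wants[s] for s in states)

assert not any(reproduces_cycle(dict(zip(states, ranks)))
               for ranks in permutations([0, 1, 2]))

One reply below sidesteps this by letting the utilities depend on the current state.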
That’s not correct. For one thing, the agent may not believe what you say.
Okay. Replace “offer it a choice” with “offer it a choice, and provide sufficient Bayesian evidence that this is the choice faced.” This doesn’t lead anywhere anyhow.
not all choices correspond to maximizing such a function—any time choices go in a circle, for instance, you’re not maximizing a function. We could imagine a very simple machine with a 3-state memory. It wants to go from A to B, and from B to C, and from C to A. Its choices are always a function of its internal state. But its choices don’t maximize a function of its internal state.
Here’s the corresponding utility function—assuming that state transitions are tied to actions.
If IAM(A) { U(A) = 0; U(B) = 1; U(C) = 0; }
If IAM(B) { U(A) = 0; U(B) = 0; U(C) = 1; }
If IAM(C) { U(A) = 1; U(B) = 0; U(C) = 0; }
Using simple maximisation algorithms (e.g. gradient descent) on that utility landscape will produce the behaviour in question. More sophisticated algorithms will do no better.
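For readers who want to run it, here is one rendering of that table in Python (the dictionary encoding is mine, and a plain argmax stands in for the maximisation algorithm):

# State-conditional utility table from the comment above: U[current][candidate].
U = {
    "A": {"A": 0, "B": 1, "C": 0},
    "B": {"A": 0, "B": 0, "C": 1},
    "C": {"A": 1, "B": 0, "C": 0},
}

def step(state):
    # Maximise the utility landscape that applies in the current state.
    return max(U[state], key=U[state].get)

trace = ["A"]
for _ in range(3):
    trace.append(step(trace[-1]))
assert trace == ["A", "B", "C", "A"]  # greedy maximisation walks the cycle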
For one thing, the agent may not believe what you say.
Okay. Replace “offer it a choice” with “offer it a choice, and provide sufficient Bayesian evidence that this is the choice faced.” This doesn’t lead anywhere anyhow.
Your “BartlebyBot” agent totally ignored Bayesian evidence. By what rule does “my” example agent have to listen and respond to such evidence, while “yours” does not? Again, I don’t think your proposed counter-example is remotely convincing.
Any function of the internal state can be expressed with a number of entries equal to the number of possible internal states.
You’ve given me something that’s still interesting, which is all the expected utilities.
By what rule does “my” example agent have to listen and respond to such evidence, while “yours” does not? Again, I don’t think your proposed counter-example is remotely convincing.
Because one maximizes a utility function, and the other just says “no” all the time.
Why do you think there’s a counter-example? Did you read the referenced Dewey paper about O-Maximisers?
Thank you for linking that again. Hm, I guess I did assume that agents could have different utilities at different timesteps. Just putting “1” for everything resolves how an O-maximizer can refuse the offer to raise its utility. But then, they assume that the tape of a Turing machine is infinite, so the cycle above is still a problem.
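As a footnote on the “different utilities at different timesteps” point, a time-indexed utility handles the cycle too. This is only my own toy encoding, not the construction from the Dewey paper:

# Toy illustration (mine, not Dewey's O-maximiser): index the utility by time
# and the A -> B -> C -> A cycle becomes maximising behaviour again.
cycle = ["A", "B", "C"]

def U(t, state):
    # At step t, utility 1 for the state the cycle visits next, 0 otherwise.
    return 1 if state == cycle[(t + 1) % 3] else 0

trace = ["A"]
for t in range(3):
    trace.append(max(cycle, key=lambda s: U(t, s)))
assert trace == ["A", "B", "C", "A"]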
Following the links, at first glance it looks like there’s an argument there that anything with computable behavior will have behavior expressible as a utility function. Is that correct?
Yes.