Utilities are typically scalars calculated from sensory inputs and memories—which are the sum total of everything the agent knows at the time.
Each utility is associated with one of the agent’s possible actions at each moment.
The outcome is that the agent performs the “best” action (according to the utility function) - and then the rest of the world responds to it according to physical law. The agent can only control its actions. Outcomes are determined from them by physics and the rest of the world.
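A minimal sketch of the action-selection scheme just described, assuming a finite set of candidate actions; the function and argument names are illustrative, not anything from the discussion:

    # Generic utility-based action selection: compute a scalar utility for each
    # candidate action from the agent's current percepts and memory, then emit
    # the action with the highest utility.
    def choose_action(percepts, memory, candidate_actions, utility):
        return max(candidate_actions, key=lambda a: utility(percepts, memory, a))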
Decisions change the distribution of outcomes but rarely force a single absolutely predictable outcome. At the very least, your outcome is contingent on other actors’ unpredictable effects.
...but an agent only takes one action at any moment (if you enumerate its possible actions appropriately). So this is a non-issue from the perspective of constructing a utility-based “wrapper”.
I personally feel happy or sad about the present state of affairs, including expectation of future events (“Oh no, my parachute won’t deploy! I sure am going to hit the ground fast.”). I can call how satisfied I am with the current state of things as I perceive it “utility”. Of course, by using that word, it’s usually assumed that my preferences obey some axioms, e.g. von Neumann-Morgenstern, which I doubt your wrapping satisfies in any meaningful way.
Perhaps there’s some retrospective sense in which I’d talk about the true utility of the actual situation at the time (in hindsight I have a more accurate understanding of how things really were and what the consequences for me would be), but as for my current assessment it is indeed entirely a function of my present mental state (including perceptions and beliefs about the state of the universe salient to me). I think we agree on that.
I’m still not entirely sure I understand the wrapping you described. It feels like it’s too simple to be used for anything.
Perhaps it’s this: given the life story of some individual (call her Ray), you can vacuously (in hindsight) model her decisions with the following story:
1) Ray always acts so that the immediately resulting state of things has the highest expected utility. Ray can be thought of as moving through time and having a utility at each time, which must include some factor for her expectation of her future e.g. health, wealth, etc.
2) Ray is very stupid and forms some arbitrary belief about the result of her actions, expecting with 100% confidence that the predicted future of her life will come to pass. At the next moment she will usually find herself revising many things she had, wrongly, expected with certainty, i.e. she’s not actually predicting the future exactly.
3) Whatever Ray believed the outcome would be at each choice, she assigned utility 1. To all other possibilities she assigned utility 0.
That’s the sort of fully-described scenario that your proposal evoked in me. If you want to explain how she’s forecasting more than a singleton expectation set, and yet the expected utility for each decision she takes magically works out to be 1, I’d enjoy that.
In other words, I don’t see any point modeling intelligent yet not omniscient+deterministic decision making unless the utility at a given state includes an anticipation of expectation of future states.
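One way to make the Ray story concrete is the hindsight sketch below; it assumes we already know, at each choice, the single outcome Ray confidently (and usually wrongly) predicted, and all names are illustrative:

    # Vacuous hindsight model of Ray: she assigns utility 1 to the one outcome
    # she predicts and 0 to everything else, so under her point-mass beliefs the
    # expected utility of whatever she actually decides always works out to 1.
    def rays_utility(predicted_outcome, outcome):
        return 1.0 if outcome == predicted_outcome else 0.0

    def rays_expected_utility(predicted_outcome):
        # Her belief distribution is a point mass on the predicted outcome.
        beliefs = {predicted_outcome: 1.0}
        return sum(p * rays_utility(predicted_outcome, o) for o, p in beliefs.items())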
There’s no point in discussing “utility maximisers”—rather than “expected utility maximisers”?
I don’t really agree—“utility maximisers” is a simple generalisation of the concept of “expected utility maximiser”. Since there are very many ways of predicting the future, this seems like a useful abstraction to me.
...anyway, if you were wrapping a model of a human, the actions would clearly be based on predictions of future events. If you mean you want the prediction process to be abstracted out in the wrapper, obviously there is no easy way to do that.
You could claim that a human, while a “utility maximiser”, was not clearly an “expected utility maximiser”. My wrapper doesn’t disprove such a claim. I generally think that the “expected utility maximiser” claim is highly appropriate for a human as well, but there is not such a neat demonstration of this.
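A sketch of the distinction being drawn here, with illustrative names: a "utility maximiser" only needs some scalar score per action, while an "expected utility maximiser" obtains that score by averaging a utility over predicted outcomes.

    # A "utility maximiser" only needs some scalar score attached to each action.
    def maximise(actions, score):
        return max(actions, key=score)

    # An "expected utility maximiser" derives that score from a prediction:
    # predict_outcomes(action) is assumed to return (outcome, probability) pairs.
    def expected_utility(action, predict_outcomes, utility):
        return sum(p * utility(o) for o, p in predict_outcomes(action))

Passing score=lambda a: expected_utility(a, predict_outcomes, utility) to maximise recovers the expected-utility case; any other way of producing the scores still falls under the more general notion.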
Of course, by using that word, it’s usually assumed that my preferences obey some axioms, e.g. von Neumann-Morgenstern, which I doubt your wrapping satisfies in any meaningful way.
I certainly did not intend any such implication. Which set of axioms is using the word “utility” supposed to imply?
Perhaps check the definition of “utility”. It means something like “goodness” or “value”. There isn’t an obvious implication of any specific set of axioms.
The outcome is that the agent performs the “best” action (according to the utility function) - and then the rest of the world responds to it according to physical law. The agent can only control its actions. Outcomes are determined from them by physics and the rest of the world.
This is backwards. Agents control their perceptions, not their actions. They vary their actions in such a manner as to produce the perceptions they desire. There is a causal path from action to perception outside the agent, and another from perception (and desired perception) to action inside the agent.
It is only by mistakenly looking at those paths separately and ignoring their connection that one can maintain the stimulus-response model of an organism (whether of the behaviourist or cognitive type), whereby perceptions control actions. But the two are bound together in a loop, whose properties are completely different: actions control perceptions. The loop as a whole operates in such a way that the perception takes on whatever value the agent intends it to. The action varies all over the place, while the perception hardly changes. The agent controls its perceptions by means of its actions; the environment does not control the agent’s actions by means of the perceptions it supplies.
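A minimal sketch of the loop being described, assuming a simple proportional controller; it illustrates how the action ends up mirroring the disturbances while the perception stays near the reference:

    # Negative-feedback loop: the agent varies its action (the manipulated
    # variable) so that its perception (the process variable) tracks a reference.
    def run_control_loop(reference, disturbances, gain=1.0):
        perception, history = 0.0, []
        for d in disturbances:
            error = reference - perception
            action = gain * error                  # act against the error
            perception = perception + action + d   # the world adds a disturbance
            history.append((action, perception))
        return history  # actions vary with the disturbances; perceptions stay near the reference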
Agents control their perceptions, not their actions.
“Control” is being used in two different senses in the above two quotes. In control theory parlance, timtyler is saying that actions are the manipulated variable, and you’re saying that perceptions are the process variable.
Um. Agents do control their actions.
I am well aware of the perception-action feedback—but what does it have to do with this discussion?
It renders wrong the passage that I quoted above. You have described agents as choosing an outcome (from utility calculations, which I’d dispute, but that’s not the point at issue here), deciding on an action which will produce that outcome, and emitting that action, whereupon the world then produces the chosen outcome. Agents, that is, in the grip of the planning fallacy.
Planning plays a fairly limited role in human activity. An artificial agent designed to plan everything will do nothing useful. “No plan of battle survives contact with the enemy.” “What you do changes who you are.” “Life is what happens when you’re making other plans.” Etc.
I don’t know what you are thinking—but it seems fairly probable that you are still misinterpreting me—since your first paragraph contains:
You have described agents as choosing an outcome [...] deciding on an action which will produce that outcome, and emitting that action
...which appears to me to have rather little to do with what I originally wrote.
Rather, agents pick an action to execute; their possible actions are enumerated and each is assigned a utility (1 or 0) by the I/O wrapper I described; the highest-utility action is then selected and passed on to the associated actuators.
Notice the lack of mention of outcomes here—in contrast to your description.
I stand by the passage that you quoted above, which you claim is wrong.
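A minimal sketch of the wrapper as described in the preceding comment, assuming a finite, enumerable action set; "policy" stands in for whatever non-utility model actually produces the action, and all names are illustrative:

    # Wrap an arbitrary policy so that action selection formally goes through a
    # utility function: the action the policy would take gets utility 1, every
    # other enumerable action gets utility 0, and the wrapper picks the argmax.
    def wrap_as_utility_maximiser(policy, enumerate_actions):
        def utility(observation, action):
            return 1.0 if action == policy(observation) else 0.0

        def act(observation):
            actions = enumerate_actions(observation)
            return max(actions, key=lambda a: utility(observation, a))

        return act

By construction, for the same observation this returns exactly the action the wrapped policy would have produced.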
In that case, I disagree even more. The perceived outcome is what matters to an agent. The actions it takes to get there have no utility attached to them; if utility is involved, it attaches to the perceived outcomes.
I continue to be perplexed that you take seriously the epiphenomenal utility function you described in these words:
Simply wrap the I/O of the non-utility model, and then assign the (possibly compound) action the agent will actually take in each timestep utility 1 and assign all other actions a utility 0 - and then take the highest utility action in each timestep.
and previously here. These functions require you to know what action the agent will take in order to assign it a utility. The agent is not using the utility to choose its action. The utility function plays no role in the agent’s decision process.
The utility function determines what the agent does. It is the agent’s utility function.
Utilities are numbers. They are associated with actions—that association is what allows utility-based agents to choose between their possible actions.
The actions produce outcomes, so the utilities are also associated with the relevant outcomes.