I think I’m lost, please bring me back on track: is the intention here to model the updating of moral beliefs after hearing an argument as a special case of updating beliefs concerning factual information about how the world works, rather than as a change to the core utility function? (Where moral/preference uncertainty is analogous to uncertainty about how sore your teeth are.)
Yes. That is exactly what I meant.
Thanks for the clarification (I’m confused, he answered my question, why all the downvotes?)
If I understand correctly, your model depicts agents with underlying morals/preferences who are also uncertain about what those preferences are. It seems to me that the degrees of freedom allotted by this model allow these agents to exhibit VNM-irrational behavior even if the underlying preferences are VNM-consistent. Do you agree?
If you agree: previously, you stated that you wouldn’t consider an agent to have a utility function unless it 1) behaved VNM-rationally or 2) had an explicit utility function. The agents you are describing here seem to meet neither criterion, yet have a utility function. Do you still stand by your previous post?
That’s confusing me as well.
Yep.
Not quite. The point was for the agent to be VNM rational in final actions. However, if you only looked at the behavior over the object-level outcome space, you would conclude VNM irrationality, but that’s because the agent is playing a higher level game.
For example, you may observe a father playing chess with his daughter and notice that halfway through the game he starts deliberately making bad moves so she can win faster. If you looked only at the chessboard, you would conclude that he was irrational: he played to win for a while, then threw the game? The revealed preferences are not consistent.
However, if you step up a level, you may notice that she mentioned during the game that she had homework to do, after which he threw the game so that she could get to her homework faster and get a little boost in motivation from winning. So when you look at this higher level that includes these concerns, he is acting in a rational way.
So the thing I did in OP was to formalize this concept of facts that are outside the game that affect how you want to play the game, and applied that to moral uncertainty.
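A minimal sketch of the chess example, assuming made-up utility numbers and a single outside fact (whether the daughter has homework). The names `father_utility` and `best_move` are hypothetical and not from the OP. The utility function stays fixed throughout; only the outside fact changes, yet an observer who records just the moves sees a preference reversal.

```python
# Hypothetical utilities, chosen only for illustration.
def father_utility(move_quality: str, daughter_has_homework: bool) -> float:
    """Utility over (chess move, outside fact). The function itself never changes."""
    if daughter_has_homework:
        # Once homework is on the table, a quick win for her matters more than the board.
        return {"good_move": 1.0, "bad_move": 3.0}[move_quality]
    return {"good_move": 3.0, "bad_move": 1.0}[move_quality]


def best_move(daughter_has_homework: bool) -> str:
    """Pick the move with the highest utility given the outside fact."""
    return max(
        ("good_move", "bad_move"),
        key=lambda move: father_utility(move, daughter_has_homework),
    )


# What an observer who sees only the chessboard records:
# first half of the game (no homework mentioned), then second half (homework mentioned).
observed = [best_move(False), best_move(True)]
print(observed)
```

Looking only at `observed`, the choices contradict each other. Looking at the pair (move, outside fact), they are the output of one consistent maximizer.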
From the outside, though, how could you tell the difference?
Rational agent (yes utility function)
1) I act a certain way
2) Irrelevant alternative appears, spurring thought processes that cause an introspective insight, leading me to update my beliefs about what my preferences are.
3) I act in another way which is inconsistent with [1]
Irrational Agent (No utility function)
1) I act a certain way
2) Irrelevant alternative appears.
3) I act in another way which is inconsistent with [1]
From the outside, how would you ever know whether an agent is behaving rationally because of some entirely obscured update going on within [2], or whether the agent is in fact irrational and/or does not have a utility function to begin with?
(And if you can’t know from the outside, there isn’t a difference, because utility functions are only meant as models for behavior, not descriptions of what goes on inside the black box)
Probably indistinguishable from the outside, except that because we are using Bayes to update, theoretically we will get more and more accurate as we make big updates, such that big updates become less and less likely.
The point is we have a prescriptive way to make decisions in the presence of moral uncertainty that won’t do anything stupid.
Come to think of it, I totally forgot to show that this satisfies all our intuitions regarding how moral updating ought to behave. No wonder no one cares. Maybe I should write that up.
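As a rough illustration of the "big updates become less likely" remark, here is a sketch that tracks how far a belief moves on each Bayesian update. It assumes a single binary moral hypothesis and evidence that always points the same way with fixed likelihoods (0.8 vs. 0.2); those numbers and the name `posterior_after` are assumptions for the example. It only covers the case of consistent evidence, and a sufficiently surprising observation could still produce a large update.

```python
def posterior_after(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule for a binary hypothesis H after observing evidence e."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1.0 - prior) * p_e_given_not_h)


belief = 0.5   # assumed prior that the moral hypothesis is true
steps = []     # size of each update, |new belief - old belief|

for _ in range(5):
    new_belief = posterior_after(belief, 0.8, 0.2)
    steps.append(abs(new_belief - belief))
    belief = new_belief

print([round(s, 4) for s in steps])
print(round(belief, 4))
```

Each successive step is smaller than the last, because the belief is already concentrated and the same kind of evidence has less room to move it.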
In that case, I’m still confused. Maybe this question will help.
To restate—I just described agents which (behaviorally speaking) appeared to violate one of the VNM axioms. They still qualify as sort-of VNM rational to you, because the weird behavior was a result of an update to what the agent thought its utility function was.
To remove this funny “I don’t know what my utility function is” business, let’s split our agent into two: Agent R is bounded-rational, and its utility function is simply “do what Agent M wants”. Agent M has a complex utility function of morality and teeth soreness, which is partially obscure to Agent R. Agent R makes evidence-based updates concerning both the outside world and M’s utility function. (Functionally, this is the same as having one agent which is unsure of what its utility function is, but it seems easier to talk about.)
Am I still following you correctly?
So here is my question:
Are some humans contained in the outlined set of sort-of VNM compliant agents? And if not, what quality excludes them from the set?
Conceiving of it as two separate agents is a bit funny, but yeah, that’s more or less the right model.
I think of it as “I know what my utility function is, but the utility of outcomes depends on some important moral facts that I don’t know about.”
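One way to sketch that framing: the utility function takes the unknown moral fact as an extra argument, and the agent maximizes expected utility under its current belief about that fact. The hypotheses `H_a` and `H_b`, the payoffs, and the likelihoods below are all invented for illustration; this is not the formalism from the OP, just a minimal instance of the idea.

```python
from typing import Dict

# Hypothetical payoffs: UTILITY[moral_hypothesis][action]. This table never changes.
UTILITY: Dict[str, Dict[str, float]] = {
    "H_a": {"act_1": 10.0, "act_2": 0.0},
    "H_b": {"act_1": 0.0, "act_2": 6.0},
}


def expected_utility(action: str, belief: Dict[str, float]) -> float:
    """Average the fixed utility over the agent's belief about the moral facts."""
    return sum(p * UTILITY[h][action] for h, p in belief.items())


def choose(belief: Dict[str, float]) -> str:
    """Pick the action with the highest expected utility."""
    return max(("act_1", "act_2"), key=lambda a: expected_utility(a, belief))


def bayes_update(belief: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior over moral hypotheses given P(evidence | hypothesis)."""
    unnormalized = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}


prior = {"H_a": 0.5, "H_b": 0.5}
before = choose(prior)                                      # expected utilities: 5.0 vs 3.0
posterior = bayes_update(prior, {"H_a": 0.2, "H_b": 0.8})   # e.g. after hearing an argument
after = choose(posterior)                                   # expected utilities: 2.0 vs 4.8

print(before, after)
```

The chosen action flips from `act_1` to `act_2`, but `UTILITY` is untouched. Only the belief about which moral hypothesis holds has moved.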
No. I assert in “we don’t have a utility function” that we (all humans) do not have a utility function. Of course I could be wrong.
As I said, humans are excluded on both acting in a sane and consistent way, and on knowing what we even want.
Actually, in some sense, the question of whether X is a VNM agent is uninteresting. It’s like the question of whether X is a heat engine. If you twist things around enough, even a rock could be a heat engine or a VNM agent with zero efficiency or a preference for accelerating in the direction of gravity.
The point of VNM, and of thermodynamics, is to serve as tools for analyzing systems that we are designing. Everything is a heat engine, but some are more efficient/usable than others. Likewise with agents: everything is an agent, but some produce outcomes that we like and others do not.
So with applying VNM to humans the question is not whether we descriptively have utility functions or whatever; the question is if and how we can use a VNM analysis to make useful changes in behavior, or how we can build a system that produces valuable outcomes.
So the point of this moral uncertainty business is “oh look, if we conceive of moral uncertainty like this, we can provably meet these criteria and solve those problems in a coherent way”.
OK, I think we’re on the same page now.
A utility function is only a method of approximating an agent’s behavior. If I wanted to make a precise description, I wouldn’t bother “agent-izing” the object in the first place. The rock falls vs. the rock wants to fall is a meaningless distinction. In that sense, nothing “has a utility function”, since utility functions aren’t ontologically fundamental.
When I say “does X have a utility function”, I mean “Is it useful and intuitive to predict the behavior of X by ascribing agency to it and using a utility function”. So the real question is, do humans deviate from the model to such an extent that the model should not be used? It certainly doesn’t seem like the model describes anything else better than it describes humans—although as AI improves that might change.
So even if I agree that humans don’t technically “have a utility function” any more than any other object, I would say that if anything on this planet is worth ascribing agency and using a utility function to describe, it’s animals. So if humans and other animals don’t have a utility function, who does?
No one yet. We’re working on it.
Yes. You will find it much more fruitful to predict most humans (including yourself) as causal systems, and if you wanted to model human behavior with a utility function, you’d either have a lot of error, or a lot of trouble adding enough epicycles.
As I said though, VNM isn’t useful descriptively; if you use it like that, it’s tautological, and doesn’t really tell you anything. Where it shines is in design of agenty systems; “If we had these preferences, what would that imply about where we would steer the future” (which worlds are ranked high) “if we want to steer the future over there, what decision architecture do we need?”.
I don’t know either (i.e. it wasn’t me), but perhaps “Yes. That is exactly what I meant.” would work better with a quoted sentence that reveals the ‘that’ in question?