In general, agents often have an incentive to reveal their preferences as little as possible, in order to exploit the information asymmetry.
I think that’s only the case in competitive games, not cooperative ones. (ISTM the optimal amount of information to reveal would be zero in the zero-sum-game limit and everything you know (neglecting the cost of communication itself etc.) in the identical-payoff-matrices limit.)
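A minimal sketch of the two limits in the parenthetical above. It uses made-up payoff matrices (matching pennies for the zero-sum limit, a two-option pure coordination game for the identical-payoff limit), stands in "revealing your preferences" with the cruder "revealing your intended move", and models the uninformed opponent as guessing uniformly; all of that is assumption for illustration, not anything claimed in the thread.

```python
# Toy illustration (invented payoff matrices, not from the thread) of how much
# it pays to have your intended move known in a zero-sum game versus an
# identical-payoff coordination game.

# (row move, column move) -> (row payoff, column payoff)

# Matching pennies (zero-sum): whatever the row player gains, the column player loses.
MATCHING_PENNIES = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}

# Pure coordination (identical payoffs): both score iff they pick the same option.
COORDINATION = {
    ("A", "A"): (1, 1), ("A", "B"): (0, 0),
    ("B", "A"): (0, 0), ("B", "B"): (1, 1),
}

def row_payoff(game, row_move, revealed):
    """Row player's expected payoff for a fixed move, against a column player
    who best-responds if the move is revealed and guesses uniformly otherwise."""
    col_moves = sorted({col for _, col in game})
    if revealed:
        best_response = max(col_moves, key=lambda col: game[(row_move, col)][1])
        return game[(row_move, best_response)][0]
    return sum(game[(row_move, col)][0] for col in col_moves) / len(col_moves)

for name, game, move in [("zero-sum", MATCHING_PENNIES, "H"),
                         ("identical payoffs", COORDINATION, "A")]:
    print(name, "hidden:", row_payoff(game, move, False),
          "revealed:", row_payoff(game, move, True))
# zero-sum hidden: 0.0 revealed: -1           -> revealing only helps the opponent
# identical payoffs hidden: 0.5 revealed: 1   -> revealing is what enables coordination
```

Under these invented numbers, hiding your move gains you 1 in expectation in the zero-sum game and costs you 0.5 in the identical-payoff game, which is the gradient the parenthetical is pointing at.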
I think that’s only the case in competitive games, not cooperative ones.
A problem I continue to run into in real life is, “how do you keep people from wire-heading their preferences whenever they find themselves in a positive-sum game, so that they can play a zero-sum version instead?”
What does “wire-head a preference” mean?
Rewrite your utility function.
Examples:
Original preference—“I just want a car to get to work.”
Environmental change: “Here, everyone gets a car for free.”
Adjusted preference—“Okay, then what I REALLY want is a faster car than anyone else has.”
...
Original preference—“I just want to be able to eat.”
Environmental change: “Here, there’s enough food to go around forever.”
Adjusted preference—“Okay, then what I REALLY want is for me to eat while those guys have to watch, starving.”
...
Original preference—“I just want to feel safe.”
Environmental change: “Here, you’re in a space where everyone is your ally and no one can hurt you.”
Adjusted preference—“Okay, then what I REALLY want is for us to all gang up on the people who made us feel unsafe, and make THEM feel persecuted for a change.”
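A minimal sketch of the first (car) example above, with invented names and numbers: the original preference is an absolute condition that the environmental change satisfies outright, while the rewritten preference is purely positional, so at most one agent can ever satisfy it and improving it for one person necessarily worsens it for someone else.

```python
# Toy sketch (invented names and numbers) of "rewrite your utility function":
# an absolute preference gets swapped for a positional, zero-sum one.
from dataclasses import dataclass

@dataclass
class World:
    my_car_speed: float
    other_car_speeds: list  # speeds of everyone else's cars

def original_utility(w: World) -> float:
    """'I just want a car to get to work' -- satisfied by having any car at all."""
    return 1.0 if w.my_car_speed > 0 else 0.0

def adjusted_utility(w: World) -> float:
    """'I want a faster car than anyone else has' -- purely positional,
    so at most one agent in the world can ever score 1.0."""
    return 1.0 if w.my_car_speed > max(w.other_car_speeds) else 0.0

# Environmental change: everyone gets an identical free car.
post_change = World(my_car_speed=100.0, other_car_speeds=[100.0, 100.0])

print(original_utility(post_change))  # 1.0 -- the stated preference is satisfied
print(adjusted_utility(post_change))  # 0.0 -- the rewritten one is not, and it stays
                                      #        zero-sum no matter how fast cars get
```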
Original preference—“I just want to be able to eat.”
Environmental change: “Here, there’s enough food to go around forever.”
Adjusted preference—“Okay, then what I REALLY want is for me to eat while those guys have to watch, starving.”
You mean “What I REALLY want is to eat better than everybody else”, surely. Gourmet food or organic or hand-prepared etc. etc.
Unless this is intended to imply that the current famines etc. are the product of a conspiracy and not civilizational inadequacy, in which case yes, that would of course be evidence that my model of human nature is wrong and yours is the correct one, if it’s true.
The way you formulate this opposes the idea that the “adjusted” preference was actually the preference all along, and the originally stated preference was simply an incorrect description of the system’s actual preferences. Is that deliberate, or just an incidental artifact of your phrasing?
It’s an artifact of my phrasing. In my experience, people do truly want good things, until those things become universally available—at which point they switch goals to something zero-sum. When they do so, they often phrase it themselves as if they really wanted the zero-sum thing all along, but that’s often a part of trying to distance themselves from their lower-status past.
Of course, I’m describing something that I only have personal and anecdotal evidence for; I’d REALLY like to be pointed towards a legitimate, peer-reviewed description of a cognitive bias that would explain what I’m observing. (And I’d be at least equally happy if it turned out to be my cognitive bias that’s causing me to perceive people in this way.)
In my experience, people do truly want good things, until those things become universally available—at which point they switch goals to something zero-sum.
What would you expect to experience differently if, instead, people truly want zero-sum things, but they claim to want good things until the universal availability of good things makes that claim untenable?
I’ll need some time to think on this. This might just be my tendency to find the most charitable interpretation, even if other interpretations might be more parsimonious.
What do you mean by “truly want”? See the phenomenon Eliezer describes here.
I intend the phrase to refer to whatever ialdabaoth meant by it when I quoted them.
When they do so, they often phrase it themselves as if they really wanted the zero-sum thing all along, but that’s often a part of trying to distance themselves from their lower-status past.
So it was deliberate, and not an artifact of your phrasing. Did you perhaps misread the grandparent?
(ISTM the optimal amount of information to reveal would be zero in the zero-sum-game limit and everything you know (neglecting the cost of communication itself etc.) in the identical-payoff-matrices limit.)
Interestingly, ISTM that is itself a Prisoner’s Dilemma: the agent that doesn’t reveal its (true) preferences has a much, much better chance of manipulating an agent that does.
If you know that the game is zero-sum, then you usually already know all the other players’ preferences.