Agreed—but I think that power corrupts because being put in a position of power triggers its own set of motivational drives evolved for exploiting that power. I think that if an AI wasn’t built with such drives, power wouldn’t need to corrupt it.
I don’t think it’s necessary to posit any separate motivational drives. Once you’re in a position where cooperation isn’t necessary for getting what you want, then there’s no incentive to cooperate or shape yourself to desire cooperative things.
It’s rare-to-nonexistent in a society as large and interconnected as ours for anyone to be truly powerful enough that there’s no incentive to cooperate, but we can look at what people do when they don’t perceive a benefit to taking on (in part) other people’s values as their own. Sure, sometimes we see embezzlement and sleeping with subordinates, which look like they’d correlate with “maximizing reproductive fitness in EEA”, but we also see a lot of bosses who are just insufferable in ways that make them less effective at their jobs, to their own detriment. We see power-tripping cops and security guards and people being dicks to waiters, and where there’s no “power over others” but still “no obvious socializing forces”, you get road rage and Twitter behavior.
The explanation that looks to me to fit better is just that people stop becoming socialized as soon as there’s no longer a strong perceived force rewarding them for doing so and punishing them for failing to. When people lose the incentive to refactor their impulses, they just act on them. Sometimes that means they’ll offer a role in a movie for sexual favors, but sometimes that means completely ignoring the people you’re supposed to be serving at the DMV or taking out your bitterness on the waiter who can’t do shit about it.
Once you’re in a position where cooperation isn’t necessary for getting what you want, then there’s no incentive to cooperate or shape yourself to desire cooperative things.
That seems to assume that you don’t put any intrinsic value on being cooperative?
I don’t think people do, in general. Not as any sort of separate instinctual terminal value somehow patched into our utility function before we’re born.
It can be learned, and well-socialized people tend to learn it to some extent or another, but young kids sure are selfish and short-sighted. And people placed in situations where it’s not obvious to them why cooperation is in their best interest don’t tend to act like they intrinsically value being cooperative. That’s not to say I think people are consciously tallying everything and waiting for a moment to ditch the cooperative BS to go do what they really want to do. I mean, that’s obviously a thing too, but there’s more to it than that.
People can learn to fake caring, but people can also learn to genuinely care about other people, in the kind of way where they will do good by the people they care about even when given the power not to. It’s not that their utility function is made up of a “selfish” set consisting of god knows what, and then a term for “cooperation” is added. It’s that we start out with short-sighted impulses like “stay warm, get fed”, and along the way we build a more coherent structure of desires by making trades of the sort “I value patience over one marshmallow now, and receive two marshmallows in the future” and “You care a bit about my things, and I’ll care a bit about yours”. We start out as ineffective shits that can only cry when our immediate impulses aren’t met, and can end up as people who will voluntarily go cold and hungry, even without temptation or suffering, in order to provide for the wellbeing of our friends and family: not because we reason that this is the best way to stay warm in each moment, but because we have learned to not care so much whether we’re a little cold now and then relative to the long-term wellbeing of our friends and family.
What I’m saying is that at the point where a person gains sufficient power over reality that they no longer have to deceive others in order to gain support and avoid punishment, the development of their desires will stop and their behaviors will be best predicted by the trades they actually made. If they managed to fake their whole way there from childhood, you will get childish behavior and childish goals. To the extent that they’ve only managed to succeed and acquire power by changing what they care about to be prosocial, power will not corrupt.
I do think there are hard-wired Little Glimpses of Empathy, as Steven Byrnes calls them, that get the cooperation game started. I think these can have different strengths for different people, but they are just one of the many rewards (“stay warm, get fed” etc.) that we have and thus often not the most important factor, especially at the edge of high power where you can get more of the others.
Yes, that is one of the avenues worth exploring. I doubt it scales to high levels of optimization power, but maybe simulations can measure the degree to which it does.