If you, the reader, or, say, Paul Christiano or Eliezer were uploaded and gained the capacity for self-improvement, self-modification, and vastly greater processing speed and power, would your goals also converge toward damaging humanity? If not, what makes the difference? And how could we transfer this secret sauce to an AI agent?
The Orthogonality Thesis states that values and capabilities can vary independently. The key question, then, is whether my/Paul’s/Eliezer’s values are actually as aligned with humanity as they appear to be, or whether we are already unaligned and would perform a Treacherous Turn once we had the power to get away with it. There are certainly people who are obviously bad choices, and people who would perform the Treacherous Turn (possibly most people[1]), but I believe there are people who are sufficiently aligned, so let’s assume going forward that we’ve picked one of those. At this point “If not, what makes it different?” answers itself: by assumption, we’ve picked a person for whom the Value Loading Problem is already solved. But we have no idea how to “transfer this secret sauce to an AI agent”; the secret sauce is hidden somewhere in this person’s particular upbringing and, more importantly, their multi-billion-year evolutionary history.
The adage “power tends to corrupt, and absolute power corrupts absolutely” basically says that treacherous turns are commonplace for humans: we claim to be aligned, and may even believe it ourselves while we are weak, but when we gain power we abuse it. Of course, the mere existence of the adage does not mean it is universally true.
I would like to know the true answer to this.
On one hand, some people are assholes, and often it’s just a fear of punishment or social disapproval that stops them. Remove all this feedback, and it’s probably not going to end well. (Furthermore, a percent or two of the population are literally psychopaths.)
On the other hand, people who have power are usually not selected randomly, but through a process in which the traits that later cause the “treacherous turn” may be helpful; quite likely they already had to betray people repeatedly in order to get to the top. (Then the adage kinda reduces to “people selected for evil traits are evil”; no shit, Sherlock. That doesn’t say much about the average person.) Also, having “power” often requires continuously defending it from all kinds of enemies, which can make a person paranoid and aggressive. (When in doubt, attack first, because if you keep ignoring situations with a 1% chance to hurt you, your fall is just a matter of time.)
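To put a rough number on that last parenthetical: the probability of surviving n independent situations that each carry a 1% chance of ruin is 0.99^n, which is about 37% after 100 such situations and under 1% after 500.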
I don’t know if we have a sample of people who somehow got power without actively fighting for it, and who felt safe keeping it. How did they behave, on average?
I think the insights from Selectorate Theory imply that it is impossible to keep power without gradually growing more corrupt, in the sense of appeasing the “Winning Coalition” with private goods. No matter what your terminal goals are, gaining power and keeping it for longer are convergent instrumental goals, and pursuing them usually takes so much effort that you gradually lose sight of your terminal goals too, compromising ethics in the short term in the name of an “ends justify the means” long term (which often never arrives).
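To make the “private goods” logic concrete, here is a minimal toy calculation in Python (a sketch with made-up numbers, not Selectorate Theory’s actual formal model): a leader splits a fixed budget between public goods shared by the whole population and private goods split among the Winning Coalition, and we compare what a single coalition member gets under each spending strategy.

```python
# Toy sketch of the selectorate intuition (illustrative numbers only,
# not the theory's formal model): a leader divides a fixed budget between
# public goods, shared by the whole population, and private goods,
# split among the winning coalition.

def coalition_member_payoff(budget, coalition_size, population, private_share):
    """Payoff to one winning-coalition member for a given spending split."""
    private = private_share * budget / coalition_size   # goes only to the coalition
    public = (1 - private_share) * budget / population  # shared by everyone
    return private + public

budget, population = 1_000_000, 1_000_000
for coalition in (10, 1_000, 100_000):
    via_private = coalition_member_payoff(budget, coalition, population, 1.0)
    via_public = coalition_member_payoff(budget, coalition, population, 0.0)
    print(f"coalition of {coalition:>7}: {via_private:>9.0f} per member from "
          f"private goods vs {via_public:.0f} from public goods")

# coalition of      10:    100000 per member from private goods vs 1 from public goods
# coalition of    1000:      1000 per member from private goods vs 1 from public goods
# coalition of  100000:        10 per member from private goods vs 1 from public goods
```

The smaller the coalition, the more lopsided the incentive toward private goods, which is exactly the “gradually growing more corrupt” pressure described above.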
So yeah, I think that powerful humans are unaligned by default, as our ancestors, who rejected all attempts to form hierarchies for tens of thousands of years before finally succumbing to the first nation-states, may attest.
Seems like there are two meanings of “power” that get conflated, because in real life it is a combination of both:
to be able to do whatever you want;
to successfully balance the interests of others, so that you can stay nominally on the top.
Good point. Perhaps there are some people who would be corrupted by the realities of human politics, but not by, e.g., ascension to superintelligence.