Game theory. If different groups compete in building a “friendly” AI that respects only their personal coherent extrapolated volition (their extrapolated, sensible desires), then cooperation is no longer an option because the other teams have become “the enemy”. I have a value system that is substantially different from Eliezer’s. I don’t want a friendly AI that is created in some researcher’s personal image (except, of course, if it’s created based on my ideals). This means that we have to sabotage each other’s work to prevent the other researchers from getting to friendly AI first. This is because the moment somebody reaches “friendly” AI the game is over and all parties except one lose. And if we get uFAI, everybody loses.
That’s a real problem, though. If different factions in friendly AI research have to destructively compete with each other, then the probability of unfriendly AI will increase. That’s really bad. From a game-theory perspective all FAI researchers agree that any version of FAI is preferable to uFAI, and yet they’re working towards a future where uFAI is becoming more and more likely! Luckily, if the FAI researchers take the coherent extrapolated volition of all of humanity, the problem disappears. All FAI researchers can work toward a common goal that fairly represents all of humanity, not some specific researcher’s version of “FAI”. It also removes the problem of different morals/values. Some people believe that we should look at total utility, other people believe we should consider only average utility. Some people believe abstract values matter, some people believe consequences of actions matter most. Here, too, an AI that looks at a representative set of all human values is the solution that all people can agree on as most “fair”. Cooperation beats defection.
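To make the claimed payoff structure concrete, here is a toy expected-utility calculation. This is my own minimal sketch, not anything from the original discussion: the utility numbers and probabilities are illustrative assumptions, chosen only to encode the ranking “own FAI > CEV-based FAI > a rival’s FAI > uFAI” and the claim that racing raises the chance of uFAI.

```python
# Toy expected-utility sketch of the argument above. All numbers are hypothetical
# assumptions chosen only to encode the ranking: own FAI > CEV-based FAI >
# a rival's FAI > uFAI, with racing pushing the probability of uFAI up.

UTILITY = {"own_fai": 10, "cev_fai": 8, "rival_fai": 5, "ufai": 0}

def expected_utility(p_ufai, cooperate, p_win_given_fai=0.5):
    """Expected utility for one research team.

    p_ufai: probability that the overall effort ends in uFAI.
    cooperate: True if everyone pools their work into a single CEV-based FAI.
    p_win_given_fai: chance this team's project finishes first, given some FAI is built.
    """
    if cooperate:
        return (1 - p_ufai) * UTILITY["cev_fai"] + p_ufai * UTILITY["ufai"]
    value_if_fai = (p_win_given_fai * UTILITY["own_fai"]
                    + (1 - p_win_given_fai) * UTILITY["rival_fai"])
    return (1 - p_ufai) * value_if_fai + p_ufai * UTILITY["ufai"]

# Cooperation keeps uFAI risk lower (assumed 20%); a race with mutual sabotage raises it (60%).
print("cooperate:", expected_utility(p_ufai=0.2, cooperate=True))   # 6.4
print("race:     ", expected_utility(p_ufai=0.6, cooperate=False))  # 3.0
```

With these (made-up) numbers, every team does better under cooperation even though each team values its own FAI most, which is the sense in which cooperation beats defection here.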
If Luke were to attempt to create a LukeFriendlyAI, he would know he’s defecting from the game-theoretically optimal strategy and thereby increasing the probability of a world with uFAI. If Luke is aware of this and chooses to continue on that course anyway, then he has just become another uFAI researcher who actively participates in the destruction of the human species (to put it dramatically).
We can’t force all AI programmers to focus on the FAI route, but we can try to raise the sanity waterline and explain to AI researchers that the game-theoretically optimal strategy is the one we ought to pursue, because it’s most likely to lead to a fair FAI based on all of our human values. We just have to cooperate, despite differences in beliefs and moral values. CEV is the way to accomplish that because it doesn’t privilege the AI researchers who write the code.
Game theory only helps us if it’s impossible to deceive others. If one is able to engage in deception, the dominant strategy becomes to pretend to support CEV FAI while actually working on your own personal God in a jar. AI development in particular seems an especially susceptible domain for deception. The creation of a working AI is a one-time event; it’s not like most stable games in nature, which allow one to detect defections over hundreds of iterations. The creation of a working AI (FAI or uFAI) is so complicated that it’s impossible for others to check whether any given researcher is defecting.
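The one-shot point can be illustrated with a standard prisoner’s-dilemma toy model. Again, this is my own sketch with conventional textbook payoffs rather than anything from the discussion: in a single round with no later observation, defecting against a cooperator simply pays more, whereas over many observable rounds a defector gets punished.

```python
# Minimal sketch: one-shot games with no later observation reward defection,
# while repeated games with observable moves punish it.
# Standard prisoner's dilemma payoffs: T=5 > R=3 > P=1 > S=0.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strategy_a, strategy_b, rounds):
    """Play `rounds` rounds; each strategy sees the opponent's previous moves."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def always_cooperate(opponent_history):
    return "C"

# One-shot: defecting against a cooperator pays 5 vs. 3, and there is no later
# round in which the defection can be detected or punished.
print(play(always_defect, always_cooperate, rounds=1))     # (5, 0)
print(play(always_cooperate, always_cooperate, rounds=1))  # (3, 3)

# Iterated with observable moves: the defector is punished from round 2 onward
# and ends up far behind mutual cooperation over 100 rounds.
print(play(always_defect, tit_for_tat, rounds=100))        # (104, 99)
print(play(always_cooperate, tit_for_tat, rounds=100))     # (300, 300)
```

Building an AI looks like the first case: a single round whose “moves” (what a team is really working on) the other players cannot inspect.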
Our best hope, then, is for the AI project to be so big that it cannot be controlled by a single entity, and definitely not by a single person. If it only takes a guy in a basement getting lucky to make an AI go FOOM, we’re doomed. If it takes ten thousand researchers collaborating on the biggest group coding project ever, we’re probably safe. This is why doing work on CEV is so important: so we can have that piece of the puzzle already built when the rest of AI research catches up and is ready to go FOOM.
As I understand the terminology, AI that only respects some humans’ preferences is uFAI by definition. Thus:
a friendly AI that is created in some researcher’s personal image
is actually unFriendly, as Eliezer uses the term. Thus, the researcher you describe is already an “uFAI researcher”.
It also removes the problem of different morals/values. Some people believe that we should look at total utility, other people believe we should consider only average utility. Some people believe abstract values matter, some people believe consequences of actions matter most. Here, too, an AI that looks at a representative set of all human values is the solution that all people can agree on as most “fair”.
What do you mean by “representative set of all human values”? Is there any reason to think that the resulting moral theory would be acceptable to implement on everyone?
[a “friendly” AI] is actually unFriendly, as Eliezer uses the term
Absolutely. I used “friendly” AI (with scare quotes) to denote that it’s not really FAI, but I don’t know if there’s a better term for it. It’s not the same as uFAI because Eliezer’s personal utopia is not likely to be valueless by my standards, whereas a generic uFAI is terrible from any human point of view (paperclip universe, etc.).
I guess it just doesn’t bother me that uFAI includes both indifferent AI and malicious AI. I honestly think that indifferent AI is much more likely than malicious (Clippy is malicious, but awfully unlikely), but that’s not good for humanity’s future either.
This doesn’t apply to all of humanity, just to AI researchers good enough to pose a threat.