… by a casual exploration of the genetic landscape.”
I expect something similar with AI. AIs created by humans and raised in human environments will have values roughly matching those environments.
One of the problems is that “roughly”. Depending on how effective such an AI becomes, “roughly equivalent values” could be a much worse outcome than “completely alien values”.
FAI is difficult in part because values are very fragile.
I disagree that values are fragile in that way. One sign of this is that human beings themselves only have roughly equivalent values, and that doesn’t make the world any more dreadful than it actually is.
The classic example is the value “I want to see all people happy”, whereupon the machine goes on grafting smiles onto everyone’s faces.
On the other hand, I’ve explicitly speculated that an AI would need to acquire much more power than a typical human being has in order to become dangerous. Would you hand control of the entire nuclear arsenal to any single human being, given that her values would be roughly equivalent to yours?
The smiley face thing is silly, and nothing like that has any danger of happening.
The concern about power is more reasonable. However, if a single human being had control of the entire nuclear arsenal (of the whole world), there would be less danger from nuclear weapons than in the current situation: with one person controlling them all, nuclear war is not possible, whereas in the current situation it is possible and will happen sooner or later, as long as the annual probability of it does not keep diminishing (which it currently does not).
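To make the “sooner or later” point concrete, here is a small arithmetic sketch (my own, with a purely hypothetical 1% annual figure): if the yearly chance of nuclear war never falls below some fixed floor, the probability of avoiding it indefinitely shrinks toward zero.

```python
# Hedged illustration with an assumed, hypothetical annual probability.
p = 0.01  # hypothetical constant annual probability of nuclear war

for years in (10, 50, 100, 500):
    prob_no_war_yet = (1 - p) ** years
    print(f"after {years:>3} years: P(no war yet) = {prob_no_war_yet:.3f}")
```

With these assumed numbers, the probability of having avoided war drops from roughly 0.90 after 10 years to under 0.01 after 500; the only point is that a non-diminishing annual probability makes eventual war a near-certainty.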
Your real concern about that situation is not nuclear war, since that would not be possible. Your concern is that a single human might make himself the dictator of the world, and might behave in ways that others find unpleasant. That is quite possible, but it would not destroy the world, nor would it make life unbearable for the majority of human beings.
If we look far enough into the future, the world will indeed be controlled by things which are both smarter and more powerful than human beings. And they will certainly have values that differ, to some extent, from your personal values. So if you could look into a crystal ball and see that situation, I don’t doubt that you would object to it. It does not change the fact that the world will not be destroyed in that situation, nor will life be unbearable for most people.
It’s nice to see that you are so confident, but something being “silly” is not really a refutation.
In this case, it is. Interpreting happiness in that way would be stupid, not in the sense of doing something bad, but in the sense of doing something extremely unintelligent. We are supposedly talking about something more intelligent than humans, not something much, much less intelligent.
Ah, I see where the catch is here. You presuppose that ‘intelligent’ already contains ‘human’ as a subdomain, so that anything that is intelligent can, by definition, understand the subtext of any human interaction.
I think that the purpose of part of LW and part of the Sequences is to show that intelligence in this domain should be deconstructed as “optimization power”, which carries a more neutral connotation.
The point of contention, as I see it and as the whole FAI problem presupposes, is that it’s infinitely easier to create an agent with high optimization power and low ‘intelligence’ (as you understand the term) than one with high optimization power and high ‘intelligence’.
Eliezer’s response to my argument would be that “the genie knows, but does not care.” So he would disagree with you: it understands the subtext quite well. The problem with his answer, of course, is that it implies that the AI knows that happiness does not mean pasting smiley faces, but wants to paste smiley faces anyway. This will not happen, because values are learned progressively. They are not fixed at one arbitrary stage.
In a sufficiently broad sense of “in principle” you can separate optimization from intelligence. For example, a giant lookup table (GLUT) can optimize, but it is not intelligent. In a similar way, AIXI can optimize, but it is probably not intelligent. But note that neither a GLUT nor an AIXI is possible in the real world.
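To illustrate what I mean by an optimizer that is not intelligent, here is a minimal sketch of a GLUT-style agent (my own toy construction, not anything from the discussion above): it maps each exact observation history to a precomputed action, so in principle it can act optimally while representing no concepts at all.

```python
from typing import Dict, Tuple

# Hypothetical table: keys are complete observation histories, values are the
# action precomputed as "best" for that exact history.
ActionTable = Dict[Tuple[str, ...], str]

def glut_agent(table: ActionTable, history: Tuple[str, ...], default: str = "noop") -> str:
    """Look up the action for this exact history; no reasoning, no generalization."""
    return table.get(history, default)

# Toy usage in a trivially small world.
toy_table: ActionTable = {
    ("light_off",): "press_switch",
    ("light_off", "light_on"): "noop",
}
print(glut_agent(toy_table, ("light_off",)))  # -> press_switch
```

Such a table “optimizes” only because every answer has been written down in advance; nothing in it understands the world, which is why no real system can work this way.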
In the real world, optimization power cannot be separated from intelligence. The reason is that nothing will be able to optimize without having general concepts with which to understand the world. These general concepts will necessarily be learned in a human context, given that we are talking about AIs programmed by humans. So their conceptual scheme, and consequently their values, will roughly match ours.