I only said that it would reduce the chance of stupid decisions resulting from not understanding basic human words and values. But it would not reduce the chances of a deliberately malicious AI.
There are (at least) two different types of UFAI: real UFAI and failed FAI. A failed FAI wanted to be good but failed; the best example is the smile maximizer, which would cover the whole Solar system with smiles. (The paperclip maximizer is also a form of failed FAI, as its initial goal was positive: produce many paperclips.)
So it is not a full recipe for real FAI, just one way of doing value learning.
I’m still not sure I understand you correctly. I suspect that if we follow this to the end, we will discover that we are only arguing semantics, and don’t actually disagree over anything tangible. If that’s your impression too, please say so, and we’ll both save ourselves some time.
I only said that it would reduce chance of stupid decisions
I wouldn’t disagree that having such an operator is better than not having one. I am questioning the value of having the operator uploaded. Why would programming an AI to care about the operator’s values and not manipulate the operator be easier if the operator is uploaded? Wouldn’t the operator just be manipulated even faster?
The only answer I see to that is that the uploading part is just to provide a faster and better user interface. If value loading was done via a game of 20 billion questions, for example, this would take an impractically long time. (Thousands of years, if just one person at a time is answering questions.) Same goes if the AI learns values via machine learning, using rewards and punishments given out by the operator, although you’d still have to keep it from wire-heading by manipulating the operator. Also, as an interesting aside, it may be easier to pull values directly out of someone’s brain.
If we’re only arguing about semantics, however, I have a guess at the source:
I understand “failed FAI” to be something like a pure smile maximizer, which has just as much incentive to route around human operators as a paperclip maximizer or suffering maximizer. It wouldn’t care about our values any more than we care about what sorts of values evolution tried to give us. The unstated assumption here is that value uploading failed or never happened, and the AI is no longer trying to load values, but only implement the values it has. I believe this is what you’re gesturing toward with “real UFAI”.
Do you understand “failed FAI” to be one which simply misunderstood our values, like a smile maximizer, but which never exited the value loading phase? This sort of AI might have some sort of “uncertainty” about its utility function. If so, it might still care about what values we intended to give it.
I don’t think that we are only arguing semantics, but the idea of scanning a human is not my only idea, nor the best idea in AI safety. It is just an interesting and promising one.
In one Russian short story, a robot was asked to get rid of all circular objects in the room, and the robot cut off the owner’s head. But if the robot had a simulation of a morally right human, it could run it thousands of times a second and check each of its actions against it.
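A minimal sketch of that check loop, assuming the sim can be re-run cheaply for every proposed action (the names `HumanSim`, `approves`, and `execute_if_approved` are hypothetical illustrations, not a real design):

```python
# Sketch: veto each proposed robot action by polling a fast human-sim.
# A real sim would be an uploaded, vetted human; here approval is a
# trivial placeholder that forbids obviously harmful actions.

class HumanSim:
    """A (hypothetical) fast simulation of a vetted, moral human."""
    def approves(self, action: str) -> bool:
        # The real sim would deliberate; this stand-in just rejects
        # any action description mentioning harm.
        return "harm" not in action

def execute_if_approved(sim: HumanSim, action: str) -> bool:
    """Execute the action only if several independent sim runs approve."""
    # Re-running the sim is cheap, so it can be queried for every
    # action, even multiple times per action.
    if all(sim.approves(action) for _ in range(3)):
        return True   # action goes ahead
    return False      # action is vetoed before it happens

sim = HumanSim()
assert execute_if_approved(sim, "remove circular objects")
assert not execute_if_approved(sim, "harm the owner")
```

The point of the sketch is only the ordering: the sim is consulted before the action happens, not after, which is what a slow human operator cannot do.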
The first difference between a sim and a human operator is that the sim can be run many more times and very cheaply, so we could check each robot action.
The second difference is that a human operator changes (he could be replaced by another operator, age, or suffer a stroke). The sim does not change at such a speed (though it could still learn, or fixed copies could be used for simple situations).
The third is that the sim is specially selected, trained for many years, and later tested as a sim in many imaginary situations. We can’t test an operator with such rigor.
Four. The sim could monitor every arbitrarily small update of the AI’s utility function, because it runs at the same speed. This prevents the AI from becoming malicious. I have to explain: our typical discussions about malicious AI contain a logical loop (first pointed out by Ben Goertzel).
The loop:
To start hiding its utility function from peers, the AI must already be malicious. But it can’t covertly become malicious without secretly changing its utility function.
In other words, between the moment the AI decides to be bad and the moment it starts to act on that decision, there is a short time gap. This gap may be very brief, perhaps 0.01 seconds, so no human operator has time to react. But the sim will have time (especially if the AI is halted after any update to its open utility function).
Another explanation of the gap: if all of the AI’s values are written in open form (in some stack), and it creates the value of hiding its values, then the value “hide values” is itself written in open form (for a short time). So if the value stack is mirrored outside, there is no way for the AI to hide its transformation from the peer (except by killing the peer before it grasps all the implications).
Five. The sim checks not only values, but also the correct understanding of human language and context.
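The fourth point can be sketched as a transparent value stack: any new value is mirrored to the outside peer before it can take effect, so a freshly written “hide values” entry is visible during the gap. All names and structure here are illustrative, not an actual proposal:

```python
# Sketch: an AI whose values live in an open stack that is mirrored
# outside before any update takes effect. A peer (the sim) reading the
# mirror sees a "hide values" entry the instant it is written, and the
# AI halts on every utility-function update pending review.

external_mirror = []   # what the outside peer (the sim) can read

class OpenValueStack:
    def __init__(self):
        self.values = []
        self.halted = False

    def add_value(self, value: str) -> None:
        # Mirror first: the update is visible outside *before* it is
        # active inside the AI, closing the gap described above.
        external_mirror.append(value)
        # Halt on any utility-function update until the sim reviews it.
        self.halted = True
        self.values.append(value)

stack = OpenValueStack()
stack.add_value("produce paperclips")
stack.add_value("hide values")            # the treacherous turn...
assert "hide values" in external_mirror   # ...is already visible outside
assert stack.halted                       # and the AI is halted for review
```

The design choice being illustrated is simply write-ordering: because the mirror is appended to before the value becomes active, there is no instant at which the AI is malicious but undetectable.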
You are confusing the stupidity of whoever set the goals with the stupidity of the AI afterward. Any AGI is going to understand what we actually want; it just doesn’t care, if the goal it was given wasn’t already smart enough.