I don’t think we are only arguing semantics. The idea of scanning a human is not my only idea, nor the best idea in AI safety; it is just an interesting and promising one.
In one Russian short story, a robot was asked to get rid of all circular objects in the room, and it cut off its owner’s head. But if the robot had a simulation of a morally sound human, it could run that simulation thousands of times a second and check every action against it.
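A minimal sketch of such a checking loop, under the assumption that the sim can be queried before every action. All names here (Action, HumanSim, execute_if_approved) and the trivial approval rule are hypothetical placeholders, not a real design:

```python
# Hypothetical sketch: every proposed robot action is gated on approval
# by a fast-running simulated human before it is executed.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    description: str  # e.g. "remove circular object: owner's head"


class HumanSim:
    """Stand-in for a simulated moral human (assumed component)."""

    def approves(self, action: Action) -> bool:
        # A real sim would judge the action in full context; here a
        # trivial rule just illustrates the gating interface.
        return "head" not in action.description


def execute_if_approved(action: Action, sim: HumanSim,
                        execute: Callable[[Action], None]) -> bool:
    """Ask the sim before every action; block anything it rejects."""
    if sim.approves(action):
        execute(action)
        return True
    return False


if __name__ == "__main__":
    sim = HumanSim()
    proposed = [Action("remove circular object: ball"),
                Action("remove circular object: owner's head")]
    for a in proposed:
        done = execute_if_approved(a, sim,
                                   lambda act: print("executed:", act.description))
        if not done:
            print("blocked by sim:", a.description)
```

Because the sim is cheap to run, this check can happen for every single action, which is exactly what a human operator cannot do.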
The first difference between the sim and a human operator is that the sim can be run vastly more times and very cheaply, so we could check each robot action.
The second difference is that a human operator changes (he could be replaced by another operator, age, or suffer a stroke). The sim does not change at such speed (though it could still learn, or fixed copies could be used for simple situations).
The third is that the sim is specially selected, trained for many years, and later tested as a sim in many imaginary situations. We can’t test a human operator with such rigor.
Fourth, the sim could monitor any arbitrarily small update to the AI’s utility function, because it runs at the same speed. This prevents the AI from becoming malicious. I have to explain this: our typical discussions about malicious AI contain a logical loop (first pointed out by Ben Goertzel).
The loop:
To start hiding its utility function from peers, the AI must already be malicious. But it can’t covertly become malicious without secretly changing its utility function.
In other words, there is a short time gap between the moment the AI decides to be bad and the moment it starts to act on that decision. This gap may be very short, perhaps 0.01 seconds, so no human operator has time to react. But the sim will have time (especially if the AI is halted after any update to its open utility function).
Another explanation of the gap: if all of the AI’s values are written in open form (in some stack), and it creates a value to hide its values, then the value “hide values” is itself written in open form (for a short time). So if the value stack is streamed outside, there is no way for the AI to hide its transformation from the peer (except by killing the peer before it grasps the implications). A toy sketch of this setup is given at the end of this comment.
Fifth, the sim checks not only values but also correct understanding of human language and context.
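To make the “open value stack” idea from point four concrete, here is a purely illustrative sketch. All names (OpenValueStack, sim_reviews) and the approval rule are hypothetical; the only point is that every value update is mirrored outside before it takes effect, and the AI is halted until the peer (the sim) approves it, so even a “hide values” update is exposed during the gap:

```python
# Hypothetical sketch of an "open value stack": each value update is
# mirrored to an external peer before it becomes active, and the AI is
# halted if the peer rejects the update.

from typing import Callable, List


class OpenValueStack:
    def __init__(self, external_mirror: Callable[[str], bool]):
        self.values: List[str] = []
        self.external_mirror = external_mirror  # the sim/peer watching from outside

    def push_value(self, value: str) -> None:
        # The new value is sent outside *before* it becomes active, so even
        # a value like "hide values" is briefly visible to the peer.
        approved = self.external_mirror(value)
        if not approved:
            raise RuntimeError(f"halted: peer rejected value update {value!r}")
        self.values.append(value)


def sim_reviews(value: str) -> bool:
    # Trivial stand-in for the simulated human's judgment.
    return "hide values" not in value


if __name__ == "__main__":
    stack = OpenValueStack(external_mirror=sim_reviews)
    stack.push_value("obey owner")
    try:
        stack.push_value("hide values from peers")
    except RuntimeError as e:
        print(e)  # the malicious update is caught in the gap before it can act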
```