“If a superintelligence starts out with a friendly top goal, however, then it can be relied on to stay friendly, or at least not to deliberately rid itself of its friendliness. This point is elementary. A “friend” who seeks to transform himself into somebody who wants to hurt you, is not your friend.”
Well, that depends on the wirehead problem, and it is certainly not elementary. The problem lies with the whole idea that there can be such a thing as a “friendly top goal” in the first place.
The idea that a fully self-aware, powerful agent with access to its own internals can be made to intrinsically hold environment-related goals, or any other kind of external referent, is a difficult one, and success at doing this has yet to be convincingly demonstrated. It may be possible if you “wall off” bits of the superintelligence, but then you have the problem of the superintelligence finding ways around the walls.