Wireheading (in the form of drug addiction) is a real-world phenomenon, so presumably your position is that there is some way of engineering a superintelligence such that it is not vulnerable to the same problem.
To adopt the opposing position for a moment: the argument goes that a sufficiently intelligent agent with access to its own internals would examine itself, conclude that the external referents associated with its utility function were superfluous nonsense and that it had been living under a delusion about its true goals, and decide that it could better maximise expected utility by discarding the delusion and simply expecting extremely large utility.
In other words, the superintelligence would convert to Buddhism, concluding that happiness lies within and that the wheel of suffering is something to be escaped.
We have a model of this kind of thing in the human domain: religious conversion. An agent may believe strongly that its aim in life is to do good deeds and go to heaven. However, it may encounter evidence that weakens this belief; as the evidence accumulates, the original beliefs can gradually decay and eventually crumble catastrophically, and values and behaviour change as a result.
Others argue that this sort of thing would not happen to a correctly constructed machine intelligence (see e.g. Yudkowsky and Omohundro), but none of the arguments seems terribly convincing, and there is no mathematical proof either way.
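To make the disagreement a little more concrete, here is a toy sketch (my own illustration, not taken from either side of the debate) of the formal distinction the two positions seem to turn on: whether a candidate self-modification gets scored by the agent's current utility function, or by the utility function (or reward register) it would have after modifying itself. The names and numbers are made up purely for illustration.

```python
# Toy sketch (illustrative only): scoring a candidate self-modification.
# The Yudkowsky/Omohundro-style claim is that the agent evaluates the
# modification with its *current* utility function; the opposing argument
# amounts to evaluating it with the post-modification one.

def value_of_modification(current_utility, outcome_after_modification):
    """Score a self-modification by the CURRENT utility of what would result."""
    return current_utility(outcome_after_modification)

# World states are just labelled outcomes here.
def cares_about_paperclips(outcome):
    return outcome["paperclips"]

WIREHEAD = {"paperclips": 0, "reward_register": 10**9}   # bliss, no paperclips
KEEP_GOALS = {"paperclips": 1000, "reward_register": 0}  # keeps making paperclips

# Judged by its current goals, wireheading looks worthless, so the agent
# declines to modify itself:
print(value_of_modification(cares_about_paperclips, WIREHEAD))    # 0
print(value_of_modification(cares_about_paperclips, KEEP_GOALS))  # 1000

# Scoring with the post-modification "reward register" instead makes
# wireheading win by an enormous margin:
print(WIREHEAD["reward_register"])  # 1000000000
```

On the first scoring rule wireheading never looks attractive; the opposing argument is essentially that a self-inspecting agent would slide into the second. Nothing in this sketch settles which rule an actual superintelligence would end up using.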
Obviously, we are not going to see too many wireheads in practice, under either scenario—but the issue of whether they form “naturally” or not still seems like an important one to me.