The “everything humans can do, an AI could do better” argument cuts both ways. Humans can wirehead—machines may be able to wirehead better. That argument is pretty symmetric with the “wirehead avoidance” argument. So: I don’t think either argument is worth very much. There may be good arguments that illuminate the future frequency of wireheading, but these don’t qualify. It seems quite possible that our entire civilization could wirehead itself—along the lines suggested by David Pearce.
For everything a human can do, a human cannot do it in the most extreme possible manner; an AI, by contrast, could be tuned in either direction. It could be made easier or harder to wirehead. It could think faster or slower. It could be more creative or less creative. It could be nicer or meaner.
I wouldn’t begin to know how to build an AI that’s improved in all the right ways. It might not even be humanly possible. And if it isn’t humanly possible to build a good AI in the first place, it’s probably not possible for that AI to improve on itself either. Still, there’s a good chance it would work.
Probably true—and few want wireheading machines—but the issues are the scale of the technical challenges, and—if these are non-trivial—how much folk will be prepared to pay for the feature. In a society of machines, maybe the occasional one that turns Buddhist—and needs to go back to the factory for psychological repairs—is within tolerable limits.
Many apparently think that making machines value “external reality” fixes the wirehead problem (e.g. see “Model-based Utility Functions”), but it leads directly to the problems of what you mean by “external reality” and how to tell a machine that external reality is what it is supposed to be valuing. It doesn’t look much like a solution to the problem to me.
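To make the distinction concrete, here is a minimal sketch, in hypothetical Python, contrasting an agent that maximizes its reward channel with one that evaluates a utility function over its world model. All of the names (WorldModel, RewardAgent, ModelUtilityAgent) are illustrative assumptions, not anything from the post or from the “Model-based Utility Functions” proposal. Tampering with the channel is the best move for the first agent and gains nothing for the second, but notice that the second agent’s “external reality” is only whatever its own model says it is, which is exactly the catch mentioned above.

```python
# Hypothetical illustration only: none of these names come from the
# proposals under discussion.
from dataclasses import dataclass


@dataclass
class WorldModel:
    """The agent's internal model of the external world."""
    paperclips: int = 0


def reward_signal(model: WorldModel, channel_hacked: bool) -> float:
    """A sensor-style reward channel. If the agent tampers with the channel,
    the signal saturates regardless of the actual world state."""
    return float("inf") if channel_hacked else float(model.paperclips)


class RewardAgent:
    """Maximizes the reward signal itself, so hacking the channel
    (wireheading) looks like the best possible outcome."""
    def evaluate(self, model: WorldModel, channel_hacked: bool) -> float:
        return reward_signal(model, channel_hacked)


class ModelUtilityAgent:
    """Evaluates a utility function over the modeled world state rather than
    over the reward channel: the 'model-based utility function' idea. Note
    that 'external reality' still only enters through the agent's own model."""
    def evaluate(self, model: WorldModel, channel_hacked: bool) -> float:
        # The channel is ignored; utility depends on the modeled world only.
        return float(model.paperclips)


if __name__ == "__main__":
    world = WorldModel(paperclips=3)
    print(RewardAgent().evaluate(world, channel_hacked=True))        # inf
    print(ModelUtilityAgent().evaluate(world, channel_hacked=True))  # 3.0
```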