It’s valuing external reality. Valuing sensory inputs and mental models would just result in wireheading.
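To make that distinction concrete, here is a minimal toy sketch (in Python, with invented names; nothing here comes from the original comment) of the difference between a utility defined over raw sensory input, which an agent can satisfy by tampering with its own sensor, and one defined over an estimate of the external state:

```python
# Hypothetical toy illustration: two ways of defining what an agent values.
# All names are invented for this sketch.

class World:
    def __init__(self, temperature: float):
        self.temperature = temperature  # the "external reality" we care about

class Agent:
    def __init__(self, sensor_bias: float = 0.0):
        self.sensor_bias = sensor_bias  # the agent can modify this (wireheading)

    def sense(self, world: World) -> float:
        return world.temperature + self.sensor_bias

# Utility over sensory input: maximized by cranking up sensor_bias,
# regardless of what the world is actually like.
def sensory_utility(agent: Agent, world: World) -> float:
    return agent.sense(world)

# Utility over (an estimate of) external reality: subtracting the agent's own
# known bias makes sensor tampering pointless, assuming the agent's model of
# its sensor is accurate, which is exactly the hard part.
def reality_utility(agent: Agent, world: World) -> float:
    return agent.sense(world) - agent.sensor_bias

if __name__ == "__main__":
    world = World(temperature=20.0)
    agent = Agent()
    agent.sensor_bias = 1000.0  # "wirehead" by corrupting the sensor
    print(sensory_utility(agent, world))  # 1020.0 -- tampering pays off
    print(reality_utility(agent, world))  # 20.0   -- tampering gains nothing
```

The second utility only helps to the extent that the agent's picture of its own sensor and of the world is accurate, which is where the replies below pick up.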
Thinking they are valuing “external reality” probably doesn’t really protect agents from wireheading. The agents just wind up with delusional ideas about what “external reality” consists of, built from the patchwork of underspecification left by the original programmers of the concept.
I know that it’s possible for an agent that’s created with a completely underspecified idea of reality to nonetheless value external reality and avoid wireheading. I know this because I am such an agent.
Everything humans can do, an AI could do. There’s little reason to believe humans are remotely optimal, so an AI could likely do it better.
The “everything humans can do, an AI could do better” argument cuts both ways. Humans can wirehead—machines may be able to wirehead better. That argument is pretty symmetric with the “wirehead avoidance” argument. So: I don’t think either argument is worth very much. There may be good arguments that illuminate the future frequency of wireheading, but these don’t qualify. It seems quite possible that our entire civilization could wirehead itself—along the lines suggested by David Pearce.
Everything a human can do, a human can do only within a limited range, not in the most extreme possible manner. An AI could be made easier or harder to wirehead. It could think faster or slower. It could be more or less creative. It could be nicer or meaner.
I wouldn’t begin to know how to build an AI that’s improved in all the right ways. It might not even be humanly possible. If it isn’t humanly possible to build a good AI, it’s likely also impossible for the AI to improve on itself. Even so, there’s a good chance it would work.
Probably true—and few want wireheading machines—but the issues are the scale of the technical challenges, and—if these are non-trivial—how much folk will be prepared to pay for the feature. In a society of machines, maybe the occasional one that turns Buddhist—and needs to go back to the factory for psychological repairs—is within tolerable limits.
Many apparently think that making machines value “external reality” fixes the wirehead problem (e.g. see “Model-based Utility Functions”), but it leads directly to the problems of what you mean by “external reality” and of how to tell a machine that this is what it is supposed to be valuing. It doesn’t look much like a solution to me.
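For concreteness, here is a hedged sketch (Python, invented names, not the construction from the cited “Model-based Utility Functions” post) of why moving the utility function onto the agent’s world model relocates the problem rather than solving it: the utility is still computed on something the agent itself maintains, so a sufficiently delusional model scores as well as a real accomplishment.

```python
# Hypothetical sketch of a "model-based utility function": utility is computed
# on the agent's internal world model rather than on raw observations.
# Names are invented for illustration.

from dataclasses import dataclass

@dataclass
class WorldModel:
    # The agent's belief about how much valued stuff exists outside it.
    believed_paperclips: float

def model_based_utility(model: WorldModel) -> float:
    # Utility is a function of the model, not of any sensor reading...
    return model.believed_paperclips

# ...which relocates the wirehead problem rather than removing it: an agent
# that updates its model by a degenerate rule still scores arbitrarily high.
def delusional_update(model: WorldModel) -> WorldModel:
    return WorldModel(believed_paperclips=model.believed_paperclips * 10)

if __name__ == "__main__":
    model = WorldModel(believed_paperclips=1.0)
    for _ in range(3):
        model = delusional_update(model)  # no paperclips were actually made
    print(model_based_utility(model))     # 1000.0
```

Everything then hinges on specifying which model updates count as tracking “external reality”, which is the original problem again.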