GeneSmith: when people in AI alignment or on LessWrong talk about ‘wireheading’, I understood that to refer not to people literally asking neurosurgeons to stick wires into their brains, but to a somewhat larger class of ways to hack one’s own reward systems through the usual perceptual input channels.
I agree that humans are not ‘reward maximizing agents’, whatever that is supposed to mean in reference to actual evolved organisms with diverse, heterogeneous, and domain-specific motivational systems.
I don’t think I explained my thinking clearly enough.
If we use wireheading to refer to the broader class of actions that increase reward at the cost of reproductive fitness, I agree that humans in general do wirehead to some degree. But even if we count taking recreational drugs or masturbation as wireheading, I still don’t believe any other theory of values explains the relative lack of these behaviors as well as shard theory does.
If humans were truly reward maximizers, it’s difficult to imagine how they would manage to avoid wireheading as well as they do. I suppose the “thousand genetic hacks” theory might be able to explain it if evolution were clever enough? There’s certainly some evidence that when humans are exposed to new sources of reward that do nothing to benefit reproductive fitness, the result is often a disaster. See the numerous cases of hunter-gatherer peoples being exposed to alcohol for the first time.
But again… think about the actual wireheading example. There must be millions of humans who know about wireheading, yet so far as I know there are zero examples of people doing it recreationally. There was nothing similar to wireheading in the ancestral environment. Yet nearly everyone seems averse to the idea of literally wireheading themselves.
Why? Humans can anticipate how incredibly rewarding it would be to wirehead. And many humans could afford full-time caretakers to ensure they would be able to experience the rewards for years or decades. So why are people averse to the idea?
My interpretation is that humans develop their initial shards of value during childhood, in an environment that usually contains limited opportunities to wirehead. As the human world model becomes generally filled in, most sensory environments come to activate at least one shard whose “values” are not aligned with wireheading.
I think shard theory has a better explanation of this relative lack of wireheading than alternative models. But it’s obviously incomplete without a description of the free parameters that can be tweaked by genetics to produce the distribution of behaviors we see in the human population.
This is why I am hopeful others will start working on establishing the mathematics of shard theory so that we can see if shards really do form in neural networks, and if so how they behave.
GeneSmith—I guess I’m still puzzled about how Shard Theory prevents wireheading (broadly construed); I just don’t see it as a magic bullet that can keep agents focused on their ultimate goals. I must be missing something.
And, insofar as Shard Theory is supposed to be an empirically accurate description of human agents, it would need to explain why some people become fentanyl addicts who might eventually overdose, and others don’t. Or why some people pursue credentials and careers at the cost of staying childless… while others settle down young, have six kids, and don’t worry as much about status-seeking. Or why some people take up free solo mountain climbing, for the rush, and fall to their deaths by age 30, whereas others are more risk-averse.
Modern consumerist capitalism offers thousands of ways to ‘wirehead’ our reward systems that don’t require experimental neurosurgery, and billions of people get caught up in those reward-hacks. If Shard Theory is serious about describing actual human behavior, it needs some way to describe both our taste for many kinds of reward-hacking and our resistance to it.