This is drifting a bit far afield from the neurobio aspect of this research, but do you have an opinion about the likelihood that a randomly sampled human, if endowed with truly superhuman powers, would utilize those powers in a way that we’d be pleased to see from an AGI?
It seems to me like we have many salient examples of power corrupting, and absolute power corrupting to a great degree. Understanding that there’s a distribution of outcomes, do you have an opinion about the likelihood of benevolent use of great power, among humans?
This is not to say that this understanding can’t still be usefully employed, but somehow it seems like a relevant question. E.g. if it turns out that most of what keeps humans acting pro-socially is the fear that anti-social behavior will trigger their punishment by others, that’s likely not as juicy a mechanism since it may be hard to convince a comparatively omniscient and omnipotent being that it will somehow suffer if it does anti-social things.
For what it’s worth, Eliezer in 2018 said that he’d be pretty happy with endowing some specific humans with superhuman powers:
If the subject is Paul Christiano, or Carl Shulman, I for one am willing to say these humans are reasonably aligned; and I’m pretty much okay with somebody giving them the keys to the universe in expectation that the keys will later be handed back.
I’ve shown the above quote to a lot of people who say “yes that’s perfectly obvious”, and I’ve also shown this quote to a lot of people who say “Eliezer is being insufficiently cynical; absolute power corrupts absolutely”. For my part, I don’t have a strong opinion, but on my models, if we know how to make virtual humans, then we probably know how to make virtual humans without envy and without status drive and without teenage angst etc., which should help somewhat. More discussion here.
Thanks for the thoughtful reply. I read the fuller discussion you linked to and came away with one big question which I didn’t find addressed anywhere (though it’s possible I just missed it!)
Looking at the human social instinct, we see that it indeed steers us toward not wanting to harm other humans, but it weakens when extended to other creatures, roughly in proportion to their difference from us. We (generally) have lots of empathy for other humans, less for apes, less still for other mammals (whom we factory farm by the billions without most people particularly minding it), probably less for octopuses (which are bright but quite different), and almost none for the zillions of microorganisms, some of which we allegedly evolved from. I would guess that even canonical Good Person Paul Christiano probably doesn’t lose much sleep over his impact on microorganisms.
This raises the question of whether the social instinct we have, even if fully reverse engineered, can be deployed separately from the identity of the entity to which it is attached. In other words, if the social-instinct circuitry humans have amounts to “be nice to others in proportion to how similar to yourself they are”, which seems to match the data, then we would need more than just the ability to place that circuitry in AGIs (which would presumably make the AGIs want to be nice to other similar AGIs). We would also need to be able to tease apart the object of empathy and replace it with something very different from how humans operate. No human is nice to microorganisms, so I see no evidence that the existing social instincts ever make a person nice to something very different from, and much weaker than, themselves, and I would expect the same circuitry to behave similarly in an AGI.
This is speculative, but it seems reasonably likely to me to turn out to be an actual problem. Curious if you have thoughts on it.
I don’t think “be nice to others in proportion to how similar to yourself they are” is part of it. For example, dogs can be nice to humans, and to goats, etc. I guess your response is ‘well dogs are a bit like humans and goats’. But are they? From the dog’s perspective? They look different, sound different, smell different, etc. I don’t think dogs really know what they are in the first place, at least not in that sense. Granted, we’re talking about humans not dogs. But humans can likewise feel compassion towards animals, especially cute ones (cf. “charismatic megafauna”). Do humans like elephants because elephants are kinda like humans? I mean, I guess elephants are more like humans than microbes are. But they’re still pretty different. I don’t think similarity per se is why humans care about elephants. I think it’s something about the elephants’ cute faces, and the cute way that they move around.
More specifically, my current vague guess is that the brainstem applies some innate heuristics to sensory inputs to guess things like “that thing there is probably a person”. This includes things like heuristics for eye-contact-detection and face-detection and maybe separately cute-face-detection etc. The brainstem also has heuristics that detect the way that spiders scuttle and snakes slither (for innate phobias). I think these heuristics are pretty simple; for example, the human brainstem face detector (in the superior colliculus) has been studied a bit, and the conclusion seems to be that it mostly just detects the presence of three dark blobs of about the right size, in an inverted triangle. (The superior colliculus is pretty low resolution.)
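To illustrate just how simple such a heuristic could be, here’s a toy sketch of a “three dark blobs in an inverted triangle” detector. This is a caricature for illustration only, not a model of the actual superior colliculus circuit; the function name and all the thresholds are made-up placeholders.

```python
import numpy as np
from scipy import ndimage

def crude_face_heuristic(img, dark_thresh=0.3, min_blob=2, max_blob=40):
    """Toy caricature of a low-resolution 'face' heuristic: fires if the image
    contains three dark blobs of roughly the right size arranged as an
    inverted triangle (two above, one below, roughly centered between them).

    `img` is a 2-D float array in [0, 1] (0 = dark), e.g. a heavily
    downsampled grayscale patch. All thresholds are arbitrary placeholders.
    """
    # Label connected dark regions.
    labels, n = ndimage.label(img < dark_thresh)
    centroids = []
    for i in range(1, n + 1):
        size = np.sum(labels == i)
        if min_blob <= size <= max_blob:          # blob is "about the right size"
            centroids.append(ndimage.center_of_mass(labels == i))
    if len(centroids) != 3:
        return False

    # Sort by row: the two "eyes" should sit above the "mouth".
    top_a, top_b, bottom = sorted(centroids, key=lambda c: c[0])
    eyes_level   = abs(top_a[0] - top_b[0]) < 3                       # eyes roughly level
    mouth_below  = bottom[0] > max(top_a[0], top_b[0]) + 2            # mouth clearly lower
    mouth_center = abs(bottom[1] - (top_a[1] + top_b[1]) / 2) < 3     # mouth roughly centered
    return eyes_level and mouth_below and mouth_center
```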
If we’re coding the AGI, we can design those sensory heuristics to trigger on whatever we want. Presumably we would just use a normal ConvNet image classifier for this. If we want the AGI to find cockroaches adorably “cute”, and kittens gross, I think that would be really straightforward to code up.
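To make that concrete, here’s a minimal sketch of what “pointing the innate heuristic at whatever we want” might look like, assuming a brain-like-AGI setup in which an innate detector feeds a reward-like signal into the learning subsystem. The small ConvNet, the `CutenessDetector` class, and `innate_valence_signal` are all invented for illustration; nothing here is claimed to be the actual proposal.

```python
import torch
import torch.nn as nn

class CutenessDetector(nn.Module):
    """Hypothetical stand-in for the brainstem's innate 'cute face' heuristic:
    an ordinary small ConvNet, trained offline by the designers to score
    whatever category they choose (cockroaches, kittens, anything)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, img):                      # img: (batch, 3, H, W) in [0, 1]
        x = self.features(img).flatten(1)
        return torch.sigmoid(self.head(x))       # "cuteness" score in (0, 1)

def innate_valence_signal(img, detector, weight=1.0):
    """Toy 'steering' hook: convert the detector's output into an innate
    reward-like signal that the rest of the (hypothetical) AGI learns from."""
    with torch.no_grad():
        return weight * detector(img).item()
```

The point of the sketch is just that the designers choose the detector’s training data, and therefore choose what the system innately responds to; nothing in this setup ties the response to similarity-to-self.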
So I’m not currently worried about that exact thing. I do have a few kinda-related concerns though. For example, maybe adult social emotions can only develop after lots and lots of real-time conversations with real-world humans, and that’s a slow and expensive kind of training data for an AGI. Or maybe the development of adult social emotions is kinda a package deal, such that you can’t delete “the bad ones” (e.g. envy) from an AGI without messing everything else up.
(Part of the challenge is that false-positives, e.g. where the AGI feels compassion towards microbes or teddy bears or whatever, are a very big problem, just as false-negatives are.)