Steven Byrnes comments on My AGI safety research—2022 review, ’23 plans

Steven Byrnes Jan 12, 2024, 9:29 PM
LW: 3 AF: 3
0
AF
Thanks!
I don’t think “be nice to others in proportion to how similar to yourself they are” is part of it. For example, dogs can be nice to humans, and to goats, etc. I guess your response is ‘well dogs are a bit like humans and goats’. But are they? From the dog’s perspective? They look different, sound different, smell different, etc. I don’t think dogs really know what they are in the first place, at least not in that sense. Granted, we’re talking about humans not dogs. But humans can likewise feel compassion towards animals, especially cute ones (cf. “charismatic megafauna”). Do humans like elephants because elephants are kinda like humans? I mean, I guess elephants are more like humans than microbes are. But they’re still pretty different. I don’t think similarity per se is why humans care about elephants. I think it’s something about the elephants’ cute faces, and the cute way that they move around.
More specifically, my current vague guess is that the brainstem applies some innate heuristics to sensory inputs to guess things like “that thing there is probably a person”. This includes things like heuristics for eye-contact-detection and face-detection and maybe separately cute-face-detection etc. The brainstem also has heuristics that detect the way that spiders scuttle and snakes slither (for innate phobias). I think these heuristics are pretty simple; for example, the human brainstem face detector (in the superior colliculus) has been studied a bit, and the conclusion seems to be that it mostly just detects the presence of three dark ,blobs of about the right size, in an inverted triangle. (The superior colliculus is pretty low resolution.)
If we’re coding the AGI, we can design those sensory heuristics to trigger on whatever we want. Presumably we would just use a normal ConvNet image classifier for this. If we want the AGI to find cockroaches adorably “cute”, and kittens gross, I think that would be really straightforward to code up.
So I’m not currently worried about that exact thing. I do have a few kinda-related concerns though. For example, maybe adult social emotions can only develop after lots and lots of real-time conversations with real-world humans, and that’s a slow and expensive kind of training data for an AGI. Or maybe the development of adult social emotions is kinda a package deal, such that you can’t delete “the bad ones” (e.g. envy) from an AGI without messing everything else up.
(Part of the challenge is that false-positives, e.g. where the AGI feels compassion towards microbes or teddy bears or whatever, are a very big problem, just as false-negatives are.)