For what it’s worth, Eliezer in 2018 said that he’d be pretty happy with endowing some specific humans with superhuman powers:
If the subject is Paul Christiano, or Carl Shulman, I for one am willing to say these humans are reasonably aligned; and I’m pretty much okay with somebody giving them the keys to the universe in expectation that the keys will later be handed back.
I’ve shown the above quote to a lot of people who say “yes, that’s perfectly obvious”, and to a lot of other people who say “Eliezer is being insufficiently cynical; absolute power corrupts absolutely”. For my part, I don’t have a strong opinion, but on my models, if we know how to make virtual humans, then we probably know how to make virtual humans without envy, without status drive, without teenage angst, etc., which should help somewhat. More discussion here.
Thanks for the thoughtful reply. I read the fuller discussion you linked to and came away with one big question which I didn’t find addressed anywhere (though it’s possible I just missed it!)
Looking at the human social instinct, we see that it indeed steers us toward not wanting to harm other humans, but it weakens when extended to other creatures, roughly in proportion to their difference from us. We (generally) have lots of empathy for other humans; less for apes; less still for other mammals (whom we factory-farm by the billions without most people particularly minding); probably less for octopuses (who are bright but quite different); and almost none for the zillions of microorganisms, from some of which we presumably evolved. I would guess that even canonical Good Person Paul Christiano probably doesn’t lose much sleep over his impact on microorganisms.
This raises the question of whether the social instinct we have, even if fully reverse-engineered, can be deployed separately from the identity of the entity it is attached to. In other words, suppose the social-instinct circuitry humans have amounts to “be nice to others in proportion to how similar to yourself they are”, which seems to match the data. Then we would need more than just the ability to place that circuitry in AGIs (which would presumably make the AGIs want to be nice to other, similar AGIs). We would also need to be able to tease apart the object of empathy and replace it with something very different from how humans operate. No human is nice to microorganisms, so I see no evidence that the existing social instincts ever make a person nice to something very different from, and much weaker than, themselves, and I would expect the same circuitry to work similarly in an AGI.
This is speculative, but it seems reasonably likely to me to turn out to be an actual problem. Curious if you have thoughts on it.
I don’t think “be nice to others in proportion to how similar to yourself they are” is part of it. For example, dogs can be nice to humans, and to goats, etc. I guess your response is ‘well dogs are a bit like humans and goats’. But are they? From the dog’s perspective? They look different, sound different, smell different, etc. I don’t think dogs really know what they are in the first place, at least not in that sense. Granted, we’re talking about humans not dogs. But humans can likewise feel compassion towards animals, especially cute ones (cf. “charismatic megafauna”). Do humans like elephants because elephants are kinda like humans? I mean, I guess elephants are more like humans than microbes are. But they’re still pretty different. I don’t think similarity per se is why humans care about elephants. I think it’s something about the elephants’ cute faces, and the cute way that they move around.
More specifically, my current vague guess is that the brainstem applies some innate heuristics to sensory inputs to guess things like “that thing there is probably a person”. This includes things like heuristics for eye-contact-detection and face-detection and maybe separately cute-face-detection etc. The brainstem also has heuristics that detect the way that spiders scuttle and snakes slither (for innate phobias). I think these heuristics are pretty simple; for example, the human brainstem face detector (in the superior colliculus) has been studied a bit, and the conclusion seems to be that it mostly just detects the presence of three dark blobs of about the right size, in an inverted triangle. (The superior colliculus is pretty low resolution.)
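To give a feel for how simple such a heuristic could be, here is a toy sketch of a “three dark blobs in an inverted triangle” check on a low-resolution grayscale image. The specific sample positions and threshold are my own illustrative assumptions, not a model of the actual superior colliculus circuitry:

```python
import numpy as np

def crude_face_detector(img, blob_thresh=0.3):
    """Toy 'three dark blobs in an inverted triangle' heuristic.

    `img` is a low-resolution grayscale array with values in [0, 1].
    The sample layout (two 'eyes' above one 'mouth') and the darkness
    threshold are illustrative assumptions.
    """
    h, w = img.shape
    # Sample three points where eyes and mouth would sit on a centered face.
    eye_l = img[h // 3, w // 3]
    eye_r = img[h // 3, 2 * w // 3]
    mouth = img[2 * h // 3, w // 2]
    # "Face detected" = all three sample points are dark.
    return bool(eye_l < blob_thresh and eye_r < blob_thresh and mouth < blob_thresh)

# A 9x9 "image": light background with three dark spots in an inverted triangle.
img = np.ones((9, 9))
img[3, 3] = img[3, 6] = img[6, 4] = 0.0
print(crude_face_detector(img))  # True
```

The point is just that a detector this crude can still be a useful innate trigger, because it only has to bias learning, not be accurate.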
If we’re coding the AGI, we can design those sensory heuristics to trigger on whatever we want. Presumably we would just use a normal ConvNet image classifier for this. If we want the AGI to find cockroaches adorably “cute”, and kittens gross, I think that would be really straightforward to code up.
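To make the “straightforward to code up” claim concrete: once you have some image classifier, the innate reaction attached to each detected category is just a designer-chosen lookup table, fully decoupled from what humans happen to find cute. In this hypothetical sketch, `classify` is a stub standing in for a trained ConvNet forward pass:

```python
# Designer-chosen mapping from detected category to an innate "cuteness"
# reward signal. The specific values and categories are illustrative.
CUTENESS_SIGNAL = {
    "cockroach": +1.0,  # designer's choice: roaches trigger the "cute" response
    "kitten": -1.0,     # ...and kittens trigger the "gross" response
    "rock": 0.0,
}

def classify(image_name: str) -> str:
    # Stand-in for a real ConvNet image classifier; here the label is
    # just the input name, for illustration only.
    return image_name

def innate_cuteness_response(image_name: str) -> float:
    category = classify(image_name)
    # Unknown categories get a neutral response.
    return CUTENESS_SIGNAL.get(category, 0.0)

print(innate_cuteness_response("cockroach"))  # 1.0
```

Nothing in this setup cares about similarity to the AGI itself; the trigger is whatever the designer wires in.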
So I’m not currently worried about that exact thing. I do have a few kinda-related concerns though. For example, maybe adult social emotions can only develop after lots and lots of real-time conversations with real-world humans, and that’s a slow and expensive kind of training data for an AGI. Or maybe the development of adult social emotions is kinda a package deal, such that you can’t delete “the bad ones” (e.g. envy) from an AGI without messing everything else up.
(Part of the challenge is that false-positives, e.g. where the AGI feels compassion towards microbes or teddy bears or whatever, are a very big problem, just as false-negatives are.)
(lightly edited from my old comment here)