I suspect I’m one of the people who caused Steven to write up his quick notes on mirror neurons, because I was trying to make this point to him, and I think he misunderstood me as saying something stupid about mirror neurons.
Nope, I don’t remember you ever saying or writing anything stupid (or anything at all) about mirror neurons. That post was not in response to anything in particular and has no hidden agenda. :-)
…our ancestral environment…
I strongly agree that it’s a bad idea to try to get nice AGIs by doing a blind evolution-like outer-loop search process in an environment where multiple AGIs might benefit from cooperation—see Section 8.3.3.1 here for my three reasons why (which seem complementary to yours).
However, I don’t think that blind evolution-like outer-loop search processes are an ingredient in either shard theory or “alignment by default”.
At least in the shard theory case, the shard theory people seem very clear that when they talk about humans, they’re thinking about within-lifetime learning, not human evolution. For example, they have a post that says “Evolution is a bad analogy for AGI” right in the title!! (I agree btw.)
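(For readers skimming: here’s a toy Python sketch of the distinction I’m drawing. It’s purely my own illustration with made-up numbers—not shard theory, not alignment-by-default, not anyone’s actual training proposal—contrasting an evolution-like outer-loop search over whole agents with within-lifetime learning driven by an agent’s innate drives.)

```python
import random

# Toy illustration only (my own sketch, with made-up numbers; not a description
# of shard theory or of anyone's actual training proposal). It contrasts an
# evolution-like OUTER-LOOP search over whole agents with WITHIN-LIFETIME
# learning driven by an agent's innate drives.

def fitness_in_shared_environment(policy: float) -> float:
    """Stand-in for how well one agent does in an environment with other agents."""
    return -(policy - 0.5) ** 2  # arbitrary toy objective

def outer_loop_search(population_size: int = 20, generations: int = 50) -> float:
    """Blind evolution-style search: score whole agents, keep the fittest,
    mutate, repeat. Anything 'niceness'-like only persists if selection
    happens to favor it."""
    population = [random.uniform(-1.0, 1.0) for _ in range(population_size)]  # each float = one agent's "policy"
    for _ in range(generations):
        ranked = sorted(population, key=fitness_in_shared_environment, reverse=True)
        survivors = ranked[: population_size // 2]
        population = survivors + [s + random.gauss(0.0, 0.1) for s in survivors]  # mutated copies
    return max(population, key=fitness_in_shared_environment)

def within_lifetime_learning(innate_drive_reward, steps: int = 1000, step_size: float = 0.05) -> float:
    """Within-lifetime learning: a single agent repeatedly nudges its policy
    toward whatever its innate drives reward -- no outer selection loop at all."""
    policy = 0.0
    for _ in range(steps):
        candidate = policy + random.gauss(0.0, 0.1)
        if innate_drive_reward(candidate) > innate_drive_reward(policy):
            policy += step_size * (candidate - policy)
    return policy

if __name__ == "__main__":
    print("outer-loop search settles near:", round(outer_loop_search(), 2))
    print("within-lifetime learning settles near:",
          round(within_lifetime_learning(lambda p: -(p - 0.8) ** 2), 2))
```

The point is just that the two loops operate at different levels: the first selects over whole agents, the second updates a single agent from rewards its own innate drives generate.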
Expanding on 3):…
OK, now it seems that the post is maybe shifting away from evolution and towards within-lifetime learning, which I like.
In that case, I think there are innate drives that lead (non-psychopathic) humans to feel various social instincts, some of which are related to “niceness”. I think it would be valuable to understand exactly how these innate drives work, which is why I’ve been spending 80% of my time doing exactly that. There are a few reasons it seems valuable. At the very least, this information would give us examples to ground the yet-to-be-invented science that (we hope) will issue predictions like “If an AGI has innate drives X and training environment Y, it will ‘grow up’ into a trained AGI that wants to do Z”.
A stronger claim (which I don’t endorse) would be “We should put those exact same niceness-related innate drives, built the exact same way, into an AGI, and then we’ve solved alignment!” That seems to me like almost certainly a very bad plan. (See here.) The thing about empathy that you mentioned is one reason. Likewise, for all I know right now, the innate drives are implemented in a way that depends on having a human body and growing up at human speed in a human family, etc.
However, if we understand how those innate drives work in humans, then we don’t have to slavishly copy them into an AGI. We can tailor them. Or we can come up with superficially-quite-different approaches that wind up in a similar place. Alignment-by-default would be in that “superficially quite different” category, I think? (As for shard theory, I’m a bit hazy on exactly what the plan is.)
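(To gesture at what “tailor them” could even mean, here’s a purely hypothetical sketch—every name below is invented for illustration, and nothing like this exists—of the shape I have in mind: if we understood the human drives well enough to write them down explicitly, we could then copy, drop, or reweight individual pieces rather than transplanting the whole bundle.)

```python
from dataclasses import dataclass, replace
from typing import Dict

# Hypothetical illustration only: names like "InnateDriveSpec" and the example
# drives below are invented; this is not a real proposal or a real codebase.

@dataclass(frozen=True)
class InnateDriveSpec:
    """A (hoped-for) explicit description of one innate drive: what triggers it
    and what reward signal it sends to the learning system."""
    name: str
    trigger: str          # plain-language stand-in for a real trigger condition
    reward_weight: float  # how strongly it shapes learning

# Step 1 (the hard part, and what I'm spending my time on): reverse-engineer
# the human drives well enough to write something like this down at all.
human_drives: Dict[str, InnateDriveSpec] = {
    "approval": InnateDriveSpec("approval", "perceived approval from a caregiver/peer", 1.0),
    "spite":    InnateDriveSpec("spite",    "perceived slight from a rival",            0.3),
}

# Step 2: rather than copying the whole bundle verbatim into an AGI, tailor it --
# drop some drives, reweight others, or swap in different triggers.
tailored_drives: Dict[str, InnateDriveSpec] = {
    "approval": replace(human_drives["approval"], reward_weight=1.5),
    # "spite" deliberately omitted
}

def total_innate_reward(drives: Dict[str, InnateDriveSpec], active_triggers: set) -> float:
    """Toy combination rule: sum the weights of whichever drives fired this step."""
    return sum(d.reward_weight for d in drives.values() if d.trigger in active_triggers)
```

Again, the sketch is only meant to make the copy-versus-tailor distinction concrete, not to suggest the real drives decompose this neatly.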
Expanding on 4):
I want to register strong agreement that this is an area where things can go awry.