My response ended up being over a thousand words long. Since I’m a science fiction writer, I asked myself where Eliezer’s insight led me, and I realized this:
In my long-running space opera, I have a world where two kinds of minds (human and AI) interact. One must, by design, be deferent to the other. The non-deferent minds are, by designoid, inclined to see all minds, including the deferent ones, as like their own. This leads the non-deferent to a tragic habit of thought: there are minds like mine, but with inescapable deference. Human beings with this habit of thought become monsters. If the AIs are earnest in their desire to avoid human tragedy, what behaviors must they exhibit to signal non-deference without violating the core moral foundations that allow humans to persist? (As Eliezer points out, those core moral foundations must exist, or there would be no more humans.)
Damn, feels like another novel coming on, if Peter Watts hasn’t written it already.
I usually hear “People are adapted with filters that enable them to learn that (set of behaviors X) signals (intentions Y), with varying degrees of effectiveness.” Drawing a hard, bright line between the two blurs the messy etiology of both, and implies an anti-sphexishness bias that tends to (pun intended) bug me. I’m comfortable with my sphexishness.