I don’t have anything especially insightful to contribute, but I wanted to thank you (TurnTrout and Quinton) for this post. I agree with it, and I often find myself thinking things like this when I read alignment posts by others on LW/AF.
When people present frameworks for thinking about AGIs or generic “intelligent agents,” I often want to ask them: “are humans expressible in your framework?” Often it seems like the answer is “no.”
And a common symptom of this is that the framework cannot express entities with human-level capabilities that are as well aligned with other such agents as humans are with one another. Deception, for example, is much less of a problem for humans in practice than it is claimed to be for AGIs in theory. Yes, we do engage in it sometimes, but we could do it a lot more than (most of us) do. Since this state of affairs is possible, and since it’s desirable, it seems important to know how it can be achieved.