The question “what does a human do if they obtain a lot of power?” seems only tangentially related to intent alignment. I think the answer largely comes down to (i) the preferences of that person in this new context, and (ii) their competence at behaving sanely in this new context.
I like to think that I’m a nice person who the world should be unusually happy to empower, but I don’t think that means I’m “aligned with humanity”; in general we are aiming at a much stronger notion of alignment than that. Indeed, I don’t think “humanity” has the right type signature to be something an agent can be aligned with. And on top of that, while there are many entities I would treat with respect, and while I would expect to quickly devolve the power I acquired in this magical thought experiment, I still don’t think there exists any X (other than perhaps “Paul Christiano”) that I am aligned with in the sense in which we want our AI systems to be aligned with us.