I basically agree that humans ought to use AI to get space, safety and time to figure out what we want and grow into the people we want to be before making important decisions. This is (roughly) why I’m not concerned with some of the distinctions Gabriel raises, or that naturally come to mind when many people think of alignment.
That said, I feel your analogy misses a key point: while the child is playing in their sandbox, other stuff is happening in the world—people are building factories and armies, fighting wars and grabbing resources in space, and so on—and the child will inherit nothing at all unless their parent fights for it.
So without (fairly extreme) coordination, we need to figure out how to have the parent acquire resources and then ultimately “give” those resources to the child. It feels like that problem shouldn’t be much harder than the parent acquiring resources for themselves (I explore this intuition some in this post on the “strategy stealing” assumption), so that this just comes down to whether we can create a parent who is competent while being motivated to even try to help the child. That’s what I have in mind while working on the alignment problem.
On the other hand, given strong enough coordination that the parent doesn’t have to fight for their child, I think that the whole shape of the alignment problem changes in more profound ways.
I think that much existing research on alignment, and my research in particular, is embedded in the “agency hand-off paradigm” only to the extent necessitated by that situation.
I do agree that my post on indirect normativity is embedded in a stronger version of the agency hand-off paradigm. I think the main reason for taking an approach like that is that a human embedded in the physical world is a soft target for would-be attackers. If we are happy handing off control to a hypothetical version of ourselves in the imagination of our AI, then we can achieve additional security by doing so, and this may be more appealing than other mechanisms for achieving a similar level of security (like uploading, or retreating to a secure physical sanctuary). In some sense all of this is just about saying what it means to ultimately “give” the resources to the child, and it does so by trying to construct an ideal environment in which they can become wiser, after which they will be mature enough to provide more direct instructions. (But in practice I think these proposals may involve a jarring transition that could be avoided by using a physical sanctuary instead, or by just ensuring that our local environments remain hospitable.)
Overall it feels to me like you are coming from a similar place to where I was when I wrote this post on corrigibility, and I’m curious if there are places where you would part ways with that perspective (given the consideration I raised in this comment).
(I do think “aligned with who?” is a real question, since the parent needs to decide which child will ultimately get the resources; and if there are multiple children playing together, it matters a lot how the parent’s decisions shape the environment that will ultimately aggregate their preferences.)