All being dead? I don’t think we’ll necessarily get from AGI to ASI if we don’t get the initial stages just right. This question sounds a bit blasé about our odds here. I don’t think we’re doomed; my point estimate of p(doom) is approximately 50%, but more importantly that estimate keeps gaining uncertainty as I continue to learn more and take in more of the many interacting complex arguments and states of the world that I don’t have enough expertise in to form a good estimate. And that’s after spending a very substantial amount of time on the question. I don’t think anyone has a good p(doom) estimate at this point.
I mention this before answering, because I think assuming success is the surest route to failure.
To take the question seriously: my point estimate is that some humans, hopefully many, will be doing technical alignment research for a few years between AGI and ASI. I think we’ll be doing all three of the latter categories of research you mention; loosely, being in charge (for better or worse) and filling in gaps in AGI thinking at each point in its advancement.
I think it’s somewhat likely that we’ll create AGI that roughly follows our instructions as it passes through a parahuman band (in which it is better than us at some cognitive tasks and worse at others). As it advances, alignment per se will be out of our hands. But as we pass through that band, human work on alignment will be at its most intense and most important. We’ll know what sort of mind we’re aligning, and what details of its construction and training might keep it on track or throw it off.
If we do a good job with that critical risk period, we can, and more of us will, advance to the more fun parts of current alignment thinking: deciding what sort of fantastic future we want and what values we want AGI to follow for the long term. If we get aligned human-plus AGI, and haven’t yet destroyed the world through misalignment, misuse, or human conflict with AGI-created superweapons, we’ll have pretty good odds of making it for the long haul, doing a long reflection, and inviting everyone in on the fun parts of alignment.
If we do our jobs well, our retirement will be as slow as we care to make it. But there’s much to do in the meantime. Particularly, right now.