I mean, there’s a sense in which every aspect of developing AGI capabilities is “relevant for AGI safety”. If nothing else, for every last line of AGI source code, we can analyze what happens if that line has a bug, or if a cosmic ray flips a bit, and ask how to write good unit tests, etc.
So anyway, there’s a category of AGI safety work that we might call “Endgame Safety”, where we’re trying to do all the AGI safety work that we couldn’t or didn’t do ahead of time, in the very last moments before (or even after) people are actually playing around with the kind of powerful AGI algorithms that could get out of control and destroy the future.
My claims are:
1. If there’s any safety work that requires understanding the gory details of the brain’s learning algorithms, then that safety work is in the category of “Endgame Safety”—because as soon as we understand those gory details, we’re within spitting distance of a world in which hundreds of actors around the world are able to build very powerful and dangerous superhuman AGIs. My argument for that claim is §3.7–§3.8 here. (Plus here for the “hundreds of actors” part.)
2. The following is a really bad argument: “Endgame Safety is really important, so let’s try to make the endgame happen ASAP, so that we can get to work on Endgame Safety.” It’s a bad argument because: what’s the rush? There’s going to be an endgame sooner or later, and we can do Endgame Safety research then! Bringing the endgame sooner is basically equivalent to having all the AI alignment and strategy researchers hibernate for some number of years N, and then wake up and get back to work. And that, in turn, is strictly worse than having all the AI alignment and strategy researchers do what they can during the next N years, and also continue working after those N years have elapsed. I claim that there is plenty of safety work we can do right now that is not in the category of “Endgame Safety”, and in particular that posts #12–#15 are in that category (and they have lots more open questions!).
I don’t need the gory details, but “the brain is doing some variant of gradient descent” or “the brain is doing this crazy thing that doesn’t seem to depend on local information in the loss landscape at all” would seem like particularly valuable pieces of information to me, compared to other generic information about the AGI we have, for things I am working on right now.
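For concreteness, here is a minimal toy sketch (my own illustration, not anything from the discussion above, and obviously nothing like an actual brain model) of the kind of distinction being drawn: one weight-update rule that consults the local gradient of a loss, versus one that only compares loss values of random perturbations and never touches a gradient. Which of these two regimes the brain is closer to would change which ML-style analysis tools carry over. All function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, X, y):
    """Mean squared error of a toy linear model y ≈ X @ w."""
    return np.mean((X @ w - y) ** 2)

def gradient_descent_step(w, X, y, lr=0.1):
    """Local rule: move along the loss gradient at the current weights."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def random_search_step(w, X, y, scale=0.1):
    """Gradient-free rule: propose a random perturbation and keep it only if
    the loss value improves; uses loss *values*, never the gradient."""
    candidate = w + scale * rng.standard_normal(w.shape)
    return candidate if loss(candidate, X, y) < loss(w, X, y) else w

# Toy data
X = rng.standard_normal((50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w_gd = np.zeros(3)
w_rs = np.zeros(3)
for _ in range(200):
    w_gd = gradient_descent_step(w_gd, X, y)
    w_rs = random_search_step(w_rs, X, y)

print("gradient descent loss:", loss(w_gd, X, y))
print("random search loss:   ", loss(w_rs, X, y))
```

(The random-search rule here is just one stand-in for “not gradient descent”; the actual alternative hinted at above could look very different.)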