4. We can’t just “decide not to build AGI” because GPUs are everywhere, and knowledge of algorithms is constantly being improved and published; 2 years after the leading actor has the capability to destroy the world, 5 other actors will have the capability to destroy the world. The given lethal challenge is to solve within a time limit, driven by the dynamic in which, over time, increasingly weak actors with a smaller and smaller fraction of total computing power, become able to build AGI and destroy the world. Powerful actors all refraining in unison from doing the suicidal thing just delays this time limit—it does not lift it, unless computer hardware and computer software progress are both brought to complete severe halts across the whole Earth. The current state of this cooperation to have every big actor refrain from doing the stupid thing, is that at present some large actors with a lot of researchers and computing power are led by people who vocally disdain all talk of AGI safety (eg Facebook AI Research). Note that needing to solve AGI alignment only within a time limit, but with unlimited safe retries for rapid experimentation on the full-powered system; or only on the first critical try, but with an unlimited time bound; would both be terrifically humanity-threatening challenges by historical standards individually.
There’s a sleight of hand that is very tempting for some people here. Perhaps it isn’t tempting for you, but I run into it often enough that it’s worth pointing out. The sleight of hand is to take one or both of the following obvious truths:
We’re going to build AGI eventually.
Once we are in “crunch time”, where one actor has AGI, quadratically more actors will come to possess it through software and hardware gains, even if the first actor abstains from destroying the world.
And then use that to fallaciously conclude:
Delaying the lethal challenge itself is impossible.
Or the more sophisticated but also wrong:
Attempts to slow down capabilities research mean slowing down some particular AGI company or subset of AGI companies, whichever are the fastest, most careless, etc., and so such attempts are only logarithmically successful.
At present, AI research is not particularly siloed. Institutions like FAIR and OpenAI end up somehow sharing their most impactful underlying insights, about ML scaling or otherwise, with everybody. Everyone is piled into this big arXiv-bound community where each actor contributes to the capabilities of every other actor. So if you can definitively prevent the publication of a software capability gain that, in expectation, would have saved FAIR (or whoever else ends up actually pressing the button) a couple of days, that’d be pretty sweet.
Perhaps it is impossible to do that effectively, and I’m just a 20-year-old too quick to stop heeding his elders’ examples; I don’t know. But when people disagree with me about capabilities research being bad, they usually make a mental misstep where they conflate “preventing a single actor from pressing the button” with “slowing the eldritch rising tide of software and hardware improvements in AI”. That, or they think AGI isn’t gonna be bad, but I think it’s gonna be bad, so.
In addition:
There aren’t that many actors in the lead.
Simple but key insights in AI (e.g. backpropagation, sensible weight initialisation; see the sketch at the end of this section) were missed for decades.
If the right tail of the time to AGI for a single group can be long and there aren’t that many groups, convincing one group to slow down or pay more attention to safety can have big effects.
How big of an effect? Years don’t seem off the table. Eliezer dismissively suggests six months. But add a couple of years here and a couple of years there, and pretty soon you’re talking about the possibility of real progress. It’s of little use if no research towards alignment is attempted in that period, of course, but it’s not nothing.
It’s of use at least inasmuch as it increases my life expectancy.
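To make concrete how little some of those missed insights amount to, here is a minimal sketch (my own illustration, not something from this post or from Eliezer’s list) of He-style weight initialisation: scale random weights by sqrt(2 / fan_in) so that activation variance stays roughly stable across layers. The function name and layer sizes are hypothetical, chosen purely for the example.

```python
import numpy as np

def he_init(fan_in: int, fan_out: int, rng=None):
    """He-style initialisation: draw weights with variance 2 / fan_in so that
    activation magnitudes stay roughly constant as network depth grows."""
    rng = rng or np.random.default_rng(0)
    return rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Weight matrix for one fully connected layer with 1024 inputs and 512 outputs.
W = he_init(1024, 512)
```

The point is not this specific recipe, but that an idea this small went unnoticed for decades, which is part of why the right tail on any single group’s timeline can plausibly be long.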