One of the most common proposals I see people raise (once they understand the core issues) is some version of, “can’t we just use a slightly weaker, safe AI to augment human capabilities and allow us to bootstrap to / monitor / understand the more advanced versions?” And in fact lots of AI safety agendas do propose something along these lines. How would you best explain to a newcomer why Eliezer and others think this will not work? What are the key cruxes that make Eliezer et al. think nothing along these lines will work, while others think it’s more promising?
Eliezer’s argument from the recent post:
The reason why nobody in this community has successfully named a ‘pivotal weak act’ where you do something weak enough with an AGI to be passively safe, but powerful enough to prevent any other AGI from destroying the world a year later—and yet also we can’t just go do that right now and need to wait on AI—is that nothing like that exists. There’s no reason why it should exist. There is not some elaborate clever reason why it exists but nobody can see it. It takes a lot of power to do something to the current world that prevents any other AGI from coming into existence; nothing which can do that is passively safe in virtue of its weakness. If you can’t solve the problem right now (which you can’t, because you’re opposed to other actors who don’t want to be solved and those actors are on roughly the same level as you) then you are resorting to some cognitive system that can do things you could not figure out how to do yourself, that you were not close to figuring out because you are not close to being able to, for example, burn all GPUs. Burning all GPUs would actually stop Facebook AI Research from destroying the world six months later; weaksauce Overton-abiding stuff about ‘improving public epistemology by setting GPT-4 loose on Twitter to provide scientifically literate arguments about everything’ will be cool but will not actually prevent Facebook AI Research from destroying the world six months later, or some eager open-source collaborative from destroying the world a year later if you manage to stop FAIR specifically. There are no pivotal weak acts.