One part I disagree with: I do not expect that implementing an alignment solution will involve influencing government/labs, conditional on having an alignment solution at all. Reason: alignment requires understanding basically-all the core pieces of intelligence at a sufficiently-detailed level that any team capable of doing it will be very easily capable of building AGI. It is wildly unlikely that a team not capable of building AGI is even remotely capable of solving alignment.
Another part I disagree with: I claim that, if I publish 95% of the insights needed for X, then the average time before somebody besides me or my immediate friends/coworkers implements X goes down by, like, maybe 10%. Even if I publish 100% of the insights, the average time before somebody besides me or my immediate friends/coworkers implements X only goes down by maybe 20%, if I don’t publish any flashy demos.
A concrete example to drive that intuition: imagine a software library which will do something very useful once complete. If the library is 95% complete, nobody uses it, and it’s pretty likely that someone looking to implement the functionality will just start from scratch. Even if the library is 100% complete, without a flashy demo few people will ever find it.
All that said, there is a core to your argument which I do buy. The worlds where our work is useful at all for alignment are also the worlds where our work is most likely to be capabilities relevant. So, I’m most likely to end up regretting publishing something in exactly those worlds where the thing is useful for alignment; I’m making my life harder in exactly those worlds where I might otherwise have succeeded.
I do not expect that implementing an alignment solution will involve influencing government/labs, conditional on having an alignment solution at all
Mmm, right, in this case the fact that the rest of the AI industry is being carefree about openly publishing WMD design schematics is actually beneficial to us — our hypothetical AGI group won’t be missing many insights that other industry leaders have.
The two bottlenecks here that I still see are money and manpower. The theory for solving alignment and the theory for designing AGI are closely related, but the practical implementations of these two projects may be sufficiently disjoint — such that the optimal setup is e.g. one team working full-time on developing universal interpretability tools while another works full-time on AGI architecture design. If we could hand off the latter part to skilled AI architects (and not expect them to screw it up), that may be a nontrivial speed boost.
Separately, there’s the question of training sets/compute, i.e. money. Do we have enough of it? Suppose in a decade or two, one of the leading AI labs successfully pushes for a Manhattan Project equivalent, such that they’d be able to blow billions of dollars on training runs. Sure, insights into agency will probably make our AGI less compute-hungry. But will it be cheaper enough that we’d be able to match this?
Even if the library is 100% complete, without a flashy demo few people will ever find it.
But what if we have to release a flashy demo to attract attention, so there are now people swarming the already-published research looking for ideas?
We do in fact have access to rather a lot of money; billions of dollars would not be out of the question in a few years, hundreds of millions are already probably available if we have something worthwhile to do with it, and alignment orgs are spending tens of millions already. Though by the time it becomes relevant, I don’t particularly expect today’s dollars → compute → performance curves to apply very well anyway.
But what if we have to release a flashy demo to attract attention, so there are now people swarming the already-published research looking for ideas?
Also money is a great substitute for attracting attention.
Okay, I’ve thought about it more, and I think my concerns are mainly outlined by this. Less by the post’s actual contents, and more by the post’s existence.
People dislike villains. Whether the concerns Andrew outlines are valid or not, people on the outside will tend to think that such concerns are valid. The hypothetical unilateral-aligned-AGI organization will be, at all times, on the verge of becoming a target of the entire world. The public would rally against it if the organization’s intentions became public knowledge, other AI labs would be eager to get rid of the competition-slash-threat it presents, and governments would be eager either to seize its AI research (if they take AI seriously by that point) or to acquire political points by squishing something the public and the megacorps want squished.
As such, the unilateral path requires a lot of subtle secrecy too. It should not be known that we expect our AI to engage in, uh, full-scale world… optimization. In theory, that connection can be left obscured — most of the people involved can just be allowed to fail to think about what the aligned superintelligence will do once it’s deployed, so there aren’t leaks from low-commitment people joining and quitting the org. But the people in charge will probably have the full picture, and… Well, at this point it sounds like the stupid kind of supervillain doomsday scheme, no?
More practically, I think the ship has already sailed on keeping the sort of secrecy this plan would need to work. I don’t understand why all this talk of pivotal acts has been allowed to enter public discourse by Eliezer et al., but it’ll doubtlessly be connected to any hypothetical future friendly-AGI org. Probably not by the public/other AI labs directly, but by fellow AI Safety researchers who do not agree with unilateral pivotal acts. And once the concerns have been signal-boosted like that, they may be picked up by the media/politicians/Eliezer’s sneer club/whoever, and once we’re spending billions on training runs and it’s clear that there’s something actually going on beyond a bunch of doom-cult wackos, they will take these concerns seriously and act on them.
A further contributing factor may be increased public awareness of AI Risk in the future, encouraged by general AI capabilities growth, possible (non-omnicidal) AI disasters, and poorly-considered efforts of our own community. (It would be very darkly ironic if AI Safety’s efforts to ban dangerous AI research resulted in governments banning AI Safety’s own AGI research and no-one else’s, so that’s probably an attractor in possibility-space because we live in Hell.)
The bottom line is… This idea seems thermonuclear, in the sense that trying it and getting noticed probably completely dooms us on the spot, and it’d be really hard not to get noticed.
(Though I don’t really buy the whole “pivotal processes” thing either. We can probably increase the timeline this way, but actually making the world’s default systems produce an aligned AI… Nah.)
Fair. I have no more concrete counter-arguments to offer at this time.
I still have a vague sense that acting on the expectation that we’d be able to unilaterally build an AGI is optimistic in a way that dooms us in a nontrivial number of timelines that would’ve been salvageable if we hadn’t assumed that. But maybe that impression is wrong.