Okay, I’ve thought about it more, and I think my concerns mainly come down to this: less the post’s actual contents, and more the post’s existence.
People dislike villains. Whether or not the concerns Andrew outlines are valid, people on the outside will tend to assume they are. The hypothetical unilateral aligned-AGI organization will be, at all times, on the verge of becoming a target of the entire world. The public would rally against it if the organization’s intentions became public knowledge, other AI labs would be eager to get rid of the competition slash threat it presents, and governments would be eager either to seize its AI research (if they take AI seriously by that point) or to score political points by squishing something the public and the megacorps want squished.
As such, the unilateral path requires a lot of subtle secrecy too. It should not be known that we expect our AI to engage in, uh, full-scale world… optimization. In theory, that connection can be left obscured — most of the people involved can just be allowed to fail to think about what the aligned superintelligence will do once it’s deployed, so there aren’t leaks from low-commitment people joining and quitting the org. But the people in charge will probably have the full picture, and… Well, at this point it sounds like the stupid kind of supervillain doomsday scheme, no?
More practically, I think the ship has already sailed on keeping the sort of secrecy this plan would need to work. I don’t understand why Eliezer et al. have allowed all this talk of pivotal acts to enter public discourse, but it will doubtless be connected to any hypothetical future friendly-AGI org. Probably not by the public or other AI labs directly, but by fellow AI Safety researchers who don’t agree with unilateral pivotal acts. Once the concerns have been signal-boosted like that, they may be picked up by the media/politicians/Eliezer’s sneer club/whoever, and once we’re spending billions on training runs and it’s clear there’s something actually going on beyond a bunch of doom-cult wackos, those people will take the concerns seriously and act on them.
A further contributing factor may be increased public awareness of AI Risk in the future, encouraged by general AI capabilities growth, possible (non-omnicidal) AI disasters, and poorly considered efforts of our own community. (It would be very darkly ironic if AI Safety’s efforts to ban dangerous AI research resulted in governments banning AI Safety’s own AGI research and no one else’s, so that’s probably an attractor in possibility-space, because we live in Hell.)
The bottom line is… This idea seems thermonuclear, in the sense that trying it and getting noticed probably completely dooms us on the spot, and it’d be really hard not to get noticed.
(Though I don’t really buy the whole “pivotal processes” thing either. We can probably lengthen the timeline that way, but actually getting the world’s default systems to produce an aligned AI… Nah.)