johnswentworth comments on Principles of Privacy for Alignment Research

johnswentworth 29 Jul 2022 5:26 UTC
4 points
0
We do in fact have access to rather a lot of money; billions of dollars would not be out of the question in a few years, hundreds of millions are probably already available if we have something worthwhile to do with them, and alignment orgs are already spending tens of millions. Though by the time it becomes relevant, I don’t particularly expect today’s dollars → compute → performance curves to apply very well anyway.
But what if we have to release a flashy demo to attract attention, so there are now people swarming the already-published research looking for ideas?
Also money is a great substitute for attracting attention.
- Thane Ruthenis 19 Aug 2022 12:42 UTC
  13 points
  Okay, I’ve thought about it more, and I think my concerns are mainly outlined by this, though less by the post’s actual contents than by the post’s existence.
  People dislike villains. Whether the concerns Andrew outlines are valid or not, people on the outside will tend to think that such concerns are valid. The hypothetical unilateral-aligned-AGI organization will be, at all times, on the verge of being a target of the entire world. The public would rally against it if the organization’s intentions became public knowledge, other AI labs would be eager to get rid of the competition slash threat it presents, and governments would be eager either to seize AI research (if they take AI seriously by that point) or to score political points by squishing something the public and megacorps want squished.
  As such, the unilateral path requires a lot of subtle secrecy too. It should not be known that we expect our AI to engage in, uh, full-scale world… optimization. In theory, that connection can be left obscured: most of the people involved can simply be allowed not to think through what the aligned superintelligence will do once it’s deployed, so there are no leaks from low-commitment people joining and quitting the org. But the people in charge will probably have the full picture, and… well, at this point it sounds like the stupid kind of supervillain doomsday scheme, no?
  More practically, I think the ship has already sailed on keeping the sort of secrecy this plan would need to work. I don’t understand why Eliezer et al. have allowed all this talk of pivotal acts to enter public discourse, but it will doubtless be connected to any hypothetical future friendly-AGI org. Probably not by the public or other AI labs directly, but by fellow AI Safety researchers who disagree with unilateral pivotal acts. And once those concerns have been signal-boosted, they may be picked up by the media/politicians/Eliezer’s sneer club/whoever; and once we’re spending billions on training runs and it’s clear that something is actually going on beyond a bunch of doom-cult wackos, they will take these concerns seriously and act on them.
  A further contributing factor may be increased public awareness of AI Risk in the future, driven by general growth in AI capabilities, possible (non-omnicidal) AI disasters, and poorly considered efforts by our own community. (It would be very darkly ironic if AI Safety’s efforts to ban dangerous AI research resulted in governments banning AI Safety’s own AGI research and no one else’s, so that’s probably an attractor in possibility-space, because we live in Hell.)
  The bottom line is… this idea seems thermonuclear, in the sense that trying it and getting noticed would probably doom us completely on the spot, and it’d be really hard not to get noticed.
  (Though I don’t really buy the whole “pivotal processes” thing either. We can probably extend the timeline this way, but actually getting the world’s default systems to produce an aligned AI… Nah.)
- Thane Ruthenis 29 Jul 2022 5:42 UTC
  4 points
  Fair. I have no more concrete counter-arguments to offer at this time.
  I still have a vague sense that acting on the expectation that we’d be able to unilaterally build an AGI is optimistic in a way that dooms us in a nontrivial number of timelines that would’ve been salvageable had we not assumed it. But maybe that impression is wrong.