My friend Devin Kalish recently convinced me, at least to some extent, to focus on A.I. governance in the short-to-medium term, more than technical A.I. safety.
The argument that persuaded me was this:
A key point of stress among AI doomsayers like Yudkowsky is that we, first of all, have too little time, and second of all, have no way to implement any of the progress alignment workers do make. Both of these are governance problems, not alignment problems. They are also, arguably, problems for which promising interventions are far easier to picture than they are for alignment research.
To lay out the logic more explicitly...
The case for governance now (Skip if you’re already convinced)
1. AI safety is urgent insofar as different capabilities groups are working towards it.
2. Different capabilities groups are propelled at least partly by “if we don’t do this, another group will anyway”, or the subtly different “we must do this, or another group will first”.
3. (2) is a coordination problem, potentially solvable with community governance.
4. AI safety is less tractable insofar as capabilities groups don’t have ways to implement alignment research into their work.
5. Solutions to (4) will, at some point, require groups/resources made available to capabilities groups.
6. (5) looks kinda like a governance problem in practice.
The AI alignment problem is quite hard, on the technical level. Governance work, as noted in (3) and (6), is both more tractable and more neglected than technical work. At least, it is right now.
The rest of this essay is less organized, but contains my thoughts for how and why this could work.
Specific stories of how this would really look in the real world, for real
- An OpenAI team is getting ready to train a new model, but they’re worried about its self-improvement capabilities getting out of hand. Luckily, they can consult MIRI’s 2025 Reflexivity Standards when reviewing their codebase, and get 3rd-party auditing done by The Actually Pretty Good Auditing Group (founded 2023).
- A DeepMind employee has an idea for speeding up agent-training, but is worried about its potential to get out of hand. Worse, she’s afraid she’ll look like a fearmonger if she brings up her concerns at work. Luckily, she can bring up her concerns with The Pretty Decent Independent Tip Line, where they can then go to her boss anonymously.
- OpenAI, DeepMind, and Facebook AI Research are all worried about their ability to control their new systems, but the relevant project managers are resigned to fatalism. Luckily, they can all communicate their progress with each other through The Actually Pretty Good Red Phone Forum, and their bosses can make a treaty through The Actually Pretty Trustworthy AI Governance Group to not train more powerful models until concrete problems X, Y, and Z are solved.
These aren’t necessarily the exact solutions to the above problems. Rather, they’re intuition pumps for what AI governance could look like on the ground.
Find and use existing coordination mechanisms
What happened to the Partnership For AI? Or the Asilomar conference? Can we use existing channels and build them out into coordination mechanisms that researchers can actually interact productively with?
If coordination is the bottleneck, a full effort is called for. This means hokey coordination mechanisms borrowed from open-source and academia, groups for peer-reviewing and math-checking and software auditing and standards-writing. Anything other than declaring “coordination is the bottleneck!” on a public forum and then getting nothing done.
Politics VS the other stuff
Many people in this community are turned off by politics, perhaps explaining some of the shortage of AI governance work. But “politics”, especially in this neglected area, probably isn’t actually as hard as you think.
There’s a middle ground between “do nothing” and “become President or wage warfare”. Indeed, most effective activism is there.
This post is spot-on about basically everything it covers, and I’m really, really glad to see that someone like you thought of at least half of this on your own, discovering it independently. It’s really good news that we have thinkers like that here.
The one thing that is not spot-on is the claim that “politics probably aren’t as hard as you think”. Politics are much harder, more hostile/malevolent, less predictable, and more evil than they appear. We didn’t have to be born in a timeline where AI alignment was ever conceived of at all, in the first place, as opposed to being born on a timeline where people built AI but the concept of the Control Problem never occurred to anyone. So I think we’re very fortunate that the concept of AI alignment exists in the first place, and it would be such an unfortunate waste if the whole enchilada were to be eviscerated by the political scene.
AI governance, and governance in general, is immensely complicated and full of self-interested and outright vicious people. Many of them are also extremely smart, competent, and/or paranoid about others encroaching on the little empire that they spend their entire lives building for themselves, brick by brick; J. Edgar Hoover is one example. Any really good idea of governance is probably full of these random, unforeseeable “aha” moments that completely invalidate the entire good idea, because of some random factor that most smart people couldn’t possibly have reasonably anticipated.
Please don’t be discouraged: this is an uncharacteristically high-quality post on AI governance and I look forward to seeing more from you in the future. I’ve learned a lot from it and many others have too.
I recommend contributing to the $20k AI alignment rhetoric and one-liner contest; it needs more entries from competent people like you who know what they’re talking about. It was forced off the front page by a bunch of naive people who know nothing about the situation with governance, so very few people are aware of the existence of that contest. If you (or anyone, really) put in 30 minutes thinking of a sorta clever quote (or just finding one) that can convince policymakers that AI alignment is a big deal, you will probably end up with $500 in your pocket; that’s how badly the contest is neglected right now.
Thanks! FWIW part of the point here is that “AI Governance” includes (but is not limited to) “real politics”, which I assume are as bad as (or worse than) everyone here thinks. Hence the examples section mostly being NGOs.
And thanks for letting me know about the contest. Is there a limit on the number of submissions? (EDIT: there appears to be no limit beyond whatever LW already uses for spam filtering, ofc.) I can write a lot of quotes for $500.
That’s good that you’re willing to make a lot of submissions for $500, because the way things are going, you’ll probably get $500 per submission for several submissions.
How do we deal with institutions that don’t want to be governed, say idk the Chevron corporation, North Korea, or the US military?
In my model, Chevron and the US military are probably open to AI governance, because: 1 - they are institutions traditionally enmeshed in larger cooperative/rule-of-law systems, AND 2 - their leadership is unlikely to believe they can do AI ‘better’ than the larger AI community.
My worry is instead about criminal organizations and ‘anti-social’ states (e.g. North Korea) because of #1, and big tech because of #2.
Because of location, EA can (and should) make decent connections with US big tech. I think the bigger challenge will be tech companies in other countries, especially China.
My co-blogger Devin saw this comment before I did, so these points are his. Just paraphrasing:
We can still do a lot without “coordinating” every player, and governance doesn’t mean we should be ham-fisted about it.
Furthermore, even just doing coordination/governance work with some of the major US tech companies (OpenAI, Google, Microsoft, Facebook) would be really good, since they tend to be ahead of the curve (as far as we know) with the relevant technologies.
Devin also noted that there could be tension between “we’re coordinating to all extend our AI development timelines somewhat so things aren’t rushing forward” and “OpenAI originally had a goal to develop aligned AI before anyone else developed unaligned AI”. However, I think this sort of thing is minor, and doing more governance now requires some flexibility anyway.
How many of the decision makers in the companies mentioned care about or even understand the control problem? My impression was: not many.
Coordination is hard even when you share the same goals, but we don’t have that luxury here.
Current OpenAI wants to build AGI.[1] Current MIRI could confidently tell them that this is a very bad idea. Sure they could be advised that step 25 of their AGI building plan is dangerous, but so were steps 1 through 24.
MIRI’s advice to them won’t be “oh implement this safety measure and you’re golden” because there’s no such safety measure because we won’t have solved alignment by then. The advice will be “don’t do that”, as it is currently, and OpenAI will ignore it, as they do currently.
Sure, they could actually mean “build AGI in a few decades when alignment is solved and we’re gonna freeze all our current AGI building efforts long before then”, but no they don’t.
At one point (working off memory here), Sam Altman (leader of OpenAI) didn’t quite agree with the orthogonality thesis. After some discussion and emailing with someone on the Eleuther discord (iirc), he shifted to agree with it more fully. I think.
This ties into my overall point of “some of this might be adversarial, but first let’s see if it’s just straight-up neglected along some vector we haven’t looked much at yet”.