My Updating Thoughts on AI policy
My conversation on policy and AI with Richard was over a year ago, so for Daniel Kokotajlo’s LW writing day I thought I’d write down the new thoughts I’ve had since then (warning: it’s pretty rambly). I’ve structured the post around what I did that led to my updated thoughts.
1) I read many posts on strategy
Paul’s post What Failure Looks Like, which lays out how the abstract technical problems turn into practical catastrophes, is something I’ve thought about a lot. It is a central example of what (I think) strategy work looks like: the strategic considerations are deeply tied up with the specifics of the technology.
Eliezer’s dialogue on Security Mindset and Scott Garrabrant’s post Optimization Amplifies have been key in my thinking about alignment and strategy. The point is best summarised by Scott’s line: “I am not just saying that adversarial optimization makes small probabilities of failure large. I am saying that in general any optimization at all messes with small probabilities and errors drastically.”
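To gesture at what that line means concretely, here’s a toy sketch of my own (a hypothetical illustration, not anything from Scott’s post): a proxy measure that is wrong on only 0.1% of options will almost never mislead you if you sample options at random, but will mislead you almost every time if you optimize over the options, because the optimizer seeks out exactly the places where the proxy is wrong.

```python
import random

random.seed(0)

# Toy setup (hypothetical): every option is actually worthless, but our proxy
# score is wrong on a rare 0.1% of options, where it reports a high value.

def true_value(x):
    return 0.0

def proxy_value(x):
    return 10.0 if x % 1000 == 0 else 0.0  # wrong on 1 in every 1000 options

options = list(range(100_000))

random_pick = random.choice(options)            # sampling: hits an error ~0.1% of the time
optimized_pick = max(options, key=proxy_value)  # optimizing: hits an error essentially always

print("random pick landed on a proxy error:   ", proxy_value(random_pick) != true_value(random_pick))
print("optimized pick landed on a proxy error:", proxy_value(optimized_pick) != true_value(optimized_pick))
```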
(I tried to re-state some of my own understanding of Paul’s, Eliezer’s, and Scott’s posts when I talked about ML transparency in ‘Useful Doesn’t Mean Secure’. I’m glad I wrote it; it was helpful for sorting out my own thoughts, though I expect it isn’t as helpful for other people.)
Eliezer wrote There’s No Fire Alarm for Artificial General Intelligence, which has been key for how I think about timelines (as I said two years ago). I wrote down some of my worries about the discussion of timelines in an off-topic comment on the recent CFAR AMA, talking about how the broad x-risk network is acting like a herd stampeding away from a fear regarding AI risk, and the problems with that. I’ll clean that up and turn it into a post sometime; I managed to say some things there more clearly than I’d been able to think them before.
Paul’s post Arguments about fast takeoff, and the linked post about hyperbolic growth, were key to my understanding of takeoff speeds, and helped me understand the gears there much better.
I’ve increasingly been thinking about secrecy. I think a lot of people’s actions regarding keeping secrets have been much more damaging than they realised (the herd stampede above regarding timelines is but one example). Work needs to be done to square the necessity of secrecy with the fact that the public record has been necessary for the scientific and intellectual progress that got humanity to where it is; if we want to use that ethereal power to secure humanity against novel technologies, we’re probably going to continue to need a public record of ideas and insights. I’ve not found an opportunity to get my thoughts written down on this yet, though I said a few preliminary things in my review of Scott Garrabrant’s post on the Chatham House Rules, where commenter jbash also had some solid comments.
In MIRI’s post 2018 Update: Our New Research Directions, Nate talks about the idea of ‘deconfusion’, which he says is something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense”. He goes on to give examples of topics he used to be confused about, where as soon as he started to ask questions or make statements he immediately said insane things. For example, when talking about infinity as a kid, he used to ask “How can 8 plus infinity still be infinity? What happens if we subtract infinity from both sides of the equation?”, questions which turn out to make no sense. And when talking about AI he used to say things like “isn’t intelligence an incoherent concept,” “but the economy’s already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”
I often like to say that Nick Bostrom is the person who’s able to get deconfused at the highest strategic level. He’s able to just say one sentence at a time, and it’s so true that the rest of us rearrange our entire lives around it. “Hmm… I think if we go extinct, then that means we’re never able to come back from our mistakes, but as long as we never go extinct, then someday we’ll manage to do all the important things.” And then we all say “Wow, let us now attempt to redirect humanity around ensuring we never go extinct.” And then he waits a few years, and says another sentence. And then a few years later, he says another. Bostrom’s strategy papers have supplied the primitive building blocks of my thoughts in this area: the unilateralist’s curse, the vulnerable world hypothesis, existential risk, information hazards, and so on. They creep into most of my thoughts, often without me noticing.
(The above sentence was not an actual quote; it was oversimplified. An actual one-sentence quote might be “the lesson for utilitarians is not that we ought to maximize the pace of technological development, but rather that we ought to maximize its safety, i.e. the probability that colonization will eventually occur.”)
On policy specifically, there were two more FHI papers I read. I thought Bostrom’s policy paper was really interesting: it applies many of the ideas from Superintelligence to policy, which felt like real conceptual progress, and it was the paper that actually got me to feel excited about policy research. I summarised it in a post.
And I read Dafoe’s research agenda. I quoted the bits I liked in the comments of this post. Overall, though, the research agenda just asked a lot of questions; I didn’t get much from reading it.
2) I helped evaluate a grant in AI policy
I spent around 15-25 hours with Oliver evaluating and writing down my thoughts about the Jess Whittlestone/CSER grant for the LTFF. My thoughts went into Oli’s public writeup analysing her writing, plus a long addendum on a Nature article by the director of CSER, Seán ÓhÉigeartaigh, coauthored with the director of Leverhulme CFI, Stephen Cave. I was fairly proud of the two write-ups; I said things there more clearly than I’d previously been able to think them.
My main updates from all that:
Current policy/strategy work is at the level where we don’t know what basic concepts to deal with, and most of the current work we need to do is in figuring out how to think about the problem, not in proposing concrete solutions.
The policy goal of banning lethal autonomous weapons is an example of going too concrete too early and proposing something that doesn’t really make sense.
Just because a human is in the loop doesn’t mean they understand the system or how it works, and so it doesn’t ensure that their decision is meaningful. I have not seen advocacy on this subject deal with the details of how military systems work, or cohere around a specific policy proposal for how military ML architectures will work and which specific designs would ensure that a human in the loop keeps the same kind of control they have over current weapons. It’s mostly an accurate description of one of the many possible uses of advanced ML, combined with a social push to “not do that”, as though it were a binary action rather than an incredibly large space of possible policies, most of which are ineffectual.
‘Slaughterbots’ is a short horror-fiction video with 1 million views on YouTube. It is a piece of high-effort, well-executed fearmongering about AI’s military use. I had not actually watched it until just now, in writing this post. I am shocked to see Stuart Russell appear at the end of the video and endorse it as crucially important; this will substantially obfuscate any public dialogue and messaging that he and others want to do about alignment and AGI. I’m pretty disappointed. He’s committing the own-goal that matches the mistake Scott joked about journalists making, as if he were trying to be the public’s nuclear x-risk expert and said “A sufficiently large nuclear war could completely destroy human civilization. If the bombs struck major manufacturing centers, they could also cause thousands of people to be put out of work.”
Attempts to gain political power often come from taking the issue you care about and connecting it to a major political debate which already has massive amounts of political capital pulling in both directions. This ends up leaving your issue co-opted by entities with massively more political capital (this results in the kind of thing where Clinton gave a billion dollars to ‘nanotechnology’ research that just went to funding existing major scientific bodies and not anything that Drexler had hoped for or would’ve considered nanotech, I think). It’s a hard tradeoff to make, and if it’s important to gain power in government, this is one of the primary and most common routes.
I expect that when a number of ML experts see people like Russell talking about slaughterbots in a hyperbolic and imprecise fashion, this reads to them as Russell allowing himself to be co-opted by larger political forces in order to gain power. That makes them spend less effort engaging with his technical arguments about the danger of misalignment, which read to them as rationalisations rather than as the actual cause of his public speeches.
The article above by ÓhÉigeartaigh and Cave makes similar connections (see the linked writeup for more detail), saying that the ‘long-term issues’ Bostrom and others care about basically include unemployment. That is not something I care much about, but it is a widely discussed political issue that can quickly get you a fair bit of attention and power, if you have some prestige and say things that fit into the well-known standard positions on that debate.
3) I had some discussions with ML researchers.
I once had a conversation with an ML researcher I respected, and found I was tripping over myself to not outright call their work net-negative (they were very excited about their work). I thought it would feel really uncomfortable to believe that of your own work, so I expected if I were to say I thought their work was net-negative then they would feel attacked and not have a line of retreat other than to decide I must be wrong.
Recently, Oli pointed out to me that Machine Learning as a field is in a bad position to decide that Machine Learning research is net-negative, and that from a basic incentive model you should predict the field will never believe this.
This suggests that the work of pointing out problems should be done by other groups who are not in an impossible incentive setup, perhaps economists or government regulators, and that AI researchers should primarily do the work of solving those problems. I think a lot of the basic ideas about which paths would lead AI to be catastrophic are well-phrased in terms of economics, and I’m now more excited about the idea of an economics department analysing AI and the risks it poses. There are many great independent centres in academia like CHAI, FHI, and GMU, and it’d be interesting to figure out how to build a small analogue of CHAI in economics, built around analysing risk from AI.
4) I listened to a podcast and read some blogposts by science-minded people senior in government.
I wrote about listening to Tom Kalil’s 80k podcast in my post A Key Power of the President is to Coordinate the Execution of Existing Concrete Plans. It substantially increased my estimate of the tractability of getting things done within governments on timescales of 4-8 years.
My general thinking is that we’re mostly confused about AI, both about how to conceptualise it and about what is likely to happen, and these feel like fundamental questions that need answering before I can say what to do about it. I think almost nobody is making real progress on that front. What Kalil said fit in with this: he talked about how he can do a lot of awesome sh*t with the president, but when people haven’t got a very concrete policy proposal, he can’t do anything. You can’t tell him to ‘care more’ about AI safety. You need to also know what to do about it. Like the above, ‘stopping autonomous drones’ isn’t a primitive action, and figuring out what the right action is will be most of the work.
While Kalil made me update negatively on the utility of interacting with government in the near to medium term, Cummings obviously suggests that rationalists should think really freaking hard about what could be useful to do in the next 5 years.
(Note: I wrote this post a few months ago, before the most recent UK election, and am just polishing it up to publish today for blogpost writing day. I hadn’t thought much about Cummings at the time, and I’ve left it as-is for now, so this doesn’t say much really.)
While I think that novel research is quite rarely done inside government, and that humanity’s scientific and intellectual exploration is likely key for us dealing well with AI and other technologies (which is why I primarily work on LessWrong, where I think we have a shot at really kickstarting intellectual thought on such topics), I still tried to think more about useful things that can be done in government today.
I spent an hour or two; here are my first thoughts.
Someone in a strong position in government could help coordinate the creation of research centres in industry and academia working on AI and economics (as above), insofar as we have good people to lead them (which we don’t really, atm), and help give them prestige.
Someone with power in government could help set standards for release practices in Machine Learning, which sound like the sort of thing that will be pretty important. I’ve been quite impressed by OpenAI doing work on this.
If those two work out, I can imagine it being sensible to begin international conversations with other countries about them implementing similar things.
I really have not given this much thought, and I would not take active action in this space without giving it dozens or hundreds of hours of thought and having some long discussions with a number of other people I respect who have thought about it a lot and/or have different perspectives from me.
When thinking of exciting things to do with governance not directly related to AI, I’m broadly excited about more economically thoughtful governments trying out various ideas that people like Robin Hanson and Paul Christiano have come up with, such as prediction markets, impact certificates, governance by jury, governance by markets, and more.
(We’ve been thinking about selling the altruistic equity of the LessWrong Team—instead of doing normal non-profit fundraising—which I currently understand to be the one-player version of impact certificate markets.)
Those economic ideas have been a big shock to me; I didn’t realise how many new basic ideas one can have about using markets to solve problems. Does econ academia reliably produce these kinds of basic institutional ideas? It seems like there’s a lot of space here to explore.