It occurs to me that all of the hypotheses, arguments, and approaches mentioned here (though not necessarily the scenarios) seem to be about the “technical” side of things. There are two main things I mean by that statement:
First, this post seems to be limited to explaining something along the lines of “x-risks from AI accidents”, rather than “x-risks from misuse of AI”, or “x-risk from AI as a risk factor” (e.g., how AI could potentially increase risks of nuclear war).
I do think it makes sense to limit the scope that way, because:
no one post can cover everything
you don’t want to make the diagram overwhelming
there’s a relatively clear boundary between what you’re covering and what you’re not
what you’re covering seems like the most relevant thing for technical AI safety researchers, whereas the other parts are perhaps more relevant for people working on AI strategy/governance/policy
And the fact that this post’s scope is limited in that way seems somewhat highlighted by saying this is about AI alignment (whereas misuse could occur even with a system aligned to some human’s goals), and by saying “The idea is closely connected to the problem of artificial systems optimizing adversarially against humans.”
But I think misuse and “risk factor”/“structural risk” issues are also quite important, that they should be on technical AI safety researchers’ radars to some extent, and that they probably interact in some ways with technical AI safety/alignment. So, personally, I think I’d have made that choice of scope even more explicit.
I’d also be really excited to see a post that takes the same approach as this one, but for those other classes of AI risks.
---
The second thing I mean by the above statement is that this post seems to exclude non-technical factors that would likely also impact the technical side or the AI accident risks.
One crux of this type would be “AI researchers will be cautious/sensible/competent ‘by default’”. Here are some indications that that’s an “important and controversial hypothes[is] for AI alignment”:
AI Impacts summarised some of Rohin’s comments as “AI researchers will in fact correct safety issues rather than hacking around them and redeploying. Shah thinks that institutions developing AI are likely to be careful because human extinction would be just as bad for them as for everyone else.”
But my impression is that many people at MIRI would disagree with that, and are worried that people will merely “patch” issues in ways that don’t adequately address the risks.
And I think many would argue that institutions won’t be careful enough, because they only pay a portion of the price of extinction; reducing extinction risk is a transgenerational global public good (see Todd and this comment).
And I think views on these matters influence how much researchers would be happy with the approach of “Use feedback loops to course correct as we go”. I think the technical things influence how easily we theoretically could do that, while the non-technical things influence how much we realistically can rely on people to do that.
So it seems to me that a crux like that could perhaps fit well in the scope of this post. And I thus think it’d be cool if someone could either (1) expand this post to include cruxes like that, or (2) make another post with a similar approach, but covering non-technical cruxes relevant to AI safety.
To your first point—I agree both with why we limited the scope (but also, it was partly just personal interests), and that there should be more of this kind of work on other classes of risk. However, my impression is that the literature and “public” engagement (e.g. the EA Forum, LessWrong) on catastrophic AI misuse/structural risk is too small to even get traction on work like this. We might first need more work to lay out the best arguments. Having said that, I’m aware of a fair amount of writing which I haven’t got around to reading. So I am probably misjudging the state of the field.
To your second point—that seems like a real crux and I agree it would be good to expand in that direction. I know some people working on expanded and more in-depth models like this post. It would be great to get your thoughts when they’re ready.
My impression is that there is indeed substantially less literature on misuse risk and structural risk, compared to accident risk, in relation to AI x-risk. (I’m less confident when it comes to a broader set of negative outcomes, not just x-risks, but that’s also less relevant here and less important to me.) I do think that that might make the sort of work this post does less interesting if done in relation to those less-discussed types of risks, since fewer disagreements have been revealed there, so there’s less to analyse and summarise.
That said, I still expect interesting stuff along these lines could be done on those topics. It just might be a quicker job with a smaller output than this post.
I collected a handful of relevant sources and ideas here. I think someone reading those things and providing a sort of summary, analysis, and/or mapping could be pretty handy, and might even be doable in just a day or so of work. It might also be relatively easy to provide more “novel ideas” in the course of that work than it would’ve been for your post, since misuse/structural risks seem like less charted territory.
(Unfortunately I’m unlikely to do this myself, as I’m currently focused on nuclear war risk.)
---
A separate point is that I’d guess that one reason why there’s less work on misuse/structural AI x-risk than on accidental AI x-risk is that a lot of people aren’t aware of those other categories of risks, or rarely think about them, or assume the risks are much smaller. And I think one reason for that is that people often write or talk about “AI x-risk” while actually only mentioning accidental AI x-risk. That’s part of why I say “So, personally, I think I’d have made that choice of scope even more explicit.”
(But again, I do very much like this post overall. And as a target of this quibble of mine, you’re in good company—I have the same quibble with The Precipice. I think one of the quibbles I most often have with posts I like is “This post seems to imply, or could be interpreted as implying, that it covers [topic]. But really it covers [some subset of that topic]. That’s fair enough and still very useful, but I think it’d be good to be clearer about what the scope is.”)
---
I know some people working on expanded and more in-depth models like this post. It would be great to get your thoughts when they’re ready.
Sounds very cool! Yeah, I’d be happy to have a look at that work when it’s ready.