I think MIRI is correct to call it as they see it, both on general principles and because if they turn out to be wrong about genuine alignment progress being very hard, people (at large, but also including us) should update against MIRI’s viewpoints on other topics, and in favor of the viewpoints of whichever AI safety orgs called it more correctly.
Yep, before I saw orthonormal’s response I had a draft-reply written that says almost literally the same thing:
we just call ’em like we see ’em
[...]
Insofar as we make bad predictions, we should get penalized for it. And insofar as we think alignment difficulty is the crux for ‘why we need to shut it all down’, we’d rather directly argue against illusory alignment progress (and directly acknowledge real major alignment progress as a real reason to be less confident of shutdown as a strategy) than redirect to something less cruxy.
I’ll also add: Nate (unlike Eliezer, AFAIK?) hasn’t flatly said ‘alignment is extremely difficult’. Quoting from Nate’s “sharp left turn” post:
Many people wrongly believe that I’m pessimistic because I think the alignment problem is extraordinarily difficult on a purely technical level. That’s flatly false, and is pretty high up there on my list of least favorite misconceptions of my views.
I think the problem is a normal problem of mastering some scientific field, as humanity has done many times before. Maybe it’s somewhat trickier, on account of (e.g.) intelligence being more complicated than, say, physics; maybe it’s somewhat easier on account of how we have more introspective access to a working mind than we have to the low-level physical fields; but on the whole, I doubt it’s all that qualitatively different than the sorts of summits humanity has surmounted before.
It’s made trickier by the fact that we probably have to attain mastery of general intelligence before we spend a bunch of time working with general intelligences (on account of how we seem likely to kill ourselves by accident within a few years, once we have AGIs on hand, if no pivotal act occurs), but that alone is not enough to undermine my hope.
What undermines my hope is that nobody seems to be working on the hard bits, and I don’t currently expect most people to become convinced that they need to solve those hard bits until it’s too late.
So it may be that Nate’s models would be less surprised by alignment breakthroughs than Eliezer’s models. And some other MIRI folks are much more optimistic than Nate, FWIW.
My own view is that I don’t feel nervous leaning on “we won’t crack open alignment in time” as a premise, and absent that premise I’d indeed be much less gung-ho about government intervention.
Why put all your argumentative eggs in the “alignment is hard” basket? (If you’re right, then policymakers can’t tell that you’re right.)
The short answer is “we don’t put all our eggs in the basket” (e.g., Eliezer’s TED talk and TIME article emphasize that alignment is an open problem, but they emphasize other things too, and they don’t go into detail on exactly how hard Eliezer thinks the problem is), plus “we very much want at least some eggs in that basket because it’s true, it’s honest, it’s cruxy for us, etc.” And it’s easier for policymakers to acquire strong Bayesian evidence for “the problem is currently unsolved” and “there’s no consensus about how to solve it” and “most leaders in the field seem to think there’s a serious chance we won’t solve it in time” than to acquire strong Bayesian evidence for “we’re very likely generations away from solving alignment”, so the difficulty of communicating the latter isn’t a strong reason to de-emphasize all the former points.
The longer answer is a lot more complicated. We’re still figuring out how best to communicate our views to different audiences, and “it’s hard for policymakers to evaluate all the local arguments or know whether Yann LeCun is making more sense than Yoshua Bengio” is a serious constraint. If there’s a specific argument (or e.g. a specific three arguments) you think we should be emphasizing alongside “alignment is unsolved and looks hard”, I’d be interested to hear your suggestion and your reasoning. https://www.lesswrong.com/posts/WXvt8bxYnwBYpy9oT/the-main-sources-of-ai-risk is a very long list and isn’t optimized for policymakers, so I’m not sure what specific changes you have in mind here.
If there’s a specific argument (or e.g. a specific three arguments) you think we should be emphasizing alongside “alignment is unsolved and looks hard”, I’d be interested to hear your suggestion and your reasoning.
The items on my list are of roughly equal salience to me. I don’t have specific suggestions for people who might be interested in spreading awareness of these risks/arguments, aside from picking a few that resonate with you and are also likely to be well received by the intended audience. And maybe link back to the list (or some future version of such a list) so that people don’t think the ones you choose to talk about are the only risks.
For me personally, I tend to talk about “philosophy is hard” (which feeds into “alignment is hard” and beyond) and “humans aren’t safe” (humans suffer from all kinds of safety problems just like AIs do, including being easily persuaded of strange beliefs and bad philosophy, calling “alignment” into question even as a goal). These might not work well on a broader audience, though, the kind that MIRI is presumably trying to reach. Some adjacent messages might work, for example: “even if alignment succeeds, humans can’t be trusted with God-like powers yet; we need to become much wiser first” and “AI persuasion will be a big problem” (but honestly I have little idea, due to lack of experience talking outside my circle).
To pick out a couple of specific examples from your list, Wei Dai:
14. Human-controlled AIs causing ethical disasters (e.g., large scale suffering that can’t be “balanced out” later) prior to reaching moral/philosophical maturity
This is a serious long-term concern if we don’t kill ourselves first, but it’s not something I see as a premise for “the priority is for governments around the world to form an international agreement to halt AI progress”. If AI were easy to use for concrete tasks like “build nanotechnology” but hard to use for things like CEV, I’d instead see the priority as “use AI to prevent anyone else from destroying the world with AI”, and I wouldn’t want to trade off probability of that plan working in exchange for (e.g.) more probability of the US and the EU agreeing in advance to centralize and monitor large computing clusters.
After someone has done a pivotal act like that, you might then want to move more slowly insofar as you’re worried about subtle moral errors creeping into precursors to CEV.
30. AI systems end up controlled by a group of humans representing a small range of human values (ie. an ideological or religious group that imposes values on everyone else)
I currently assign very low probability to humans being able to control the first ASI systems, and redirecting governments’ attention away from “rogue AI” and toward “rogue humans using AI” seems very risky to me, insofar as it causes governments to misunderstand the situation, and to specifically misunderstand it in a way that encourages racing.
If you think rogue actors can use ASI to achieve their ends, then you should probably also think that you could use ASI to achieve your own ends; misuse risk tends to go hand-in-hand with “we’re the Good Guys, let’s try to outrace the Bad Guys so AI ends up in the right hands”. This could maybe be justified if it were true, but when it’s not even true it strikes me as an especially bad argument to make.
I wrote this top level comment in part as a reply to you.