The main reason I’d be more freaked out is that I already think egregious misalignment is fairly plausible; if I thought it was very unlikely, I wouldn’t change my mind based on one weird observation.
I agree with this perspective, and I think it has some important implications for AI policy/comms.
My understanding is that there are very few individuals who are actually trying to help policymakers understand misalignment or loss-of-control concerns. There has been a widespread meme that in order to appear respectable and legitimate to policymakers, you need to emphasize CBRN concerns, misuse concerns, and China concerns, and that you shouldn't come across as "weird" by mentioning misalignment concerns.
(I think these concerns have been overstated. My own experience suggests that many policymakers are quite interested in better understanding loss-of-control scenarios. In fact, some policymakers see this as the main "value add" from AI people; they tend to rely on existing CBRN experts and existing China experts for the misuse topics.)
Regardless, even if we were in a world where the best way to gain status was to avoid talking about misalignment risks, this would come at a major cost: policymakers would not understand misalignment risks, and their actions would reflect that lack of understanding and context.
I think one of the most important things for AI policy folks to do is to find concise/compelling ways to convey concerns about misalignment.
To be clear, I think this is extremely challenging. One of the reasons why so few people are explaining misalignment concerns to policymakers is that it is very difficult to actually answer questions like “I understand that this is possible, but why is this likely?” or “Why won’t the companies just develop safeguards to prevent this from happening?”, or “As the science gets better, won’t we just figure out how to avoid this?”
A year ago, I expected that a significant fraction of the AI policy community was regularly giving presentations about misalignment. I substantially overestimated the amount of effort going into this, and I now think it is one of the most neglected and important topics in AI safety.
(And while I don’t want to call anyone out in particular, I do feel that some folks in the AI safety community actively discouraged people from trying to have frank, honest, clear conversations with policymakers about misalignment risks. Some AI policy groups continue to avoid the topic, systematically empowering people who come across as more "normal" and avoiding hiring people inclined to explain misalignment and loss-of-control scenarios. The culture does seem to have been changing a bit in the last several months, but I think this "don't talk about misalignment concerns with policymakers" gatekeeping is among the worst mistakes made by the AI policy community.)
What kinds of resources would be helpful? Some examples IMO:
Short presentations that summarize key reasons to be concerned about misalignment risks
People who can have 1-1s with policymakers and actually feel like they can answer basic questions about misalignment risks
Pieces like this and this.
Attempts at taking things like List of Lethalities and Without Specific Countermeasures and explaining them in short, concise, clear ways.
Statements from labs, lab employees, and other respected AI experts that they are concerned about misalignment risks. (IMO Bengio is a good example of this; he seems to do this more frequently than ~everyone else while still maintaining high quality and clarity.)
(Also, to be clear, I’m not saying that policy groups should never "meet people where they're at" or never discuss things like CBRN risks, China, or misuse risks. I do think, however, that most LW readers would likely be shocked and disappointed at just how small a fraction of time is spent explaining misalignment risks relative to everything else.)
Could you share more about this:
“my own experiences suggest that many policymakers appear to be very interested in better understanding loss of control scenarios.”
For example, what proportion of the people you’ve interacted with gave you this impression, and in what context did it happen? (I understand that you’d have to keep details sparse for obvious reasons.)
Why not just create this ‘short presentation’ yourself?
It would probably take less than half the word count of the comment you’ve already written, and it should be much more persuasive than the whole thing.
I don’t want to pick on you specifically, but it’s hard to ignore the most direct and straightforward solution to the problems identified.
Despite the disagree votes, I actually think this is a reasonable suggestion, and indeed more people should be asking "why doesn't X policy person go around trying to explain superintelligence risks?"
I think the disagree votes are probably coming from the vibe that this would be easy/simple to do. Short (in length) is not the same as easy.
My own answer is some mix of fear that I’m not the right person, some doubts about which points to emphasize, and some degree of “I may start working more deliberately on this soon in conjunction with someone else who will hopefully be able to address some of my blind spots.”
I don’t think my comment gave off the vibe that it is ‘easy/simple’ to do, just that it isn’t as much of a long shot as the alternative.
i.e., waiting for someone smarter, more competent, and politically savvier to read your comment and then hoping they'll do it, which seems very unlikely to happen.
"Why not just" is a standard phrase implying that what you're proposing would be simple, or would come naturally if you tried. Combined with the rest of the comment's talk of straightforwardness and low word count, it does give off a somewhat combative vibe.
I agree with your suggestion, and it is good to hear that you don't intend to imply that it is simple, so maybe it would be worth editing the original comment to prevent miscommunication for people who haven't read this exchange yet. For the time being, I've strong-agreed with your comment to save it from a negativity-snowball effect.
Good comms for people who don’t share your background assumptions is often really hard!
That said I’d definitely encourage Akash and other people who understand both the AI safety arguments and policymakers to try to convey this well.
Maybe I'll take a swing at this myself at some point soon; I suspect I don't really know what policymakers' cruxes are or how to speak their language, but at least I've lived in DC before.
Then this seems to be an entirely different problem?
At the very least, resolving substantial differences in background assumptions is going to take a lot more than a ‘short presentation’.
And it's very likely that those in actual decision-making positions will be much less charitable than me, since their secretaries receive hundreds or thousands of such petitions every week.
I'm not suggesting that the short argument should resolve those background assumptions. I'm suggesting that a good argument for people who don't share those assumptions roughly entails understanding someone else's assumptions well enough to speak their language and craft a persuasive and true argument on their terms.