I’m sorry that I don’t have time to write up a detailed response to (critique of?) the response to critiques; hopefully this brief note is still useful.
1. I remain frustrated by GSAI advocacy. When discussing feasibility, it’s suited to well-understood closed domains (excluding e.g. natural language); when arguing for importance, it’s ‘we need rigorous guarantees for current or near-future AI’. It’s an extension to or complement of current practice; and yet current practice is irresponsible and inadequate. Often these claims come from different advocates, but that doesn’t make it less frustrating for me.
2. Claiming that non-vacuous sound (over)approximations are feasible, or that we’ll be able to specify and verify non-trivial safety properties, is risible. Planning for runtime monitoring and anomaly detection is IMO an excellent idea, but it would be entirely pointless if you believed that we had a guarantee! (A toy sketch of what these terms mean follows this comment.)
3. It’s vaporware. I would love to see a demonstration project and perhaps lose my bet, but I don’t find papers or posts full of details compelling, however long we could argue over them. Nullius in verba!
4. I like the idea of using formal tools to complement and extend current practice—I was at the workshop where Towards GSAI was drafted, and offered co-authorship—but as much as I admire the people involved, I just don’t believe the core claims of the GSAI agenda as it stands.
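To unpack the jargon in point 2: a sound (over)approximation is a bound guaranteed to contain every possible behaviour of the system, and it is non-vacuous only if it is also tight enough to certify a property anyone actually cares about. A minimal sketch, using interval arithmetic over a tiny made-up ReLU network (the weights, input box, and threshold below are invented for illustration, not taken from any GSAI proposal), shows why soundness is cheap while non-vacuity is the hard, contested part:

```python
# Illustrative sketch only: a sound but possibly vacuous over-approximation
# of a tiny ReLU network's outputs via interval arithmetic. Everything here
# (weights, input box, threshold) is invented for the example.
import numpy as np

def interval_affine(lo, hi, W, b):
    """Soundly propagate the box [lo, hi] through x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_relu(lo, hi):
    """ReLU is monotone, so applying it to the endpoints preserves soundness."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

# "Safety property" we would like to verify: for every input within +/-0.1 of
# x0, the network's output stays below 10.
x0 = rng.normal(size=4)
lo, hi = x0 - 0.1, x0 + 0.1
lo, hi = interval_relu(*interval_affine(lo, hi, W1, b1))
lo, hi = interval_affine(lo, hi, W2, b2)

print("sound output bounds:", lo, hi)
# The bounds are guaranteed to contain the true output range (soundness),
# but if they are far wider than the threshold they certify nothing useful
# (vacuity). Tightening such bounds for large models and rich properties is
# exactly the feasibility question being argued over here.
print("property certified:", bool(hi[0] < 10.0))
```

For a network this small the bound may well certify the property; the disputed claim is whether any such bound can be made non-vacuous for frontier-scale models and safety properties that are hard even to specify.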
This seems like a crux here, one that might be useful to uncover further: your second point, that non-vacuous sound (over)approximations, and the specification and verification of non-trivial safety properties, are out of reach.
I broadly agree with you that most of what’s proposed is either in its infancy or is essentially vaporware that doesn’t really work unless AIs are already so good that the plan would be wholly irrelevant, which makes it of little use for short-timelines work. But I do believe enough of the plan is salvageable to make it not completely useless, and in particular the part where it’s very possible for AIs to help in real ways (at least given some evidence):
https://www.lesswrong.com/posts/DZuBHHKao6jsDDreH/in-response-to-critiques-of-guaranteed-safe-ai#Securing_cyberspace
Improving the sorry state of software security would be great, and with AI we might even see enough change to the economics of software development and maintenance that it happens, but it’s not really an AI safety agenda.
(added for clarity: of course it can be part of a safety agenda, but see point #1 above)
I agree that it isn’t a direct AI safety agenda, though I will say that software security would be helpful for control agendas, and that the increasing mathematical capabilities of AI could, in principle, help with AI alignment agendas that are mostly mathematical, like Vanessa Kosoy’s agenda.
It’s also useful for AI control purposes. More below:
https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion#BSv46tpbkcXCtpXrk
Depends on your assumptions. If you assume that a pretty-well-intent-aligned, pretty-well-value-aligned AI (e.g. Claude) scales into a tool powerful enough to give you sufficient leverage on the near-term future to pause or slow global progress towards ASI (which would kill us all)...
Then having that powerful tool, but having a copy of it stolen from you and used at cross-purposes that prevent your plan from succeeding… would be snatching defeat from the jaws of victory.
Currently we are perhaps close to creating such a powerful AI tool, maybe even before ‘full AGI’ (by some definition). However, we are nowhere near the top AI labs having good enough security to prevent their code and models from being stolen by a determined state-level adversary.
So in my worldview, computer security is inescapably connected to AI safety.
We can drop the assumption that ASI inevitably kills us all (or that we should pause) and the above argument still works. Or, as I like to put it: practical AI alignment/safety is very much helped by computer security, especially against state adversaries.
I think Zach Stein-Perlman is overstating the case, but here it is:
https://www.lesswrong.com/posts/eq2aJt8ZqMaGhBu3r/zach-stein-perlman-s-shortform#ckNQKZf8RxeuZRrGH
Minor point: it seems unfair to accuse GSAI of being vaporware. It has been less than a year since the GSAI paper came out, and 1.5 years since Tegmark/Omohundro’s Provably Safe paper, and there are many projects being actively funded through ARIA and others that should serve as tests. No GSAI researcher I know of promised significant projects in 2024; in fact, several explicitly think the goal should be to do deconfusion and conceptual work now, and plan to leverage the advances in autoformalization and AI-assisted coding that are coming down the pipe fast.
While I agree that there are not yet compelling demonstrations, this hardly seems at the level of Duke Nukem Forever!