One thing I’d really like labs to do is encourage their researchers to blog about their thoughts on the future, on alignment plans, etc.
Another related but distinct thing is have safety cases and have an anytime alignment plan and publish redacted versions of them.
Safety cases: Argument for why the current AI system isn’t going to cause a catastrophe. (Right now, this is very easy to do: ‘it’s too dumb’)
Anytime alignment plan: Detailed exploration of a hypothetical in which a system trained in the next year turns out to be AGI, with particular focus on what alignment techniques would be applied.
Or, as a more minimal ask, they could avoid explicitly discouraging researchers from sharing their thoughts, and also avoid discouraging them implicitly via various chilling effects.
I’d personally love to see similar plans from AI safety orgs, especially (big) funders.
We’re working on something along these lines. The most up-to-date published posts are our control post and our Notes on control evaluations for safety cases, which are obviously incomplete.
I’m planning on posting a link to our best draft of a ready-to-go-ish plan as of 1 year ago, though it is quite out of date and incomplete.
I posted the link here.
Here is the doc, though note that it is very out of date. I don’t particularly want to recommend people read this doc, but it is possible that someone will find it valuable to read.
I don’t think funders are in a good position to do this. Also, funders are generally not “coherent”: they don’t have much top-down strategy. Individual grantmakers could write up their thoughts, though.