Anytime alignment plan: Detailed exploration of a hypothetical in which a system trained in the next year turns out to be AGI, with particular focus on what alignment techniques would be applied.
I’d personally love to see similar plans from AI safety orgs, especially (big) funders.
We’re working on something along these lines. The most up-to-date published material is just our control post and our Notes on control evaluations for safety cases, which are obviously incomplete.
I’m planning on posting a link to our best draft of a ready-to-go-ish plan as of 1 year ago, though it is quite out of date and incomplete.
I posted the link here.
Here is the doc, though note that it is very out of date. I don’t particularly want to recommend reading it, but it’s possible someone will find it valuable.
I don’t think funders are in a good position to do this. Also, funders are generally not “coherent”: they don’t have much top-down strategy. Individual grantmakers could write up their own thoughts.