Anytime alignment plan: Detailed exploration of a hypothetical in which a system trained in the next year turns out to be AGI, with particular focus on what alignment techniques would be applied.
I’d personally love to see similar plans from AI safety orgs, especially (big) funders.
We’re working on something along these lines. The most up-to-date published material is just our control post and our Notes on control evaluations for safety cases, which are obviously incomplete.
I’m planning on posting a link to our best draft of a ready-to-go-ish plan as of 1 year ago, though it is quite out of date and incomplete.
I posted the link here.
Here is the doc, though note that it is very out of date. I don’t particularly want to recommend reading it, but it’s possible someone will find it valuable.
I don’t think funders are in a good position to do this. Also, funders are generally not “coherent”: they don’t have much top-down strategy. Individual grantmakers could write up their own thoughts.