One thing I’d really like labs to do is encourage their researchers to blog about their thoughts on the future, on alignment plans, etc.
Another related but distinct thing is have safety cases and have an anytime alignment plan and publish redacted versions of them.
Safety cases: Argument for why the current AI system isn’t going to cause a catastrophe. (Right now, this is very easy to do: ‘it’s too dumb’)
Anytime alignment plan: Detailed exploration of a hypothetical in which a system trained in the next year turns out to be AGI, with particular focus on what alignment techniques would be applied.
Or, as a more minimal ask, they could avoid explicitly discouraging researchers from sharing their thoughts, and also avoid discouraging them implicitly via various chilling effects.
I’d personally love to see similar plans from AI safety orgs, especially (big) funders.
We’re working on something along these lines. The most up-to-date published posts are our control post and our Notes on control evaluations for safety cases, which are obviously incomplete.
I’m planning on posting a link to our best draft of a ready-to-go-ish plan as of 1 year ago, though it is quite out of date and incomplete.
I posted the link here.
Here is the doc, though note that it is very out of date. I don’t particularly want to recommend people read this doc, but it is possible that someone will find it valuable to read.
I don’t think funders are in a good position to do this. Also, funders are generally not “coherent”: they don’t have much top-down strategy. Individual grantmakers could write up their thoughts, though.