I’m not aware of anybody currently working on concrete automated AI safety R&D evals, while there seems to be so much work going into e.g. DC evals or, more recently, scheminess evals. This seems very suboptimal in terms of portfolio allocation.
Edit: oops, I read this as “automated AI capabilities R&D”.
METR and UK AISI are both interested in this. I think UK AISI is working on it directly, while METR is working on it indirectly.
See here.
Thanks! AFAICT though, the link you posted seems about automated AI capabilities R&D evals, rather than about automated AI safety / alignment R&D evals (I do expect transfer between the two, but they don’t seem like the same thing). I’ve also chatted to some people from both METR and UK AISI and got the impression from all of them that there’s some focus on automated AI capabilities R&D evals, but not on safety.
Oops, misread you.
I think some people on the Superalignment team (OpenAI) are interested in some version of this and might already be working on it.
Can you give a concrete example of a safety property of the sort you’re envisioning automated testing for? Or am I misunderstanding what you’re hoping to see?