I’m not aware of anybody currently working on concrete automated AI safety R&D evals, while there seems to be so much work going into e.g. DC evals or, more recently, scheminess evals. This seems very suboptimal in terms of portfolio allocation.
Edit: oops, I read this as “automated AI capabilities R&D”.
METR and UK AISI are both interested in this. I think UK AISI is working on it directly, while METR is working on it indirectly.
See here.
Thanks! AFAICT though, the link you posted seems about automated AI capabilities R&D evals, rather than about automated AI safety / alignment R&D evals (I do expect transfer between the two, but they don’t seem like the same thing). I’ve also chatted to some people from both METR and UK AISI and got the impression from all of them that there’s some focus on automated AI capabilities R&D evals, but not on safety.
Oops, misread you.
I think some people on the Superalignment team (OpenAI) are interested in some version of this and might already be working on it.
Can you give a concrete example of a safety property of the sort you’re envisioning automated testing for? Or am I misunderstanding what you’re hoping to see?