Rohin Shah comments on DeepMind: Evaluating Frontier Models for Dangerous Capabilities

Rohin Shah 30 May 2024 23:09 UTC
2 points
0
Fwiw I’m also skeptical of how much we can conclude from these evals, though I think they’re way above the bar for “worthwhile to report”.
Another threat model you could care about (within persuasion) is targeted recruitment for violent ideologies. With that one too it’s plausible you’d want a more targeted eval, though I think simplicity, generality, and low cost are also reasonable things to optimize for in evals.
- ryan_greenblatt 30 May 2024 23:36 UTC
  2 points
  0
  
  though I think they’re way above the bar for “worthwhile to report”
  
  Yeah, maybe I’m pretty off base in what the meta-level policy should be like. I don’t feel very strongly about how to manage this.
  
  I’ve also now realized that some of the language was stronger than I intended, and I’ve edited the original comment. Sorry about that.