Fwiw I’m also skeptical of how much we can conclude from these evals, though I think they’re way above the bar for “worthwhile to report”.
Another threat model you could care about (within persuasion) is targeted recruitment for violent ideologies. With that one too it’s plausible you’d want a more targeted eval, though I think simplicity, generality, and low cost are also reasonable things to optimize for in evals.
Yeah, maybe I’m pretty off base in what the meta-level policy should be like. I don’t feel very strongly about how to manage this.
I also just realized that some of the language was stronger than I intended, and I’ve edited the original comment. Sorry about that.