I think a really good practice for papers about new LLM-safety methods would be to publish the set of attack prompts that nevertheless break safety, so people can figure out generalizations of successful attacks faster.