I also took into account that refusal-vector-ablated models and scaffolding are already available on Hugging Face, though this post might still give them more exposure. Also, Llama 3 70B performs many unethical tasks without any attempt at circumventing safety; at that point I am really just applying scaffolding. Do you think it is wrong to report on this?
How could this go wrong? People might realize how powerful this is and invest more time and resources into developing their own versions.
I don’t really think of this as alignment research; I just want to show people how far along we are. The positive impact could be to prepare people for these agents being out in the wild and being used in demos, and potentially to convince labs to be more careful with their releases.