Thanks for the query! We don't think you should keep a misaligned AI around if you've got a provably aligned one to use instead. We're worried about evals of misaligned AIs, and specifically about how one prompts the model being tested, what context it's tested in, and so on. We think that evals of misaligned AIs should be minimized, and one way to do that is to get as much information as you can from prompting nice, friendly behavior rather than from prompting misaligned behavior (e.g. the red-team tests).