Agreed. A related thought is that we might only need to be able to interpret a single model at a particular capability level to unlock the safety benefits, as long as we can make a sufficient case that we should use that model. We don’t care inherently about interpreting GPT-4, we care about there existing a GPT-4 level model that we can interpret.
Agreed. A related thought is that we might only need to be able to interpret a single model at a particular capability level to unlock the safety benefits, as long as we can make a sufficient case that we should use that model. We don’t care inherently about interpreting GPT-4, we care about there existing a GPT-4 level model that we can interpret.