bilalchughtai comments on StefanHex’s Shortform

bilalchughtai 19 Nov 2024 23:33 UTC
3 points
2
Agreed. A related thought is that we might only need to be able to interpret a single model at a particular capability level to unlock the safety benefits, as long as we can make a sufficient case that we should use that model. We don’t inherently care about interpreting GPT-4; we care about there existing a GPT-4-level model that we can interpret.