This broadly makes sense to me. There are many cases where “the model is pretending to be dumb” feels like the right description.
This is part of why building evaluations and benchmarks for this sort of thing is difficult.
I’m at least somewhat optimistic about approaches like data-prefixing, combined with techniques that build on human feedback, to give controls over modes like “play dumb for the joke” vs “give the best answer”.
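To make the data-prefixing idea concrete, here’s a minimal, hypothetical sketch (the tag names and example records are made up, not from any real dataset): you tag fine-tuning examples with a control prefix, and then pick the behavior at inference time by choosing which prefix to prepend.

```python
# Hypothetical sketch of data-prefixing for mode control.
# MODE_TAGS and the example records below are illustrative only.

MODE_TAGS = {
    "joke": "<|mode:play_dumb|>",
    "best": "<|mode:best_answer|>",
}

def prefix_example(mode: str, prompt: str, completion: str) -> dict:
    """Prepend a mode tag so fine-tuning associates the tag with the behavior."""
    return {
        "prompt": f"{MODE_TAGS[mode]} {prompt}",
        "completion": completion,
    }

# Training data: the same prompt can appear under different modes with
# different target completions (e.g. sourced via human feedback / labeling).
train = [
    prefix_example("joke", "What's 2 + 2?", "Uhh... a lot? Math is hard."),
    prefix_example("best", "What's 2 + 2?", "4."),
]

# At inference time, select the behavior by selecting the prefix:
query = f"{MODE_TAGS['best']} What's 2 + 2?"
```

The point is just that the control signal lives in the data, so the same base training setup can support multiple deliberate “modes” rather than leaving the choice implicit.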
I personally have definitely seen GPT-3 fail to give a really good answer across many tries, many times, but I spend a lot of time looking at its outputs and analyzing them. It seems important to be wary of the “seems to be dumb” failure modes.