The article and my examples were meant to show that there is a gap between what GPT knows and what it says: it knows something, but sometimes claims it doesn't, or simply makes something up. I haven't addressed your "GPT generator/critic" framework or the calibration issues because I don't see them as particularly relevant here. GPT is just GPT, and being a critic/verifier is basically always easier than generating. IIRC the GPT-4 paper didn't go into much detail about how they tested calibration, but that's beside the point, since I'm claiming that sometimes it knows the "right" probability yet generates a made-up one.
I don't see how "say true things when you are asked and you know the true thing" is such a high standard, just because we have already internalised that it's OK for GPT to sometimes make things up.