The indirect logit is trained with cross-entropy against the ground-truth correct answer. You can't do this for verbalized probability without using RL, so we instead do supervised learning, using the empirical accuracy for each question type as the labels.
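The supervised setup for verbalized probability can be sketched as follows. This is a minimal illustration, not the actual training code: the data layout, the `qtype` grouping key, and the choice of squared-error loss are all assumptions made for the sake of the example.

```python
from collections import defaultdict

def empirical_accuracy_labels(examples):
    """Group examples by question type and use each group's empirical
    accuracy as the supervision target (hypothetical helper; the
    grouping key and data layout are assumptions)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for ex in examples:
        totals[ex["qtype"]] += 1
        correct[ex["qtype"]] += int(ex["is_correct"])
    return {qt: correct[qt] / totals[qt] for qt in totals}

def verbalized_prob_loss(stated_probs, labels):
    """Mean squared error between the model's stated probability and
    the empirical accuracy for its question type. stated_probs is a
    list of (qtype, probability) pairs."""
    return sum((p - labels[qt]) ** 2 for qt, p in stated_probs) / len(stated_probs)

examples = [
    {"qtype": "add", "is_correct": True},
    {"qtype": "add", "is_correct": False},
    {"qtype": "mul", "is_correct": True},
]
labels = empirical_accuracy_labels(examples)
# labels: {"add": 0.5, "mul": 1.0}
loss = verbalized_prob_loss([("add", 0.5), ("mul", 0.9)], labels)
```

The key contrast with the indirect logit: there the target is the per-example correctness (a 0/1 label, trainable with cross-entropy), whereas here the target is an aggregate accuracy per question type, which gives a differentiable regression target without needing RL.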