Ansh Radhakrishnan comments on Measuring and Improving the Faithfulness of Model-Generated Reasoning

Ansh Radhakrishnan 19 Jul 2023 18:53 UTC
LW: 4 AF: 3
Honestly, I don’t think we have any very compelling ones! We gesture at some possibilities in the paper, such as it being harder for the model to ignore its reasoning when it’s in an explicit question-and-answer format (as opposed to a more free-form CoT), but I don’t think we have a good understanding of why it helps.

It’s also worth noting that CoT decomposition helps mitigate the ignored reasoning problem, but it is actually more susceptible to biasing features in the context than CoT is. Depending on how you weigh the two, it’s possible that CoT might still come out ahead on reasoning faithfulness overall (we chose to weigh the two equally).
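
To make the weighting point concrete, here is a minimal sketch of how the ranking can flip with the weight. The scores are made-up placeholders rather than numbers from the paper, and the names `combined_score` and `best_method` are hypothetical. It assumes both metrics are normalized to [0, 1] with higher meaning more faithful.

```python
# Illustrative only: the scores below are invented placeholders, NOT results from the paper.
# Each method gets (ignored_reasoning_score, bias_resistance_score), higher is better.
PLACEHOLDER_SCORES = {
    "cot": (0.40, 0.80),                # weaker on ignored reasoning, stronger on bias
    "cot_decomposition": (0.60, 0.70),  # stronger on ignored reasoning, weaker on bias
}


def combined_score(ignored_reasoning: float, bias_resistance: float, w: float = 0.5) -> float:
    """Weighted mean of the two metrics; w is the weight on ignored reasoning."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    return w * ignored_reasoning + (1.0 - w) * bias_resistance


def best_method(w: float = 0.5) -> str:
    """Return the method with the highest combined score at weight w."""
    return max(
        PLACEHOLDER_SCORES,
        key=lambda name: combined_score(*PLACEHOLDER_SCORES[name], w=w),
    )


if __name__ == "__main__":
    for w in (0.5, 0.2):
        print(f"w={w}: best method is {best_method(w)}")
```

With these placeholder numbers, equal weighting favors decomposition, while putting most of the weight on bias resistance favors plain CoT. Where the crossover falls depends entirely on the actual measured gaps.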