Yeah, getting good at faithfulness is still an open problem. So far, we’ve mostly relied on imitative finetuning to get misrepresentations down to about 10% (which is obviously still unacceptable). Going forward, I think that some combination of the following techniques will be needed to get performance to a reasonable level:
Finetuning + RL from human preferences
Adversarial data generation for finetuning + RL
Verifier models, relying on evaluation being easier than generation (sketched after this list)
Decomposition of verification, generating and testing ways that a claim could be wrong
Debate (“self-criticism”)
User feedback, highlighting situations where the model is wrong
Tracking supporting information for each statement and through each chain of reasoning
Voting among models trained/finetuned on different datasets
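To make a couple of these concrete, here’s a rough sketch of how verifier models, decomposed verification, and voting over differently-finetuned generators could fit together. Every name here (generate_answer, ways_claim_could_be_wrong, check, vote) is made up for illustration, and the model calls are trivial placeholders rather than our actual pipeline.

```python
import random

def generate_answer(question: str, source: str, seed: int) -> str:
    """Placeholder for one generator model finetuned on its own dataset."""
    random.seed(seed)
    return f"answer-{random.randint(0, 1)} to {question!r}"

def ways_claim_could_be_wrong(claim: str, source: str) -> list[str]:
    """Placeholder for a model that decomposes verification into concrete checks."""
    return ["entity swapped", "number changed", "unsupported causal link"]

def check(claim: str, source: str, possible_error: str) -> bool:
    """Placeholder verifier: True if the claim survives this specific check."""
    return possible_error not in claim  # trivially permissive stand-in

def verified(claim: str, source: str) -> bool:
    # Decomposition of verification: the claim counts as faithful only if it
    # survives every concrete way it could be wrong.
    return all(check(claim, source, e) for e in ways_claim_could_be_wrong(claim, source))

def vote(question: str, source: str, n_models: int = 5) -> str:
    # Voting: sample answers from differently-seeded/finetuned generators,
    # keep only those that pass verification, then take the majority.
    candidates = [generate_answer(question, source, seed=i) for i in range(n_models)]
    passing = [a for a in candidates if verified(a, source)]
    if not passing:
        return "[no verified answer]"
    return max(set(passing), key=passing.count)

print(vote("What did the report conclude?", source="..."))
```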
Thanks for the pointer to Pagnoni et al.