supervised finetuning on adversarial examples that trigger the B-bias.
You’re correct, though my thought was, as a general policy, to have something check every answer for any kind of incorrectness or bias. For this situation, assuming you don’t have a database of ground truth (the trivial case if you do), you need some method to get the most likely ground truth. You could ask multiple larger LLMs and train this one on the ‘more correct’ opinion from its peers, as in the sketch below.
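A minimal sketch of that ensemble idea, assuming a hypothetical `query_model()` helper standing in for whatever client each larger model actually exposes; the majority answer among the peers becomes the pseudo-ground-truth label, optionally gated on how strongly they agree:

```python
from collections import Counter

# Hypothetical helper: sends a prompt to a named model and returns its answer
# string. Wire this up to whatever APIs (hosted or local) you actually use.
def query_model(model_name: str, prompt: str) -> str:
    raise NotImplementedError("plug in your own model client here")

PEER_MODELS = ["large-model-a", "large-model-b", "large-model-c"]  # placeholders

def pseudo_ground_truth(prompt: str) -> tuple[str, float]:
    """Ask several larger models and take the majority answer as the label.

    Returns the winning answer and the fraction of peers that agreed,
    which you can threshold on before trusting the label.
    """
    answers = [query_model(m, prompt) for m in PEER_MODELS]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)

def build_finetune_set(prompts: list[str], min_agreement: float = 0.66):
    """Keep only examples where the peer models mostly agree."""
    dataset = []
    for p in prompts:
        label, agreement = pseudo_ground_truth(p)
        if agreement >= min_agreement:
            dataset.append({"prompt": p, "completion": label})
    return dataset
```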
But that may not be worth doing. I was thinking more of factual questions, legal briefs that reference a case number, claims about a website, and so on. Each of these has a ground truth you can look up, and you want to automatically fine-tune the model to produce logits that put very high probability on the correct answer.
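For the single-token or short-answer case, that could look roughly like the following sketch, assuming a HuggingFace-style causal LM (the model name is a placeholder). The label mask (-100) restricts the cross-entropy loss to the answer tokens, so the gradient step only pushes probability mass onto the looked-up ground truth:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-7b-model"  # placeholder for whatever base model you use
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def finetune_step(prompt: str, correct_answer: str) -> float:
    """One gradient step that raises the probability of the looked-up answer.

    Loss is masked with -100 everywhere except the answer tokens, so only
    the ground-truth answer's logits are pushed up.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(correct_answer, add_special_tokens=False,
                           return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)

    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore loss on the prompt

    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```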
Also, in this general case the logits are useless, because you aren’t looking for the token for the answers A/B/P/N but for a series of steps that lead to an answer, and you want both the reasoning in the steps to be valid and the final answer to be correct. The one-letter answer is a trivial case.
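One common workaround for that (a sketch under assumptions, not anyone’s confirmed method): sample several reasoning chains, check only the final answer against the looked-up ground truth, and keep the chains that end correctly as fine-tuning data. This sidesteps grading each intermediate step; the `parse_final_answer()` pattern and the `generate` callable are assumptions about your output format and model interface:

```python
import re

def parse_final_answer(chain: str) -> str | None:
    """Pull the final answer out of a reasoning chain.

    Assumes the model was prompted to end with 'Final answer: <x>';
    adapt the pattern to your own output format.
    """
    match = re.search(r"Final answer:\s*(.+)", chain)
    return match.group(1).strip() if match else None

def collect_valid_chains(generate, prompt: str, ground_truth: str, k: int = 8):
    """Rejection sampling: keep only chains whose final answer is correct.

    `generate` is any callable (prompt -> reasoning text). Note this only
    verifies the endpoint; validating the steps themselves would need a
    separate verifier or process-level reward model.
    """
    kept = []
    for _ in range(k):
        chain = generate(prompt)
        answer = parse_final_answer(chain)
        if answer is not None and answer == ground_truth:
            kept.append(chain)
    return kept
```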
So I found LLMs change wrong answers pretty often if the LLM is GPT-4. I haven’t tried this one.
But these challenges aren’t insurmountable, so I wonder why I haven’t seen fine-tuned “judges” more often.
Also, what are the other costs, besides the fine-tuning itself, in terms of model performance on other tasks? 7B parameters don’t leave much usable space; everything the model learns comes at a cost.