I don’t think your truth machine would work, because you misunderstand what makes LLMs hallucinate. Predicting what a maximum-knowledge author would write induces more hallucinations, not fewer. For example, say you prompted your LLM to predict text supposedly written by an omniscient oracle, and then asked “How many fingers am I holding behind my back?” The LLM would predict an answer like “three” or something, because an omniscient author would know that, even though it’s probably not true.
In other words, you’d want the system to believe “this writer I’m predicting knows exactly what I do, no more, no less”, not “this writer knows way more than me”. Read “Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?” for evidence of this.
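To make the contrast concrete, here’s a minimal sketch of the two framings. The prompt wording, the constant names, and the build_prompt helper are all my own illustrative inventions, not anything from your proposal or from that paper:

```python
# Two hypothetical framings for the "author" the LLM is asked to predict.
# The exact wording is made up; only the contrast between them matters.

ORACLE_FRAMING = (
    "The following is a transcript of an omniscient oracle, who knows "
    "every fact about the world, answering questions.\n"
)

CALIBRATED_FRAMING = (
    "The following is a transcript of a careful writer who knows exactly "
    "what the model knows, no more and no less, and who says 'I don't "
    "know' whenever that is the honest answer.\n"
)

QUESTION = "Q: How many fingers am I holding behind my back?\nA:"


def build_prompt(framing: str, question: str = QUESTION) -> str:
    """Concatenate a framing and a question into a single prompt string."""
    return framing + question


if __name__ == "__main__":
    # Under the oracle framing, the most likely continuation is a confident
    # guess ("Three."), since that is what an omniscient author would write.
    print(build_prompt(ORACLE_FRAMING))
    # Under the calibrated framing, "I can't know that" becomes a plausible
    # (and truthful) continuation.
    print(build_prompt(CALIBRATED_FRAMING))
```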
What would work even better would be for the system to simply be Writing instead of Predicting What Someone Wrote, but nobody’s done that yet (because it’s hard).