LLMs can sometimes spot inconsistencies in their own outputs. For example, here I ask ChatGPT to produce a list of three notable individuals who share the same birth date and year, and here I ask it to judge the correctness of that response; it is able to tell that the response was inaccurate.
It’s certainly not perfect or foolproof, but it’s not something they’re strictly incapable of either.
Although, in fairness, you would not be wrong to say “LLMs can sometimes spot human-obvious inconsistencies in their outputs, but things are also currently moving very quickly”.
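For concreteness, here is a minimal sketch of that generate-then-verify pattern using the OpenAI Python SDK. The model name, prompt wording, and environment setup are illustrative assumptions, not the exact prompts used in the linked chats.

```python
# Minimal sketch: ask the model a question, then ask it (in a fresh
# conversation) to judge the accuracy of its own answer.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

QUESTION = ("List three notable individuals who share the same "
            "birth date and year.")

# Step 1: generate an answer.
answer = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": QUESTION}],
).choices[0].message.content

# Step 2: verify the answer in a separate request, with no shared context.
verdict = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            f"Question: {QUESTION}\n\n"
            f"Proposed answer: {answer}\n\n"
            "Check each claimed birth date against what you know. "
            "Is the answer accurate? Point out any inconsistencies."
        ),
    }],
).choices[0].message.content

print("Answer:\n", answer)
print("Verdict:\n", verdict)
```

Running the verification step as a separate conversation matters: the judging call sees only the question and the proposed answer, so it cannot simply defer to its earlier reasoning.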