I’m pretty sure GPT confabulation is (at least in part) caused by the collapse of a highly uncertain probability distribution, where the uncertainty is induced by the computational limits of the model.
Basically the model is asked to solve a problem it simply can’t solve (say, general-case multiplication in a single step), and no matter how many training iterations and training examples it sees, it can’t actually learn to calculate the correct answer. The result is a relatively flat distribution over the kinds of answers it typically saw associated with that type of problem. At inference time there’s no standout answer, so the model essentially samples at random from a few common-looking possibilities. The next token then sees that nonsense as input, and the error is locked in.
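To make that concrete, here’s a toy sketch of the sampling step (the prompt, the candidate answers, and the logit values are all made up for illustration, not taken from any real model): a model that can actually do the arithmetic puts nearly all of its probability mass on one answer, while a capacity-limited model ends up with a nearly flat distribution over plausible-looking numbers, so sampling picks one more or less at random, and whatever gets picked becomes part of the context for everything that follows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate completions for "What is 4837 * 9214?" --
# the kinds of answers the model saw for big-multiplication problems.
# (44,568,118 happens to be the correct product.)
candidates = ["44,575,918", "44,568,118", "44,287,918", "43,575,918"]

# A model that can compute the answer concentrates its probability mass;
# a model that can't produces nearly flat logits over plausible numbers.
confident_logits = np.array([0.2, 9.0, 0.5, 1.0])
uncertain_logits = np.array([1.0, 1.1, 0.9, 1.0])

def softmax(logits):
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

for name, logits in [("confident", confident_logits),
                     ("uncertain", uncertain_logits)]:
    probs = softmax(logits)
    # With the flat distribution, this pick is essentially a coin flip
    # among answers that merely *look* typical for this kind of problem.
    pick = rng.choice(candidates, p=probs)
    print(f"{name}: probs={np.round(probs, 2)} -> sampled {pick}")

# Whatever was sampled is appended to the context, so every later token
# is generated conditioned on it -- the arbitrary answer is locked in.
```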
Fortunately, raw capability gain seems sufficient to address that particular failure mode.
Hopefully we do actually live in that reality!