For example, a human can, to an extent, inspect what they are going to say before they say or write it. Before saying that Gary Marcus was “inspired by his pet chicken, Henrietta,” a human may temporarily store the words they plan to say elsewhere in the brain and evaluate them.
Transformer-based models also internally represent the tokens they are likely to emit in future steps. This was demonstrated rigorously in “Future Lens: Anticipating Subsequent Tokens from a Single Hidden State,” though perhaps the simpler demonstration is that LLMs can reliably complete the sentence “Alice likes apples, Bob likes bananas, and Aaron likes apricots, so when I went to the store I bought Alice an apple and I got [Bob/Aaron]” with the appropriate “a/an” token.
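As a rough way to poke at this yourself, here is a minimal sketch, assuming the Hugging Face `transformers` library and GPT-2 purely because it is small enough to run locally (the Future Lens paper uses more careful probing, and larger models handle this more reliably). It compares the next-token probabilities of “ a” vs. “ an” after “…and I got Aaron”:

```python
# Sketch: does the model prefer " an" before it has even produced "apricot"?
# Assumes Hugging Face `transformers` and GPT-2; larger models are more reliable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = ("Alice likes apples, Bob likes bananas, and Aaron likes apricots, "
          "so when I went to the store I bought Alice an apple and I got Aaron")

with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
probs = logits.softmax(-1)

# " a" and " an" are each single tokens in GPT-2's vocabulary.
p_a = probs[tok.encode(" a")[0]].item()
p_an = probs[tok.encode(" an")[0]].item()
print(f"P(' a') = {p_a:.4f}, P(' an') = {p_an:.4f}")
```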
So yes, but actually no. What’s happening in the example you gave is that the most probable token at each evaluation step makes forward progress toward completing the sentence.
Suppose the prompt contained the constraint “the third word of the response must begin with the letter ‘c’,” and the model has already generated “Alice likes apples.” The third word already violates the constraint, but those tokens have been emitted and can’t be taken back.
Current models can be prompted to check all the constraints, and will often notice the error, but they have no private buffer in which to try various generations until one that satisfies the prompt is found. Humans have a private buffer, and can also write things down that they don’t share. (Imagine solving this as a human: you would stop at word 3, start brainstorming ‘c’ words, and wouldn’t continue until you had a completion that works.)
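For what it’s worth, here is a minimal sketch of what bolting such a buffer on from the outside might look like: repeatedly draft a full candidate, check it against the constraint locally, and only emit one that passes. The `generate_candidate` function below is a stand-in for whatever LLM call you’d actually make, not a real API:

```python
# Sketch of an external "private buffer": drafts stay local until one
# satisfies the constraint; only a passing draft is ever "said".
import random

def generate_candidate(prompt: str) -> str:
    """Placeholder for an LLM call returning one full candidate response."""
    # In practice this would be a sampled (temperature > 0) completion request.
    return random.choice([
        "Alice likes apples every day.",
        "Alice likes crisp apples a lot.",
        "Alice eats cherries for breakfast.",
    ])

def third_word_starts_with_c(text: str) -> bool:
    words = text.split()
    return len(words) >= 3 and words[2].lower().startswith("c")

def generate_with_buffer(prompt: str, max_tries: int = 20) -> str | None:
    # Keep drafts in a local "buffer" and reject any that violate the constraint.
    for _ in range(max_tries):
        draft = generate_candidate(prompt)
        if third_word_starts_with_c(draft):
            return draft  # only now is anything actually emitted
    return None  # give up rather than output a violating response

print(generate_with_buffer("Write a sentence whose third word begins with 'c'."))
```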
There are a bunch of errors like this that I hit with GPT-4.
Similarly, if the probability of a correct generation is very low (“apples” may be far more probable than any ‘c’ word even with the constraint in the prompt), current models are unable to learn online from their mistakes on common questions they get wrong. This makes them not very useful yet as “employees” for a specific role, because they endlessly make the same errors.