So yes, but actually no. What's happening in the example you gave is that the most probable token at each evaluation makes forward progress towards completing the sentence.
Suppose the prompt contained the constraint “the third word of the response must begin with the letter ‘c’”.
And the model has already generated “Alice likes apples”.
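To make the "forward progress" point concrete, here's a minimal sketch of greedy decoding. next_token_logprobs is a hypothetical stand-in for whatever the model actually exposes, not a real API:

    def greedy_decode(prompt, next_token_logprobs, max_tokens=20, eos="<eos>"):
        """Pick the single most probable token at each step; never backtrack."""
        out = []
        for _ in range(max_tokens):
            logprobs = next_token_logprobs(prompt, out)  # dict: token -> log prob
            token = max(logprobs, key=logprobs.get)      # most probable token wins
            if token == eos:
                break
            out.append(token)  # committed: once "apples" is emitted, it stays
        return " ".join(out)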
Current models can be prompted to check all the constraints, and they will often notice an error, but they have no private buffer in which to try various generations until one satisfies the prompt. Humans do have a private buffer, and can also write things down that they don't share. (Imagine solving this as a human: you would stop at word 3, start brainstorming ‘c’ words, and wouldn't continue until you had a completion that worked.)
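By contrast, the human strategy looks roughly like the rejection-sampling loop below. sample_word is again a hypothetical candidate generator; this is only an illustration of the "private buffer" idea, not how any model actually works:

    def third_word_with_constraint(sample_word, letter="c", max_tries=50):
        """Privately retry candidate third words until one starts with `letter`."""
        scratch = []  # the private buffer: candidates nobody else sees
        for _ in range(max_tries):
            candidate = sample_word()
            scratch.append(candidate)  # humans can also write these down
            if candidate.lower().startswith(letter):
                return candidate  # only this one gets "spoken"
        return None  # gave up after max_tries

    if __name__ == "__main__":
        import random
        pick = third_word_with_constraint(
            lambda: random.choice(["apples", "bananas", "cherries", "carrots"]))
        print(pick)  # keeps drawing privately until it lands on a 'c' word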
There are a bunch of errors like this that I've hit with GPT-4.
Similarly, if the probability of a correct generation is very low (“apples” may be far more probable than any ‘c’ word, even with the constraint in the prompt), current models are unable to learn online from their mistakes on common questions they get wrong. This makes them not very useful yet as “employees” in a specific role, because they endlessly make the same errors.