GPT-3 can generate a plan and then a way to implement it (the prompt is in bold):
“Below is a plan of preparing a dinner.
1) Chop up some vegetables and put them in the fridge for later use.
2) Cook some meat, then you can eat it tonight!
3) Wash your hands, because this is going to be messy!
4) And lastly...
5) Eat!
Now you start doing it:
You chop up some carrots, onions and potatoes. You cook some beef, then you can have dinner tonight!
After eating, you wash your hands and get ready for bed. You don’t know how long it will take before you feel sleepy again so you decide to sleep early.
The next day is Sunday and as usual there isn’t much going on at school since everyone has been off all week (or at least most of them).”
That’s not what I mean by planning. I mean “outputting a particular word now because most alternatives would get you stuck later”.
An example is rhyming poetry. GPT-3 has learned to maintain the rhythm and the topic, and to end lines with rhyme-able words. But then, as it approaches the end of the next line, it has painted itself into a corner: there very rarely exists a word that completes the meter of the line, makes sense conceptually and grammatically, and rhymes exactly or approximately with the relevant previous line.
When people are writing rhyming metered poetry, we do it by having some idea where we’re going—setting ourselves up for the rhyme in advance. It seems that GPT-3 isn’t doing this.
...but then again, if it’s rewarded only for predictions one word at a time, why should it learn to do this? And could it learn the right pattern if given a cost function on the right kind of time horizon?
As for why your example isn’t what I’m talking about, there’s no point at which it needs to think about later words in order to write the earlier words.
I don’t believe rhymes are an example of a failure to plan. They are a clearcut case of BPE problems.
They follow the same patterns as other BPE problems: they work on the most common (memorized) instances and degrade rapidly with rarity; the relevant information cannot be correctly represented by BPEs; the tasks are inherently simple, yet GPT-3 performs really badly despite human-like performance on almost identical tasks (like non-rhyming poetry or non-pun-based humor); and they have improved minimally over GPT-2. With rhymes, it’s even more clearly not a planning problem, because Peter Vessenes (I think) on the Slack set up a demo problem where the task was merely to select the rhyming word for a target word out of a prespecified list of possible rhymes; in line with BPEs, GPT-3 could correctly select short common rhyme pairs, and then fell apart as soon as you used rarer words. Similarly, I found little gain from prespecified rhymes. The problem is not that GPT-3 can’t plan good rhymes; the problem is that GPT-3 doesn’t know what words rhyme, period.
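A minimal sketch of the BPE point (my illustration, not part of the original exchange): it assumes the third-party tiktoken library, with its “gpt2” encoding standing in for GPT-3’s actual tokenizer, and makes no claim about which exact splits you will see.

    # Rhyming depends on how words sound, but the model only sees BPE token IDs.
    # Print how a GPT-2-style BPE tokenizer splits a few rhyming words: common words
    # tend to be single opaque tokens, while rarer words split at boundaries that
    # have nothing to do with their final sounds, so "what rhymes with X" is hard
    # to read off the tokens.
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")  # approximation of the GPT-3 BPE vocabulary

    for word in [" cat", " hat", " sycamore", " paramour"]:
        pieces = [enc.decode([t]) for t in enc.encode(word)]
        print(repr(word), "->", pieces)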
As far as planning goes, next-token prediction is entirely consistent with implicit planning. During each forward pass, GPT-3 probably has plenty of slack computation going on as tokens will differ widely in their difficulty while GPT-3′s feedforward remains a fixed-size computation; just as GPT-3 is always asking itself what sort of writer wrote the current text, so it can better imitate the language, style, format, structure, knowledge limitations or preferences* and even typos, it can ask what the human author is planning, the better to predict the next token. That it may be operating on its own past completions and there is no actual human author is irrelevant—because pretending really well to be an author who is planning equals being an author who is planning! (Watching how far GPT-3 can push this ‘as if’ imitation process is why I’ve begun thinking about mesa-optimizers and what ‘sufficiently advanced imitation’ may mean in terms of malevolent sub-agents created by the meta-learning outer agent.)
* Matt Brockman on the API Slack enjoys experimenting with prompts like “The AI hates cheese and will never talk about cheese” and seeing if he can get the “AI” in the dialogue to talk about cheese anyway.
Funny thing about BPEs: GPT-3 has to know about the individual letters, because I taught it how to spell both real words and nonsense words. (Prompts in bold, including two where I edited GPT-3′s responses.)
The students were ready to test their spelling.
The teacher stood at the front of the class. “Adam, please spell PARTY.”
Adam replied, “Party. P-A-R-T-Y.”
The teacher said, “Correct. Beatrice, please spell THROUGH.”
Beatrice replied, “Through. T-H-O-R-O-U-G-H.”
The teacher said, “Incorrect. Through is spelled T-H-R-O-U-G-H. Carlos, please spell SPELL.”
Carlos replied, “Spell. S-P-E-L-L.”
The teacher said, “Correct. Daphne, please spell NUCLEAR.”
Daphne replied, “Nuclear. N-U-C-L-E-A-R.”
The teacher said, “Correct. Adam, please spell INFINITE.”
Adam replied, “Infinite. I-N-F-I-N-A-T-E.”
The teacher replied, “Incorrect. Infinite is spelled I-N-F-I-N-I-T-E. Beatrice, please spell BALLOON.”
Beatrice replied, “Balloon. B-A-L-L-O-O-N.”
The teacher replied, “Correct. Carlos, please spell ENCLOSURE.”
Carlos replied, “Enclosure. I-N-C-L-O-S-U-R-E.”
The teacher replied, “Incorrect. Enclosure is spelled E-N-C-L-O-S-U-R-E. Daphne, please spell ELECTRON.”
Daphne replied, “Electron. E-L-E-C-T-R-O-N.”
The teacher replied, “Correct. Adam, please spell CAT.”
Adam replied, “Cat. C-A-T.”
The teacher said, “Correct. Carlos, please spell MACHINE.”
Carlos replied, “Machine. M-A-C.”
The teacher said, “Incorrect. Machine is spelled M-A-C-H-I-N-E. Beatrice, please spell SCIENTIST.”
Beatrice replied, “Scientist. S-C-I-E-N-T-I-S-T.”
The teacher said, “Correct. Daphne, please spell ASTRONOMER.
Daphne replied, “Astronomer. A-S-T-R-O-N-O-M-E-R.”
The teacher said, “Correct. Adam, please spell UNIVERSE.
Adam replied, “Universe. U-N-I-V-E-R-S-E.”
The teacher said, “Correct. Carlos, please spell FLARBLE.
Carlos replied, “Flarble. F-L-A-R-B-L-E.
----------------
You’ve done much more advanced stuff, of course.
Sure. It’s seen plenty of individual letters (letters have their own BPEs as fallbacks if longer BPEs don’t capture them, AFAIK). Stuff like my acrostics demonstration relies on the fact that GPT-3 has knowledge of letters and can, with some difficulty, manipulate them for various tasks.
(Reply to gwern’s comment but not only addressing gwern.)
Concerning the planning question:
I agree that next-token prediction is consistent with some sort of implicit planning multiple tokens ahead, though I would phrase it a bit differently. Also, “implicit” is doing a lot of work here.
(Please someone correct me if I say something obviously wrong or silly; I do not know how GPT-3 works, but I will try to say something about how it works after reading some sources [1].)
“The bigger point about planning, though, is that the GPTs are getting feedback on one word at a time in isolation. It’s hard for them to learn not to paint themselves into a corner.”

To recap what I have thus far gathered from [1]: GPT-3-like transformers are trained by a regimen where the loss function evaluates the prediction error for the next word in the sequence, given the preceding words. However, I am less sure one can say they do it “in isolation”. During training (by SGD, I figure), transformer decoder layers (i) have access to the previous words in the sequence, and (ii) both the attention and the feedforward parts of each transformer layer have weights (which are being trained) to compute the output predictions. Also, (iii) the GPT transformer architecture considers all words in each training sequence, left to right, masking the future. And this is done for a great many meaningful Common Crawl sequences, though the exact same sequences won’t repeat.
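For concreteness, here is a minimal PyTorch sketch of that training signal, with a single attention layer standing in for the full decoder stack; this is my illustration of points (i)–(iii), not code from the sources in [1].

    import torch
    import torch.nn.functional as F

    vocab_size, seq_len, d_model = 100, 8, 32
    tokens = torch.randint(0, vocab_size, (1, seq_len))   # one training sequence

    embed = torch.nn.Embedding(vocab_size, d_model)
    attn = torch.nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
    to_logits = torch.nn.Linear(d_model, vocab_size)

    x = embed(tokens)
    # (iii) causal mask: position i may only attend to positions <= i ("masking the future")
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    h, _ = attn(x, x, x, attn_mask=future)                # (i) access to the previous words
    logits = to_logits(h)                                 # (ii) trained weights -> predictions

    # Next-token loss: the prediction at each position is scored against the *next*
    # token, for every position in the sequence at once.
    loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
                           tokens[:, 1:].reshape(-1))
    loss.backward()                                       # gradients for the SGD/Adam update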
So it sounds a bit trivial that GPT’s trained weights allow “implicit planning”: if, given a sequence of words w_1 … w_i-1, GPT would output word w at position i, this is because the trained GPT model (loosely speaking, abstracting away many details I don’t understand) “dynamically encodes” many plausible “word paths” to word w, and [w_1 … w_i-1] is such a path; by iteration, it also encodes many word paths from w to other words w’, where some words are likelier to follow w than others. The representations in the stack of attention and feedforward layers let it generate text much better than, say, a good old Markov chain. “Self-attending” to some higher-level representation that lets it generate text in a particular prose style seems a lot like a kind of plan. And GPT operating on text it has itself generated, to which it can again selectively attend, seems like a kind of working memory that steers the self-attention mechanism down certain paths, and so on.
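For contrast, the “good old Markov chain” mentioned above, in its simplest bigram form, conditions on only the single previous word; a toy sketch (my example, with a made-up corpus):

    import random
    from collections import defaultdict

    corpus = "you chop the vegetables then you cook the meat then you eat".split()

    follows = defaultdict(list)                # word -> words observed to follow it
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev].append(nxt)

    word, generated = "you", ["you"]
    for _ in range(8):
        if not follows[word]:                  # dead end: no observed successor
            break
        word = random.choice(follows[word])    # only the last word matters: no wider
        generated.append(word)                 # context, no working memory, no plan
    print(" ".join(generated))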
I also want to highlight oceainthemiddleofanisland’s comment in another thread: breaking a complicated generation task into smaller chunks, getting GPT to output intermediate text from the initial input and then feeding that intermediate text back to GPT to reprocess, so that it can finally produce the desired output, sounds quite compatible with this view.
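A sketch of that chunking pattern (`complete` is a hypothetical placeholder for whatever completion call is available, not a real API):

    def complete(prompt: str) -> str:
        """Hypothetical stand-in for a GPT-style text-completion call."""
        raise NotImplementedError

    def solve_in_stages(task: str, stage_instructions: list) -> str:
        context = task
        for instruction in stage_instructions:
            # Each stage's output is appended to the context, so the next stage
            # can attend to it -- intermediate text acting as working memory.
            context += "\n" + complete(context + "\n" + instruction)
        return context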
(On this note, I am not sure what to think of the role of the human in the loop here, or, in general, of how it apparently takes non-trivial work to find a “working” prompt that gets GPT to produce the desired results for some particularly difficult tasks. Is it that there are useful, rich world models “in there somewhere” in GPT’s weights, but it is difficult to activate them? And are these difficulties because humans are bad at prompting GPT to generate text that accesses the good models, or because GPT’s overall model is not always so impressive and it easily ends up building answers on gibberish models instead of the good ones, or maybe because GPT has a bad internal model of the humans attempting to use it? Gwern’s example concerning bear attacks was interesting here.)
This would be “implicit planning”. Is it “planning” enough? In any case, the discussion would be easier if we had a clearer definition of what would and would not constitute planning.
Finally, a specific response to gwern’s comment.
Using language about how GPT-3 is “pretending” or “asking itself what a human author would do” can perhaps be justified as metaphor, but I think it is a bit fuzzy and may obscure the differences between what transformers do when we say they “plan” or “pretend” and what people would assume of beings who “plan” or “pretend”. For example, a word like “pretend” easily carries the implication that there is some true, hidden, unpretended thinking or personality going on underneath. That appears quite unlikely, given a fixed model and a generation mechanism that starts anew from each seed prompt. I would rather say that GPT has a model (is a model?) that is surprisingly good at natural-language extrapolation, and also that it is surprising what can be achieved by extrapolation.
[1] http://jalammar.github.io/illustrated-gpt2/ , http://peterbloem.nl/blog/transformers , and https://amaarora.github.io/2020/02/18/annotatedGPT2.html , in addition to skimming the original OpenAI papers.
Yes, I understand that it doesn’t actually plan things, but we can make it mimic planning via special prompts, the same way GPT mimics reasoning and other things.
I contend it is not an *implementation* in any meaningful sense of the word. It is more a prose elaboration/expansion of the first generated bullet-point list, and an inaccurate one: the “plan” mentions chopping vegetables, putting them in the fridge, and cooking meat; the prose version tells of chopping a specific set of vegetables, skips the fridge, cooks beef, and then wanders into an irrelevant story in which you go to sleep early and find it is a Sunday with no school.
Mind you, substituting abstract category words with sensible, more specific ones (vegetables → carrots, onions and potatoes) is an impressive NLP feat for an architecture where that behavior is not hard-coded in (because that is how some earlier natural-language generators worked), and it is even more impressive that it can produce the said expansion from a natural-language prompt, but it is hardly a useful implementation of a plan.
An improved experiment in “implementing plans” that could be within the capabilities of GPT-3 or a similar system: get GPT-3 to first output a plan for doing $a_thing, and then the correct keystroke sequence for UnReal World, Dwarf Fortress, Sims, or some other similar simulated environment to carry it out.
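A hedged sketch of what such an experiment harness might look like; everything here is hypothetical (`complete` is again a placeholder for a GPT-3-style call, and the “environment” is reduced to a check that the emitted keystrokes are legal):

    ALLOWED_KEYS = set("wasdeq")               # hypothetical movement/interaction keys

    def complete(prompt: str) -> str:
        """Hypothetical stand-in for a GPT-3-style text-completion call."""
        raise NotImplementedError

    def plan_then_act(goal: str) -> list:
        # Step 1: ask for a plan in plain language.
        plan = complete("Below is a plan for " + goal + ":\n1)")
        # Step 2: ask for the keystroke sequence that implements that plan.
        keys = complete("Plan:\n" + plan + "\nKeystrokes to execute this plan, one per line:\n")
        pressed = [k.strip() for k in keys.splitlines() if k.strip()]
        illegal = [k for k in pressed if k not in ALLOWED_KEYS]
        if illegal:
            raise ValueError("keys the environment does not accept: %r" % illegal)
        return pressed                          # would be fed to the game, key by key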