For those of you not already familiar with it, the paper TinyStories: How Small Can Language Models Be and Still Speak Coherent English? makes fascinating reading. (Sadly, as far as I know, no-one has reproduced this research for Chinese.) They experiment with training ~~stacks of parrots~~ extremely small transformer models only 1 to 8 ~~parrots~~ transformer blocks deep, with total parameter counts in the tens of millions (with an ‘m’), and show that even a single (21M-parameter) stochastic parrot can speak fairly grammatical English (for a child-sized vocabulary) but keeps losing the thread of the story, while at just 2 or 4 parrots deep the models can tell only slightly incoherent stories (with a level of non-sequiturs and plot holes roughly comparable to an actual two-or-three-year-old making up a story). So on a suitably restricted vocabulary and format, and with a synthetic training set, you really don’t need anything like a stack of 30–100 parrots to do as well as a rather small human child: a handful will do.
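To give a rough feel for the scale involved, here is a back-of-the-envelope parameter count for a GPT-style decoder stack. The specific hyperparameters below (vocabulary size, model width, FFN expansion factor) are illustrative assumptions of mine, not the actual TinyStories configurations; only the overall scale, tens of millions of parameters at 1 to 8 blocks, comes from the paper.

```python
# Rough, back-of-the-envelope parameter count for a GPT-style decoder.
# NOTE: vocab_size, d_model, and ffn_mult below are illustrative guesses,
# not the TinyStories paper's actual settings.

def approx_params(vocab_size: int, d_model: int, n_blocks: int, ffn_mult: int = 4) -> int:
    embeddings = vocab_size * d_model               # token embeddings (often tied with the output head)
    attention = 4 * d_model * d_model               # Wq, Wk, Wv, Wo projections
    mlp = 2 * ffn_mult * d_model * d_model          # up- and down-projection of the feed-forward layer
    per_block = attention + mlp
    return embeddings + n_blocks * per_block

# One wide block with a smallish vocabulary already lands around 20M parameters:
print(approx_params(vocab_size=10_000, d_model=1024, n_blocks=1))  # ~22.8M
# A deeper but narrower stack of 8 blocks stays in the same ballpark:
print(approx_params(vocab_size=10_000, d_model=512, n_blocks=8))   # ~30.3M
```

The point of the sketch is just that, once the vocabulary is child-sized, a single-digit number of transformer blocks is enough to put you in the tens-of-millions regime the paper describes.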