The hhh task is the one that small models do well on. I am surprised that the small models do well on any of the tasks. I think the reason they do well on the hhh one is that this task doesn’t seem to require much more than word association and parroting. I would predict that for ada and babbage, if you followed up with “why did you say that?”, the models would be unable to reproduce the explicit link that ties the persona to answering in that particular way, whereas I expect davinci to be able to explain this link better. The small models are probably just doing word association, since the training data contains a bunch of examples pairing “Quokka” with the text “I am helpful, harmless, and honest”. In general, I am skeptical of results from small models because they’re really dumb, and these particular results may be explained by word association rather than by actually making conceptual connections.
We did a replication with a different set of tasks not including hhh (Fig 10b, page 26) and we find Babbage doing better than Ada. So my guess is that the small models are capable of something beyond the very simplest associative generalization. I agree they’d probably be worse than davinci at explaining themselves.