Yes, I don’t think they’ll “wake up on their own”; we would need to do something for that.
But they are not active on their own anyway: they only start computing when one prompts them, so whatever abilities they do exhibit become apparent via our various interactions with them.
In some sense, the ability to do rapid in-context learning is their main emergent ability (although we have recently seen some indications that scale might not be required for that at all, and that with a cleverer architecture and training one can obtain these capabilities in radically smaller models; see e.g. “Uncovering mesa-optimization algorithms in Transformers”, which seems to point in that direction rather strongly).
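(If it helps to make that concrete: here is a tiny numpy sketch, my own illustration rather than code from that paper, of the well-known observation this line of work builds on, as I understand it: a single linear self-attention operation can reproduce one step of gradient descent on the in-context examples. All names and numbers below are placeholders for the demo.)

```python
# Minimal sketch (my own, not from the linked paper): one step of gradient
# descent on in-context (x_i, y_i) examples equals a linear-attention readout.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 32, 0.1             # input dim, number of in-context examples, GD step size

w_true = rng.normal(size=d)        # hidden linear rule the context examples follow
X = rng.normal(size=(n, d))        # in-context inputs x_1..x_n
y = X @ w_true                     # in-context targets y_1..y_n
x_q = rng.normal(size=d)           # query input we want a prediction for

# (a) one explicit gradient-descent step on the in-context least-squares loss
#     L(w) = 0.5 * sum_i (w.x_i - y_i)^2, starting from w = 0, then predict
grad = -(y[:, None] * X).sum(axis=0)     # gradient of L at w = 0
w_one_step = -eta * grad
pred_gd = w_one_step @ x_q

# (b) the same prediction written as linear (softmax-free) attention:
#     keys = x_i, values = eta * y_i, query = x_q
pred_attn = (eta * y * (X @ x_q)).sum()

print(pred_gd, pred_attn)                # identical up to floating-point error
assert np.isclose(pred_gd, pred_attn)
```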
Thanks!
Yes, I completely agree with you that in-context learning (ICL) is the only new “ability” LLMs seem to be displaying. I also agree with you that they start computing only when we prompt them.
There seems to be the impression that, when prompted, LLMs might do something different from (or even orthogonal to) what the user requests (see, for example, “Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure”, reported here by the BBC). We’d probably agree that this was careful prompt engineering (made possible by ICL) and not an active attempt by GPT to “deceive”.
Just so we can explicitly say that this isn’t possible, I’d not call ICL an “emergent ability” in the Wei et al. sense. ICL “expressiveness” seems to increase with scale, so it’s predictable (and so does not imply other “unknowable” capabilities, such as deception, planning, …, emerging with scale)!
It’s going to be really exciting if we are able to obtain ICL at smaller scale! Thank you very much for that link. That’s a very interesting paper!
Yes, I agree with that.
It’s very interesting that even quite recently ICL was considered to be “an impossible Holy Grail goal”, with the assumption that models always need to see many examples to learn anything new (and so they are inherently inferior to biological learners in this sense).
And now we have this strong ICL ability and this gap with biological learners is gone...
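(To make concrete what “learning something new without many examples” means in practice, here is a small sketch of few-shot prompting. The task, the examples, and the model name are arbitrary placeholders, and whether a particular model actually picks up the rule is exactly the scale/architecture question above; a small model like gpt2 may well get it wrong.)

```python
# Hedged sketch of in-context / few-shot learning: no weights are updated,
# the "training set" is three input->output pairs written into the prompt,
# and the model must infer the rule (reverse the word) in its forward pass.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # placeholder model

prompt = (
    "Reverse each word.\n"
    "cat -> tac\n"
    "bird -> drib\n"
    "stone -> enots\n"
    "water ->"
)

out = generator(prompt, max_new_tokens=4, do_sample=False)
print(out[0]["generated_text"])
```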