One thing that’s really critical to making the model work at all is test-time fine-tuning. By the way, that’s something that’s really missing from LLM approaches right now.
Since AFAIK in-context learning functions pretty similarly to fine-tuning (though I haven’t looked into this much), it’s not clear to me why Chollet sees online fine-tuning as deeply different from few-shot prompting. Certainly few-shot prompting works extremely well for many tasks; maybe it just empirically doesn’t help much on this one?
As per “Transformers learn in-context by gradient descent”, which Gwern also mentions in the comment that @quetzal_rainbow links here.
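The correspondence that paper argues for can be sketched in its simplest setting: for linear regression, a single linear self-attention layer reading the in-context examples computes the same prediction as one gradient-descent step on those examples. Here’s a minimal numpy sketch of that equivalence; all of the data, dimensions, and the step size are made up for the demo, and this is the toy linear case, not a claim about full transformers.

```python
import numpy as np

# Toy version of the in-context-learning-as-gradient-descent
# correspondence: for linear regression with weights initialized
# at zero, one GD step on the in-context examples predicts the
# same value as a single linear attention readout.

rng = np.random.default_rng(0)
d = 4      # input dimension (arbitrary)
n = 8      # number of in-context examples (arbitrary)
eta = 0.1  # GD step size, doubling as the attention scale

X = rng.normal(size=(n, d))     # in-context inputs
y = X @ rng.normal(size=d)      # in-context targets from a linear task
x_q = rng.normal(size=d)        # query input to predict on

# (a) "Fine-tuning" view: one gradient step on the squared loss
#     L(W) = 0.5 * sum_i (y_i - W @ x_i)^2, starting from W = 0.
#     The gradient at W = 0 is -sum_i y_i * x_i, so the step is:
W = eta * (y[:, None] * X).sum(axis=0)
pred_gd = W @ x_q

# (b) "In-context" view: linear attention over the examples,
#     out = eta * sum_i y_i * <x_i, x_q>  (unnormalized dot-product
#     attention with the examples as keys/values).
pred_attn = eta * (y * (X @ x_q)).sum()

print(np.isclose(pred_gd, pred_attn))  # the two views coincide
```

Both expressions reduce to `eta * sum_i y_i * <x_i, x_q>`, which is why, in this linear setting, conditioning on examples in context and taking a gradient step on them are literally the same computation.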