The implication of ICL being implicit BI is that the model is locating concepts it already learned from its training data, so ICL would not be a genuinely new form of learning.
I’m not sure I follow this. Are you saying that, if ICL is BI, then a model could not learn a fundamentally new concept in context? Couldn’t some of the hypotheses be unknown — e.g., the model’s no-context priors are that it’s doing Wikipedia prediction (50%), chatbot roleplay (40%), or some unknown role (10%), and ICL could then increase the weight on the unknown role. Meanwhile, actually figuring out how to do a good job in that previously-unknown role would require piecing together other knowledge the model has — and sufficiently strong building blocks would allow a lot of learning of new concepts.
I think what I mean is that this would imply ICL could not be a new form of learning. And yes, it does seem more likely that at least some new knowledge is getting generated, one way or another. BI implying that every task has been previously seen feels extreme, and less likely. I’ve adjusted my wording a bit now.
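To make that weight-shifting intuition concrete, here is a minimal sketch of the Bayes update being described, using the made-up priors from the comment above and equally made-up likelihoods for how well each hypothesized role explains the in-context examples:

```python
import numpy as np

# Toy Bayes update over "which role is the model playing?", using the
# no-context priors from the comment above. The likelihoods below are
# invented purely for illustration: the in-context examples are assumed
# to look unlike Wikipedia text or chat, so the unknown role explains
# them best.
hypotheses = ["wikipedia prediction", "chatbot roleplay", "unknown role"]
prior = np.array([0.50, 0.40, 0.10])
likelihood_per_example = np.array([0.02, 0.05, 0.60])

def posterior(prior, likelihood, n_examples):
    """Posterior after seeing n_examples in-context examples, treated as i.i.d."""
    unnormalized = prior * likelihood ** n_examples
    return unnormalized / unnormalized.sum()

for n in [0, 1, 2, 4]:
    post = posterior(prior, likelihood_per_example, n)
    print(f"after {n} examples:",
          {h: round(float(p), 3) for h, p in zip(hypotheses, post)})
```

Even starting from a 10% prior, a few examples that only the unknown role explains push most of the posterior mass onto it; that is the sense in which BI could still accommodate behaviour the model was never explicitly trained on, so long as the building blocks to perform the role are there.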