i predict this kind of view of the non-magicalness of (2023-era) LMs will become more and more accepted, and this has implications for what kinds of alignment experiments are actually valuable (see my comment on the reversal curse paper). this is not an argument for long (50+ year) timelines, but it is an argument for medium (~10 year) timelines rather than 5 year timelines
also this quote from the abstract is great:

“Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities.”

i used to call this something like “tackling the OOD generalization problem by simply making the distribution so wide that it encompasses anything you might want to use it on”