Maybe. This is potentially part of the explanation for “data double descent”, although I haven’t thought about it beyond the 5 minutes I spent writing that page and the 30 minutes I spent talking about it with you at the June conference. I’d be very interested to see someone explore this more systematically, e.g. in the setting of Anthropic’s “other” TMS paper (https://www.anthropic.com/index/superposition-memorization-and-double-descent), which exhibits data double descent in a setting where the theory of our recent TMS paper might allow you to do something.