The main result is that up to 4 repetitions of the data are about as good as the same amount of unique data,
and for up to about 16 repetitions there is still meaningful improvement.
Let’s take 50T tokens as an estimate for the available text data
(as an anchor, RedPajama-Data-v2, a filtered and deduplicated
CommonCrawl dataset, has 30T tokens).
Repeated 4 times, that can make good use of about 1e28 FLOPs (with a dense transformer),
and repeated 16 times, suboptimal but still meaningful use of about 2e29 FLOPs (rough arithmetic sketched below).
So this is close to, but not below, what can be put to use within a few years.
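
For concreteness, here is a minimal sketch of the arithmetic behind those FLOP figures, assuming the usual Chinchilla-style approximations for a dense transformer (compute ≈ 6·N·D, and a compute-optimal budget of roughly 20 tokens per parameter); the constants and names here are illustrative assumptions, not values from the paper:

```python
# Minimal sketch, assuming Chinchilla-style scaling for a dense transformer:
# compute C ~= 6 * N * D (N = parameters, D = training tokens) and a
# compute-optimal token budget of D ~= 20 * N. Constants are approximations.

def flops_for_token_budget(tokens: float, tokens_per_param: float = 20.0) -> float:
    """FLOPs a compute-optimal dense transformer can usefully spend on `tokens` tokens."""
    params = tokens / tokens_per_param  # compute-optimal model size for this budget
    return 6.0 * params * tokens        # C ~= 6 * N * D

unique_tokens = 50e12  # 50T tokens of available text data

for repetitions in (4, 16):
    effective_tokens = unique_tokens * repetitions
    flops = flops_for_token_budget(effective_tokens)
    print(f"{repetitions:>2}x: {effective_tokens:.0e} tokens -> ~{flops:.1e} FLOPs")
# ~1.2e28 FLOPs at 4x repetition, ~1.9e29 FLOPs at 16x
```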
Thanks for pushing back on the original claim.
I’ve now changed my mind based on
N. Muennighoff et al. (2023), Scaling Data-Constrained Language Models.