ryan_greenblatt comments on TurnTrout’s shortform feed

ryan_greenblatt 22 May 2024 19:29 UTC
3 points
0
Yeah, if you use constant compute to explain each “feature” and features are proportional to model scale, this is only O(n^2) which is the same as training compute.

However, it seems plausible to me that you actually need to look at interactions between features and so you end up with O(log(n) n^2) or even O(n^3).

Also constant factors can easily destroy you here.