Shouldn’t this be 2 FLOP per parameter per token, since our evolutionary search is not doing backward passes?
On the other hand, the calculation in the footnote seems to assume that 1 function call = 1 token, which is clearly an unrealistic lower bound.
A “lowest-level” function (one that only uses a single context window) will use somewhere between 1 and n_ctx = O(10^3) tokens. Functions defined by composition over “lowest-level” functions, as described two paragraphs above, will of course require more tokens per call than their constituents.
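To make the arithmetic concrete, here is a minimal sketch of the forward-pass-only cost estimate, i.e. ~2 FLOP per parameter per token with no backward pass. The parameter count and per-call token counts below are illustrative placeholders, not values from the original discussion:

```python
def forward_flop_per_call(n_params: float, tokens_per_call: float) -> float:
    """Forward-pass-only cost: ~2 FLOP per parameter per token
    (no backward pass, since evolutionary search does no gradient updates)."""
    return 2 * n_params * tokens_per_call

# Hypothetical model size and context window (assumptions, not from the source):
N = 1e11       # e.g. ~100B parameters
n_ctx = 1e3    # context window of O(10^3) tokens, as in the comment

low = forward_flop_per_call(N, 1)        # 1 token per call (unrealistic lower bound)
high = forward_flop_per_call(N, n_ctx)   # a full context window per call

print(f"{low:.1e} to {high:.1e} FLOP per lowest-level call")
```

Under these placeholder numbers a single lowest-level call costs between ~2e11 and ~2e14 FLOP, and composed functions multiply this further.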
Thanks for checking my math & catching this error!