This view would imply that experiments at a substantially smaller (but still absolutely large) scale don't generalize up to a higher scale, or at least very quickly hit diminishing returns in generalizing up to a higher scale, which seems a bit implausible to me.
Agree this is an implication. (It’s an implication of any view where compute can be a hard bottleneck—past a certain point you learn 10X less info by running an experiment at a 10X smaller scale.)
But why implausible? Could we have developed RLHF, prompting, tool-use, and reasoning models via loads of experiments on GPT-2-scale models? It does make sense to me that those models just aren't smart enough to learn any of this, so your experiments would have zero signal.
An alternative option is to just reduce the frontier scale with AIs.
Yeah, I think this is a plausible strategy. If you can make 100X faster progress at the 10^26 scale than at the 10^27 scale, why not do it?
Also, I don't think I've seen anyone articulate this view other than you, in an earlier comment responding to me, so I didn't think this exact perspective was that important to address.
Well, unfortunately, the people actively defending the view that compute will be a bottleneck haven't been specific about what they think the functional form is. They've just said vague things like "compute for experiments is a bottleneck". In that post I initially gave the simplest model for concretising that claim, and you followed suit in this post when talking about "7 OOMs", but I don't think anyone's said that model represents their view better than the 'near-frontier experiments' model does.
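To make the difference between those two models concrete, here's a minimal sketch. The 10^27-FLOP frontier and the 2-OOM cutoff are my own illustrative assumptions (not anyone's stated position); the point is just to contrast "information scales smoothly with experiment compute" against a 'near-frontier experiments' model where runs far below the frontier give essentially no signal:

```python
import math

FRONTIER_FLOP = 1e27  # assumed frontier training-run scale (illustrative)

def info_simple(experiment_flop):
    """Simple model: information from an experiment scales with the compute
    spent, no matter how far below the frontier the experiment is."""
    return experiment_flop / FRONTIER_FLOP

def info_near_frontier(experiment_flop, cutoff_ooms=2.0):
    """'Near-frontier experiments' model: experiments more than ~cutoff_ooms
    below the frontier give essentially zero signal (e.g. GPT-2-scale runs
    telling you nothing about reasoning models)."""
    ooms_below = math.log10(FRONTIER_FLOP / experiment_flop)
    if ooms_below > cutoff_ooms:
        return 0.0
    return experiment_flop / FRONTIER_FLOP

# Compare a 10^26-FLOP experiment (1 OOM below frontier) with a 10^22-FLOP one.
for flop in (1e26, 1e22):
    print(f"{flop:.0e} FLOP: simple={info_simple(flop):.0e}, "
          f"near-frontier={info_near_frontier(flop):.0e}")
```

Under the simple model, going 7 OOMs smaller just costs you 7 OOMs of information; under the near-frontier model, anything past the cutoff is worth roughly nothing, which is a very different claim about how hard compute bottlenecks bite.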
Yeah, agree the parametric/evolution stuff changes things.
But if you couldn’t do that stuff, do you agree cognitive labour would plausibly have been a hard bottleneck?
If so, that does seem analogous to the case where we scale up cognitive labour by 3 OOMs. After all, I'm not sure what the analogue of "parametric experiments" is when you have abundant cognitive labour and limited compute.