But (as you’re presumably thinking) training compute can be amortized across all occasions when the model is used, while inference compute cannot, which means it won’t be worthwhile to go very far down the road of scaling inference compute.
Inference compute is amortized across future inference too, once you train on its outputs, and the three-way scaling-law exchange rates between training compute vs runtime compute vs model size are critical. See AlphaZero for a good example.
As always, if you can read only 1 thing about inference scaling, make it “Scaling Scaling Laws with Board Games”, Jones 2021.
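To make the exchange-rate point concrete, here is a toy calculation in the spirit of Jones 2021's iso-Elo frontiers. The numbers are invented for illustration (the exchange rate and the FLOP budgets are my assumptions, not figures from the paper); the point is only that once you know the rate at which extra test-time search substitutes for extra training, you can price one in units of the other:

```python
# Toy train-vs-test compute trade-off, in the spirit of Jones 2021
# ("Scaling Scaling Laws with Board Games"). The exchange rate is a
# made-up illustrative number, NOT a result from the paper: it encodes
# "each 10x of test-time search buys about as much as 10x of training"
# along an iso-performance contour.
import math

LOG10_EXCHANGE_RATE = 1.0  # hypothetical: 1 decade of test compute ~ 1 decade of train compute

def equivalent_train_compute(train_flops: float, test_flops: float,
                             baseline_test_flops: float) -> float:
    """Training compute that would match the same performance while using
    only the baseline amount of test-time compute, under the assumed rate."""
    extra_search_decades = math.log10(test_flops / baseline_test_flops)
    return train_flops * 10 ** (LOG10_EXCHANGE_RATE * extra_search_decades)

# A model trained with 1e21 FLOPs that searches 100x harder at test time
# behaves, under this toy rate, like a 1e23-FLOP model searching normally,
# except that the 100x is paid again on every single query.
print(f"{equivalent_train_compute(1e21, 1e17, 1e15):.2e}")  # 1.00e+23
```

That recurring per-query cost is exactly the part that training on the searched outputs avoids, which is the amortization claim above.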
And it’s not just a sensible theory. This has already happened, in Hugging Face’s attempted replication of o1, where the reward model was larger and used test-time compute (TTC) with process supervision, while the smaller main model had none of those expensive properties.
And also in DeepSeek v3, where an expensive TTC model (R1) was used to train the cheaper conventional LLM (v3 itself).
One way to frame it is that test-time compute is actually label-search compute: you are searching for better labels/rewards, and then training on them. Repeat as needed. This is obviously easier if you know what “better” means.
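To sketch what label-search compute looks like as a loop, here is a minimal expert-iteration-style example. All of the pieces are hypothetical stand-ins I am assuming for illustration (the `generate`/`score` callables and the commented-out `finetune` step are not any particular library's API): spend extra inference sampling many candidates, let a verifier or reward model pick the best, and train on the winners so that the next model reaches the searched-for quality without paying for the search.

```python
# Minimal sketch of "test-time compute as label-search compute":
# search for better outputs at inference time, then train on them.
# Every callable here is a toy stand-in, not a real model or API.
import random
from typing import Callable, List, Tuple

def label_search_round(
    prompts: List[str],
    generate: Callable[[str], str],      # cheap policy model: prompt -> candidate answer
    score: Callable[[str, str], float],  # expensive verifier/reward model: (prompt, answer) -> score
    n_samples: int = 16,                 # the extra test-time compute, spent on search
) -> List[Tuple[str, str]]:
    """Spend n_samples generations per prompt and keep the best-scoring one.

    The returned (prompt, answer) pairs are the "labels" the search bought;
    fine-tuning on them amortizes the search cost over all future queries.
    """
    new_labels = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_samples)]
        best = max(candidates, key=lambda ans: score(prompt, ans))
        new_labels.append((prompt, best))
    return new_labels

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end: the "policy" guesses numbers,
    # and the "verifier" knows that bigger is better (i.e. we know what "better" means).
    policy = lambda prompt: str(random.randint(0, 100))
    verifier = lambda prompt, answer: float(answer)
    for round_idx in range(3):  # "repeat as needed"
        labels = label_search_round(["pick a big number"] * 4, policy, verifier)
        # finetune(policy, labels)  # hypothetical: train the cheap model on the searched labels
        print(round_idx, labels)
```

This is the same shape as the AlphaZero loop, with MCTS as the search step and the policy/value network trained to imitate the searched-up play.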