People may be blind to the fact that the improvements from GPT-2 to GPT-3 to GPT-4 were driven both by scaling training compute (~2 OOM between each generation) and (the hidden part) by scaling test-time compute through long context and CoT (roughly 1.5-2 OOM between generations too).
If GPT-5 uses just 2 OOM more training compute than GPT-4 but the same test-time compute, then we should not expect “similar” gains; we should expect roughly “half”, since only one of the two scaling levers moved.
o1 may use ~2 OOM more test-time compute than GPT-4. So GPT-4 => GPT-5 + o1-style reasoning could be expected to be similar to GPT-3 => GPT-4.
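A back-of-envelope sketch of the arithmetic, assuming (as the post implies) that capability gains scale roughly with the total OOMs of compute added, counting training-compute and test-time-compute OOMs as additive in log space; the exact values are the post's estimates, not measurements:

```python
# Back-of-envelope: treat the size of a generation jump as proportional to
# the total orders of magnitude (OOM) of compute scaled, summing the
# training-compute and test-time-compute contributions (a log-additive
# assumption for illustration, not an established law).

def total_ooms(train_ooms: float, test_ooms: float) -> float:
    """Total effective OOMs of scaling for one generation jump."""
    return train_ooms + test_ooms

# GPT-3 => GPT-4: ~2 OOM training + ~2 OOM test-time (long context, CoT)
gpt3_to_gpt4 = total_ooms(2.0, 2.0)        # ~4 OOM

# GPT-4 => GPT-5 (hypothetical): ~2 OOM training, no extra test-time compute
gpt4_to_gpt5 = total_ooms(2.0, 0.0)        # ~2 OOM

# GPT-4 => GPT-5 + o1-style reasoning: ~2 OOM training + ~2 OOM test-time
gpt4_to_gpt5_o1 = total_ooms(2.0, 2.0)     # ~4 OOM

print(gpt4_to_gpt5 / gpt3_to_gpt4)     # 0.5 -> the "half" expectation
print(gpt4_to_gpt5_o1 / gpt3_to_gpt4)  # 1.0 -> a jump "similar" to GPT-3 => GPT-4
```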