I agree that o1 doesn’t have a test-time scaling law, at least not in a strong sense, while generatively pretrained transformers seem to have a scaling law in an extremely strong sense.
I’d put my position like this: if you trained a GPT on a human-generated internet a million times larger than the internet of our world, with a million times more parameters, for a million times more iterations, then I am confident that GPT could beat the Minecraft Ender Dragon zero-shot.
If you gave o1 a quadrillion times more thinking time, there is no way in hell it would beat the Ender Dragon.