A lot of people are looking at the implications of o1's training process as a future scaling paradigm, but this implementation, using inference-time compute to fine-tune the model just in time for hard questions, seems equally promising to me. If it scales with compute it may deliver equally impressive results, and it has just as much low-hanging fruit left to pick.
Don’t sleep on test time training as a potential future scaling paradigm.
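For concreteness, here is a minimal sketch of the basic idea, not o1's or any lab's actual method: before answering a hard query, clone the model, take a few gradient steps on self-supervised variants of that query, answer with the tuned copy, then discard it. The model, augmentation, and hyperparameters below are toy stand-ins.

```python
import copy
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in causal LM: embedding -> GRU -> vocab logits."""
    def __init__(self, vocab_size=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):               # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                  # logits: (batch, seq, vocab)

def augment(tokens, n_views=4):
    """Toy self-supervision: cropped copies of the test prompt."""
    return [tokens[:, min(i, tokens.size(1) - 2):] for i in range(n_views)]

def test_time_train(model, prompt_tokens, steps=8, lr=1e-3):
    """Fine-tune a throwaway copy of the model on the test prompt itself."""
    tuned = copy.deepcopy(model)             # never touch the base weights
    opt = torch.optim.SGD(tuned.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for view in augment(prompt_tokens):
            inputs, targets = view[:, :-1], view[:, 1:]   # next-token objective
            logits = tuned(inputs)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return tuned                              # answer the hard query with this copy

base = TinyLM()
hard_prompt = torch.randint(0, 256, (1, 32))  # placeholder "hard question"
specialized = test_time_train(base, hard_prompt)
with torch.no_grad():
    next_token = specialized(hard_prompt)[:, -1].argmax(-1)
```

The point of the sketch is just that the compute knob here is the number of test-time gradient steps, which is what would have to scale for this to be a paradigm in its own right.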