My guess is that further scaffolding work (e.g. improved enforcement of thinking out a step by step plan, and then executing the steps of the plan in order) would result in unlocking a lot more capabilities from current open-weight models.
I also expect that this is but a hint of things to come, and that by this time next year we’ll see evidence that then-current models can accomplish a wide range of complex multi-step tasks.
My guess is that further scaffolding work (e.g. improved enforcement of thinking out a step by step plan, and then executing the steps of the plan in order) would result in unlocking a lot more capabilities from current open-weight models. I also expect that this is but a hint of things to come, and that by this time next year we’ll see evidence that then-current models can accomplish a wide range of complex multi-step tasks.