It seems like the obvious thing to do with a model like o1, trained to reason through problems, would be to train it to write code that helps it solve reasoning problems.
Perhaps the idea was to withhold this crutch so it could learn those reasoning skills without the help of code.
But the examples suggest that while it's great at high-level reasoning and figuring out where it went wrong, it still struggles with basic things like counting, which would be easily solved if it had the instinct to write code in the areas where it's likely to get tripped up.
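To illustrate the point: a counting task that trips up language models is trivial when offloaded to code. This is a hypothetical sketch of the kind of one-liner a model could emit instead of counting token by token (the word chosen here is just an example):

```python
# Counting letters is unreliable for a language model reasoning over
# tokens, but trivial for a program.
word = "strawberry"
count = word.count("r")
print(count)  # 3
```

Running the code and reading back its output sidesteps the failure mode entirely, which is the crutch the comment is describing.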