To me, this comparison to humans doesn’t seem to answer why the o1 training ended up producing this result.
Convergence. Humans and LLMs with deliberation are doing the same kind of thing, so they end up making the same class of errors.