Now that o1 explicitly does RL on CoT, next-token prediction for o1 is definitely not consequence-blind. Each token it predicts enters its own input and can be used for future computation. This kind of outcome-based training makes the model more consequentialist. It also makes treating a single next-token prediction as the natural "task" to do interpretability on even less defensible.
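To make the feedback point concrete, here is a minimal sketch of the autoregressive loop: each predicted token is appended to the context, so it influences every later prediction. The `toy_model` function is a hypothetical stand-in for a real LM forward pass, not any actual API.

```python
def toy_model(context):
    # Hypothetical stand-in for a real LM forward pass:
    # just predicts the previous token plus one.
    return context[-1] + 1

def generate(prompt, n_steps):
    context = list(prompt)
    for _ in range(n_steps):
        next_token = toy_model(context)
        context.append(next_token)  # the prediction enters the input
    return context

print(generate([0], 4))  # [0, 1, 2, 3, 4]
```

The point is the loop structure: nothing about a single `toy_model` call reveals the role a token plays downstream, which is why analyzing one next-token prediction in isolation misses the consequences of emitting it.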
Anyway, I thought I should revisit this post after o1 came out. I couldn't help noticing that it's stylistically very different from all of the janus writing I've encountered in the past; then I got to the end:
The ideas in the post are from a human, but most of the text was written by Chat GPT-4 with prompts and human curation using Loom.
Ha, I did notice that I was confused (but didn't bother thinking about it further).