o1 was the first large reasoning model — as we outlined in the original “Learning to Reason” blog, it’s “just” an LLM trained with RL. o3 is powered by further scaling up RL beyond o1
@ryan_greenblatt Shouldn’t this be interpreted as a very big update vs. the neuralese-in-o3 hypothesis?
An LLM trained with a sufficient amount of RL maybe could learn to compress its thoughts into more efficient representations than english text, which seems consistent with the statement. I’m not sure if this is possible in practice; I’ve asked here if anyone knows of public examples.
From https://x.com/__nmca__/status/1870170101091008860:
@ryan_greenblatt Shouldn’t this be interpreted as a very big update vs. the neuralese-in-o3 hypothesis?
No
An LLM trained with a sufficient amount of RL maybe could learn to compress its thoughts into more efficient representations than english text, which seems consistent with the statement. I’m not sure if this is possible in practice; I’ve asked here if anyone knows of public examples.
Yes, I would count it if the CoT is total gibberish which is (steganographically) encoding reasoning.