We haven’t yet seen what happens when they turn the verifiable-reward approach behind o3 toward self-play on a variety of strategy games. I suspect that it will unlock a lot of general reasoning and strategy.
Do you think there’s some initial evidence for that? E.g. Voyager, or work from DeepMind? Self-play gets thrown around a lot, but I’m not sure we’ve concretely seen much of it yet with LLMs.
But yes, agreed: good point about strategy games being a domain that could be verifiable.