Do you think there’s some initial evidence for that? E.g. Voyager, or work from DeepMind. Self-play gets thrown around a lot, but I’m not sure we’ve concretely seen much of it yet for LLMs.
But yes, agreed, good point about strategy games being a domain that could be verifiable.