But haven’t people been having AIs do that—self-play-training—for a long time? I think the most remarkable idea is to use the massive breed of policy nets only to create a judgement net, and use that in the end instead. That’s wild.
But haven’t people been having AIs do that—self-play-training—for a long time? I think the most remarkable idea is to use the massive breed of policy nets only to create a judgement net, and use that in the end instead. That’s wild.