Glancing back at this, I notice I missed the most obvious form of self-play: putting an agent in an interaction with another copy of itself. You could get any sort of "scoring" by having an automated judge compare the outcome against the current goal.
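For concreteness, here's a minimal sketch of that loop in Python. The `Agent` and `score_outcome` interfaces are hypothetical stand-ins I made up, not any particular library's API; in practice the agent would be a language model and the judge would be another model or a programmatic success check.

```python
import random
from typing import List, Tuple

class Agent:
    """Stand-in for a policy (e.g., a language model). Hypothetical interface."""
    def respond(self, transcript: List[str]) -> str:
        # A real agent would condition on the transcript; this toy one
        # emits a random move so the loop runs end-to-end.
        return random.choice(["cooperate", "defect", "ask", "answer"])

def self_play_episode(agent: Agent, goal: str, turns: int = 6) -> List[str]:
    """Run two copies of the same agent against each other and return
    the full transcript. Both 'copies' share the same weights; they just
    occupy different conversational roles."""
    transcript: List[str] = [f"GOAL: {goal}"]
    for t in range(turns):
        transcript.append(agent.respond(transcript))  # copies alternate turns
    return transcript

def score_outcome(transcript: List[str], goal: str) -> float:
    """Automated judge: compare the episode's outcome to the current goal.
    Here a trivial keyword check; in practice this could be another model
    grading the transcript or a hard-coded success test."""
    return float(goal in " ".join(transcript[1:]))

def collect_training_signal(agent: Agent, goals: List[str]) -> List[Tuple[List[str], float]]:
    """Generate (transcript, reward) pairs for whatever RL or filtering
    step you'd bolt on afterward."""
    pairs = []
    for goal in goals:
        transcript = self_play_episode(agent, goal)
        pairs.append((transcript, score_outcome(transcript, goal)))
    return pairs

if __name__ == "__main__":
    for transcript, reward in collect_training_signal(Agent(), ["cooperate", "negotiate"]):
        print(f"reward={reward:.1f}  turns={len(transcript) - 1}")
```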
This has an obvious downside: the agent's partner is a copy of itself rather than a person, so the interactions won't match real human behavior. But it might get you a good bit of extra training signal that predicting static datasets doesn't give. A little interaction with real humans might be the cherry on top of the self-play whipped cream on the predictive learning sundae.