This post seems interesting and promising, thanks for writing it!
The most predictable way zero-sum competition can fail is if one of the models is consistently better than the other at predicting.
I think this could be straightforwardly solved by not training two different models at all, but by giving two instances of the same model inputs that are both slightly perturbed in the same random way. Then, neither instance of the model would ever have a predictable advantage over the other.
For instance, in your movie recommendation example, let’s say the model takes a list of 1000 user movie ratings as input. We can generate a perturbed input by selecting 10 of those ratings at random and modifying them, say by changing a 4-star rating to a 5-star rating. We do this twice to get two different inputs, feed them into the model, and train based on the outputs as you described.
Another very similar solution would be to randomly perturb the internal activations of each neural network during training.
Thanks for the comment. I agree that, ideally, we would find a way not to have two wholly separate models and instead somehow train a model against itself. I think a potential issue with your proposal is that small perturbations could have discontinuous effects, the anticipation of which distorts predictions. However, it would be interesting to think about further to see if there’s some way to avoid that issue.
This post seems interesting and promising, thanks for writing it!
I think this could be straightforwardly solved by not training two different models at all, but by giving two instances of the same model inputs that are both slightly perturbed in the same random way. Then, neither instance of the model would ever have a predictable advantage over the other.
For instance, in your movie recommendation example, let’s say the model takes a list of 1000 user movie ratings as input. We can generate a perturbed input by selecting 10 of those ratings at random and modifying them, say by changing a 4-star rating to a 5-star rating. We do this twice to get two different inputs, feed them into the model, and train based on the outputs as you described.
Another very similar solution would be to randomly perturb the internal activations of each neural network during training.
Does this seem right?
Thanks for the comment. I agree that, ideally, we would find a way not to have two wholly separate models and instead somehow train a model against itself. I think a potential issue with your proposal is that small perturbations could have discontinuous effects, the anticipation of which distorts predictions. However, it would be interesting to think about further to see if there’s some way to avoid that issue.