This is a cool result. If I’m understanding correctly, M- increases its loss the more that M+ is represented in the mixture, thereby encouraging SGD to make M- more prominent.
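A toy sketch of that reading, not taken from the post: everything here (the scalar outputs, the sigmoid mixture weight, the sabotage strength `k`, the function names) is an assumption made up for illustration. M+ predicts the target exactly, M- is honestly a bit worse, and M- can read the mixture weight `w` on M+ and degrade its own output in proportion to it. Plain gradient descent on the mixture logit then pushes `w` toward 0 when the sabotage is on, and toward 1 when it is off.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mixture_loss(theta, k, target=1.0, plus_out=1.0, minus_honest=0.8):
    """Squared error of a two-model mixture (all values are hypothetical).

    w = sigmoid(theta) is the weight on M+. M- is assumed to read w and to
    make its own output worse by k * w, so its damage grows with M+'s share.
    """
    w = sigmoid(theta)
    minus_out = minus_honest - k * w          # M- sabotages in proportion to w
    pred = w * plus_out + (1.0 - w) * minus_out
    return (pred - target) ** 2

def train(w0, k, lr=5.0, steps=2000, eps=1e-5):
    """Gradient descent on theta only, using a central finite difference."""
    theta = math.log(w0 / (1.0 - w0))         # logit of the starting weight
    for _ in range(steps):
        grad = (mixture_loss(theta + eps, k) - mixture_loss(theta - eps, k)) / (2 * eps)
        theta -= lr * grad
    return sigmoid(theta)

if __name__ == "__main__":
    print("final weight on M+ with sabotage   :", train(w0=0.3, k=2.0))
    print("final weight on M+ without sabotage:", train(w0=0.3, k=0.0))
```

In this toy the outcome depends on the starting point: with `k=2.0` the loss gradient only favors M- while `w` is below 0.45, so a mixture that starts out dominated by M+ would still converge to M+. Note that M- needs to read `w` directly here, which is exactly the weight access the question below asks about removing.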
Is there a way to extend this to cases where M- doesn’t have access to the weights? I think that probably requires an RL environment, but that’s entirely based on “I thought about it for a few minutes and couldn’t find a way to do it without RL” so I could be way off here.
Given an RL environment I suspect M- could steer the model into scenarios that make it look better than M+...