Donald Hobson comments on What are some non-purely-sampling ways to do deep RL?

Donald Hobson 5 Dec 2019 11:40 UTC
LW: 3 AF: 2
The r vs r’ problem can be reduced if you can find a way to sample points of high uncertainty.
- evhub 5 Dec 2019 19:28 UTC
  LW: 2 AF: 1
  Yep, that’s the adversarial training approach to this problem. The difficulty is that you might not be able to sample all the relevant high-uncertainty points (e.g. because you don’t know exactly what the deployment distribution will be). In that case you have to do some sort of relaxed adversarial training instead, which introduces its own issues.
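
To make "sample points of high uncertainty" concrete, here is a minimal sketch. It is not taken from the thread. It assumes that r′ is a learned reward model and that disagreement within a bootstrapped ensemble is an acceptable stand-in for its uncertainty. The function names (`fit_ensemble`, `ensemble_uncertainty`, `most_uncertain`), the linear model, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_ensemble(X, y, n_models=8, seed=0):
    """Fit bootstrapped linear reward models (hypothetical stand-ins for r').

    Returns an array of weights with shape (n_models, n_features).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of the labelled data
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        weights.append(w)
    return np.stack(weights)


def ensemble_uncertainty(weights, candidates):
    """Uncertainty proxy: std of the ensemble's predicted reward at each candidate."""
    preds = candidates @ weights.T  # shape (n_candidates, n_models)
    return preds.std(axis=1)


def most_uncertain(weights, candidates, k):
    """Indices of the k candidates the ensemble disagrees on most."""
    scores = ensemble_uncertainty(weights, candidates)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy training data: the second feature barely varies, so its effect on
    # the reward is poorly determined by the labelled examples.
    X = np.column_stack([rng.normal(size=50), 0.01 * rng.normal(size=50)])
    y = X[:, 0] + 0.1 * rng.normal(size=50)

    weights = fit_ensemble(X, y)
    # Candidate points at which we could ask for the true reward r.
    candidates = np.array([[1.0, 0.0], [1.0, 1.0]])
    print(ensemble_uncertainty(weights, candidates))
    print(most_uncertain(weights, candidates, k=1))
```

The sketch also shows the limitation raised in the reply: `most_uncertain` can only rank points that are already in `candidates`. If the deployment distribution contains states that never appear in that pool, their uncertainty is never measured, however good the uncertainty estimate is.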