SoerenMind comments on [AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations

SoerenMind 7 Dec 2019 21:45 UTC
Potential paper from DM/Stanford for a future newsletter: https://arxiv.org/pdf/1911.00459.pdf
It addresses the problem that an RL agent can delude itself by exploiting loopholes in a learned reward function.
- Rohin Shah 8 Dec 2019 5:55 UTC
  Thanks!