William_S comments on Reinforcement Learning in the Iterated Amplification Framework

William_S 18 Feb 2019 21:09 UTC
3 points
RL is typically about sequential decision-making, and I wasn’t sure where the “sequential” part came in.
I guess I’ve used the term “reinforcement learning” to refer to a broader class of problems, including both one-shot bandit problems and sequential decision-making problems. In this view, the feature that makes RL different from supervised learning is not that we’re trying to figure out how to act in an MDP/POMDP, but that we’re trying to optimize a function that we can’t take the derivative of (in the MDP case, because the environment is non-differentiable; in the approval learning case, because the overseer is non-differentiable).
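To make the "can't take the derivative" point concrete, here is a minimal sketch of a one-shot bandit trained with a score-function (REINFORCE-style) update. Everything in it is an illustrative assumption rather than anything from the post: `black_box_reward` is a hypothetical stand-in for a non-differentiable overseer or environment, and the three-action setup, learning rate, and running baseline are arbitrary choices. The only thing the learner can do with the reward function is query it, so the gradient flows through the log-probability of the sampled action instead of through the reward.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def black_box_reward(action):
    # Hypothetical stand-in for a non-differentiable overseer or environment:
    # we can ask it for a score, but we cannot differentiate through it.
    return [0.1, 0.9, 0.3][action]


def train_bandit(steps=2000, lr=0.1, seed=0):
    """One-shot bandit: no states, no sequence, just sample, score, update."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(3), weights=probs)[0]
        reward = black_box_reward(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Score-function estimator: d/d(logit_i) of log pi(action)
        # equals onehot(action)_i - probs_i for a softmax policy.
        for i in range(3):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
    return softmax(logits)
```

Calling `train_bandit()` returns the learned action probabilities. Nothing here is sequential: each step is an independent one-shot decision, which is the sense in which a bandit problem still counts as RL under the broader definition above.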
- Rohin Shah 19 Feb 2019 1:19 UTC
  2 points
  Parent
  Got it, thanks for clarifying.