This more or less regresses to offline supervised learning: a batch of samples is collected upfront, a model is trained on it, and that model is then used to predict all future actions. You might be framing your problem as an MDP, but you're not really doing reinforcement learning at that point. As TurnTrout mentioned in a sibling to this comment, it only works in stationary, deterministic environments, which are the toy problems of the space; the ultimate goal of RL is to function in non-stationary, non-deterministic environments, so it makes little sense to focus on this path.
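To make the distinction concrete, here's a minimal sketch (the environment, names, and numbers are all made up for illustration): a toy two-armed bandit whose best arm flips halfway through. A "fit once on an upfront batch, then predict forever" policy keeps pulling the arm that looked best in the initial data, while an agent that keeps updating online can track the change.

```python
import random

# Hypothetical non-stationary two-armed bandit: the better arm flips halfway through.
def reward(arm, t, horizon):
    best = 0 if t < horizon // 2 else 1
    return 1.0 if arm == best else 0.0

horizon = 1000

# "Collect upfront, fit once" approach: estimate arm values from an initial batch,
# then act greedily on those frozen estimates forever.
batch = [(arm, reward(arm, t, horizon)) for t in range(100) for arm in (0, 1)]
frozen_value = {
    a: sum(r for arm, r in batch if arm == a) / sum(1 for arm, _ in batch if arm == a)
    for a in (0, 1)
}
frozen_arm = max(frozen_value, key=frozen_value.get)

# Online approach: keep an exponentially weighted value estimate and stay epsilon-greedy,
# so the agent can notice and adapt when the environment changes.
online_value = {0: 0.0, 1: 0.0}
alpha, epsilon = 0.1, 0.1

frozen_return = online_return = 0.0
for t in range(horizon):
    frozen_return += reward(frozen_arm, t, horizon)

    a = random.choice((0, 1)) if random.random() < epsilon else max(online_value, key=online_value.get)
    r = reward(a, t, horizon)
    online_value[a] += alpha * (r - online_value[a])
    online_return += r

print(f"fit-once policy: {frozen_return:.0f} / {horizon}")
print(f"online policy:   {online_return:.0f} / {horizon}")
```

The fit-once policy collects roughly half the available reward because its frozen estimates never register the switch; the online agent loses a little to exploration but recovers after the change. That adaptivity is the part you give up when you treat the problem as a one-shot supervised fit.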