This more or less regresses to offline supervised learning: a batch of samples is collected upfront, a model is trained on it, and that model is then used to predict all future actions. You might be framing your problem as an MDP, but you're not really doing reinforcement learning at that point. As TurnTrout mentioned in a sibling to this comment, it only works in stationary, deterministic environments, which are the toy problems of the space; the ultimate goal of RL is to function in non-stationary, non-deterministic environments, so it makes little sense to focus on this path.
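To make the distinction concrete, here's a minimal sketch (the environment, names, and numbers are all made up for illustration): a toy two-armed bandit whose best arm flips halfway through. A "fit once on an upfront batch, then predict forever" policy keeps pulling the arm that looked best in the initial data, while an agent that keeps updating online can track the change.

```python
import random

# Hypothetical non-stationary two-armed bandit: the better arm flips halfway through.
def reward(arm, t, horizon):
    best = 0 if t < horizon // 2 else 1
    return 1.0 if arm == best else 0.0

horizon = 1000

# "Collect upfront, fit once" approach: estimate arm values from an initial batch,
# then act greedily on those frozen estimates forever.
batch = [(arm, reward(arm, t, horizon)) for t in range(100) for arm in (0, 1)]
frozen_value = {
    a: sum(r for arm, r in batch if arm == a) / sum(1 for arm, _ in batch if arm == a)
    for a in (0, 1)
}
frozen_arm = max(frozen_value, key=frozen_value.get)

# Online approach: keep an exponentially weighted value estimate and stay epsilon-greedy,
# so the agent can notice and adapt when the environment changes.
online_value = {0: 0.0, 1: 0.0}
alpha, epsilon = 0.1, 0.1

frozen_return = online_return = 0.0
for t in range(horizon):
    frozen_return += reward(frozen_arm, t, horizon)

    a = random.choice((0, 1)) if random.random() < epsilon else max(online_value, key=online_value.get)
    r = reward(a, t, horizon)
    online_value[a] += alpha * (r - online_value[a])
    online_return += r

print(f"fit-once policy: {frozen_return:.0f} / {horizon}")
print(f"online policy:   {online_return:.0f} / {horizon}")
```

The fit-once policy collects roughly half the available reward because its frozen estimates never register the switch; the online agent loses a little to exploration but recovers after the change. That adaptivity is the part you give up when you treat the problem as a one-shot supervised fit.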