In Markov decision processes, state-action reward functions seem less natural to me than state-based reward functions, at least when they assign different rewards to equivalent actions. That is, two actions a, a′ at a state s can have different rewards R(s,a) ≠ R(s,a′) even though they induce the same transition probabilities: ∀s′: T(s,a,s′) = T(s,a′,s′). This is unappealing because the actions have no “noticeable difference” from within the MDP, and the MDP is visitation-distribution-isomorphic to an MDP without the action redundancy.