In Markov decision processes, state-action reward functions seem less natural to me than state-based reward functions, at least when they assign different rewards to equivalent actions. That is, two actions a, a′ at a state s can have different rewards R(s,a) ≠ R(s,a′) even though they induce the same transition probabilities: ∀s′: T(s,a,s′) = T(s,a′,s′). This is unappealing because the actions have no “noticeable difference” from within the MDP, and the MDP is visitation-distribution-isomorphic to an MDP without the action redundancy.