although typically rewards in RL depend only on states,
Presumably this should be a period? (Or perhaps there's a clause missing pointing out the distinction between caring about history and caring about states, though you could transform one into the other?)
Supposed to be a period, fixed now. While you can transform one into the other, I find it fairly unnatural, and I would guess other ML researchers do too. Typically, if we want to do things that depend on history, we just drop the Markov assumption rather than defining the state to be the entire history.
Also, if you define the state to be the entire history, you lose ergodicity assumptions that are needed to prove that algorithms can learn well.
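For concreteness, a minimal sketch of that transformation, assuming a toy gym-style reset()/step() interface (ToyAlternationEnv and HistoryWrapper are hypothetical names): folding the entire history into the state makes a history-dependent reward Markovian, and it also shows why ergodicity breaks, since the augmented state grows every step and is never revisited.

```python
from typing import Any, List, Tuple


class ToyAlternationEnv:
    """Toy environment whose reward depends on history, not just the current
    observation: the agent gets +1 only while it keeps alternating actions."""

    def reset(self) -> int:
        self.actions: List[int] = []
        self.t = 0
        return 0  # trivial observation: just the timestep

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.actions.append(action)
        self.t += 1
        alternating = all(a != b for a, b in zip(self.actions, self.actions[1:]))
        reward = 1.0 if alternating else 0.0  # depends on the whole action history
        return self.t, reward, self.t >= 5


class HistoryWrapper:
    """Make the reward Markovian by defining the state as the full history.
    The wrapped state grows every step and is never revisited, which is why
    ergodicity-style assumptions fail for this construction."""

    def __init__(self, env: Any):
        self.env = env

    def reset(self) -> Tuple[Any, ...]:
        obs = self.env.reset()
        self.history: List[Any] = [obs]
        return tuple(self.history)

    def step(self, action: Any) -> Tuple[Tuple[Any, ...], float, bool]:
        obs, reward, done = self.env.step(action)
        self.history += [action, obs]
        # The reward is now a function of the returned (history) state alone.
        return tuple(self.history), reward, done


if __name__ == "__main__":
    env = HistoryWrapper(ToyAlternationEnv())
    state = env.reset()
    done, a = False, 0
    while not done:
        state, r, done = env.step(a)
        print(len(state), r)  # the state keeps growing; reward is Markov in it
        a = 1 - a
```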