Although of course in practice, my understanding is that it’s quite rare for environments to meet all the criteria, and (sometimes!) the methods work anyhow.
I was sleep-deprived when I wrote that (and am while writing this), so I may be making some technical errors.
The list was supposed to give conditions under which an optimal policy that assigns a pure strategy to every state is guaranteed to exist.
This doesn’t rule out the existence of environments that don’t meet all these criteria and nonetheless have optimal policies that assign pure strategies to some or all states. Such an optimal policy just isn’t guaranteed to exist.
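To make the guarantee concrete, here’s a minimal sketch (my own toy MDP, not one from the original list) of the finite, known-transition case: value iteration converges, and acting greedily on the resulting values yields a policy that picks exactly one pure action per state.

```python
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]
# Deterministic known model: (state, action) -> (next_state, reward)
MODEL = {
    ("s0", "left"): ("s0", 0.0),
    ("s0", "right"): ("s1", 1.0),
    ("s1", "left"): ("s0", 0.0),
    ("s1", "right"): ("s1", 2.0),
}

# Value iteration over the finite state space.
V = {s: 0.0 for s in STATES}
for _ in range(200):
    V = {s: max(MODEL[(s, a)][1] + GAMMA * V[MODEL[(s, a)][0]] for a in ACTIONS)
         for s in STATES}

# Greedy policy: one pure action per state, no randomisation needed.
policy = {s: max(ACTIONS, key=lambda a: MODEL[(s, a)][1] + GAMMA * V[MODEL[(s, a)][0]])
          for s in STATES}
print(policy)  # → {'s0': 'right', 's1': 'right'}
```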
(Compare: some games have pure Nash equilibria, but pure Nash equilibria are not guaranteed to exist in general.)
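The standard counterexample is matching pennies; a quick brute-force check (my own sketch, not from the original comment) confirms that none of its four pure-strategy profiles is a Nash equilibrium:

```python
from itertools import product

# Matching pennies: row player wins (+1) on a match, column player wins otherwise.
# Payoffs are (row, column) for each pure-strategy profile.
payoffs = {
    ("H", "H"): (1, -1),
    ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1),
    ("T", "T"): (1, -1),
}

def is_pure_nash(row, col):
    """A profile is a pure Nash equilibrium iff neither player can gain
    by unilaterally switching to their other pure strategy."""
    r_pay, c_pay = payoffs[(row, col)]
    row_can_improve = any(payoffs[(r, col)][0] > r_pay for r in "HT")
    col_can_improve = any(payoffs[(row, c)][1] > c_pay for c in "HT")
    return not (row_can_improve or col_can_improve)

pure_equilibria = [p for p in product("HT", repeat=2) if is_pure_nash(*p)]
print(pure_equilibria)  # → [] : no pure Nash equilibrium exists
```

(The unique equilibrium mixes both actions 50/50, which is exactly why a pure strategy can’t be optimal here.)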
That said, “knowing the laws of physics/transition rules” was meant to cover the class of non-stochastic environments with multiple possible state transitions from a given state and action.
(Maybe one could say that such environments are non-deterministic, but the state transitions could probably be modelled as fully deterministic if one added appropriate hidden state variables and/or allowed a state’s transitions to be path-dependent.)
It’s in this sense that the agent needs to know the transition rules of the environment for pure strategies to be optimal in general.
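Here’s a toy sketch (my own illustration, under the assumption above) of that determinization move: the transition looks non-deterministic over observed states alone, but becomes a pure function once a hidden counter is included in the state.

```python
def transition(state, action, hidden):
    """Transition over the *full* state (observed, hidden) is a pure function.
    Projected onto the observed state alone, it looks non-deterministic:
    the same (state, action) pair can lead to different observed states."""
    if state == "s" and action == "a":
        next_state = "s1" if hidden % 2 == 0 else "s2"
    else:
        next_state = state
    return next_state, hidden + 1  # the counter makes outcomes path-dependent

# Same observed (state, action), different outcomes —
# fully determined once the hidden variable is visible.
print(transition("s", "a", 0))  # → ('s1', 1)
print(transition("s", "a", 1))  # → ('s2', 2)
```

An agent that knows this underlying rule (rather than just the observed statistics) can in principle play a pure strategy optimally; one that doesn’t may need to hedge.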