Yeah. I think I did notice it talking about a stochastic policy at one point, and on reflection I don’t see any other reasonable way to do that. This interpretation also accords with making the agent’s actions part of the observation history. If they were a pure function of the observations, we wouldn’t need them to be there.
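To make the last point concrete, here's a toy sketch. The policies, names, and coin-flip encoding are all made up for illustration, so treat it as one way to picture the argument rather than the actual formalism. With a deterministic policy the actions can be recomputed from the observations, so storing them adds nothing; with a stochastic one, the same observations are compatible with different action sequences, so the history has to record what was actually done.

```python
def deterministic_policy(observations):
    # Hypothetical policy: the action is a pure function of the observations so far.
    return sum(observations) % 2


def stochastic_policy(observations, coin_flip):
    # Hypothetical policy: the action also depends on a random bit
    # that is not part of the observations.
    return deterministic_policy(observations) ^ coin_flip


def prefixes(observations):
    # Observation history available at each time step.
    return [observations[: t + 1] for t in range(len(observations))]


observations = [3, 1, 4, 1, 5]

# Deterministic case: replaying the policy recovers the actions exactly,
# so including them in the history is redundant.
det_actions = [deterministic_policy(p) for p in prefixes(observations)]
replayed = [deterministic_policy(p) for p in prefixes(observations)]

# Stochastic case: same observations, different random bits, different actions.
coins_a = [0, 1, 0, 1, 1]
coins_b = [1, 1, 0, 0, 1]
run_a = [stochastic_policy(p, c) for p, c in zip(prefixes(observations), coins_a)]
run_b = [stochastic_policy(p, c) for p, c in zip(prefixes(observations), coins_b)]

print(det_actions == replayed)  # actions recoverable from observations alone
print(run_a != run_b)           # actions not determined by observations
```

The random bits are passed in explicitly here only so the example is reproducible; in a real stochastic policy they would be sampled internally, which is exactly why the actions can't be reconstructed afterwards.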