The agent’s context includes the reward-to-go, state (i.e., an observation of the agent’s view of the world), and action taken for nine timesteps: R1, S1, A1, ..., R9, S9, A9. (Figure 2 explains this a bit more.) If the agent hasn’t made nine steps yet, some of the S’s are blank. So S5 is the state at the fifth of those nine timesteps. Why is this important?
If the agent has made four steps so far, S5 is the initial state, which lets it see the instruction. Four is the number of steps it takes to reach the corridor where the agent has to decide whether to go left or right. This is the key decision, and at that point the agent only sees the instruction at S5, which is why S5 matters.
Figure 1 shows this process visually: the static images show possible S5’s, whereas S9 is animation_frame=4 in the GIF. It’s fast, so it’s hard to see, but it’s the step before the agent turns.
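To make the slot arithmetic concrete, here is a minimal sketch of how the nine state slots fill up. It assumes the blanks go at the front of the context (which is what "S5 is the initial state after four steps" implies); the names `pad_states` and `slot`, and the use of `None` for a blank, are made up for illustration and are not taken from the actual code.

```python
CONTEXT_LEN = 9  # nine timesteps in the context, as described above


def pad_states(observed, context_len=CONTEXT_LEN, blank=None):
    """Left-pad observed states so the newest one sits in the last slot.

    Assumption: blank slots go at the front of the context.
    """
    recent = list(observed[-context_len:])  # keep at most the last nine
    return [blank] * (context_len - len(recent)) + recent


def slot(padded, k):
    """Return S_k using the 1-based numbering from the text (S1..S9)."""
    return padded[k - 1]


# Initial state plus four steps taken = five observed states.
observed = [
    "initial state (instruction visible)",
    "after step 1",
    "after step 2",
    "after step 3",
    "after step 4 (at the corridor)",
]
padded = pad_states(observed)

print(slot(padded, 1))  # blank slot, nothing observed yet
print(slot(padded, 5))  # the initial state
print(slot(padded, 9))  # the current state, right before the turn
```

Under that assumption, S1 to S4 are blank, S5 holds the initial state with the instruction, and S9 holds the current observation at the corridor.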
Thanks Jay! (much better answer!)