Aprillion comments on Features and Adversaries in MemoryDT

Aprillion 22 Oct 2023 11:21 UTC
1 point
0
S5

What is S5, please?
- Jay Bailey 22 Oct 2023 20:16 UTC
  2 points
  0
  Parent
  The agent’s context includes the reward-to-go, the state (i.e., an observation of the agent’s view of the world), and the action taken for each of nine timesteps: R1, S1, A1, …, R9, S9, A9 (Figure 2 explains this a bit more). If the agent hasn’t made nine steps yet, some of the S’s are blank. So S5 is the state in the fifth slot. Why is this important?
  
  If the agent has made four steps so far, it has seen five states, so S1 to S4 are blank and S5 is the initial state, which lets it see the instruction. Four is the number of steps it takes to reach the corridor where the agent has to decide whether to go left or right. This is the key decision for the agent, and it only sees the instruction at S5, which is why S5 matters.
  
  Figure 1 shows this process visually. The static images in the figure show possible S5s, whereas S9 is animation_frame=4 in the GIF. It’s fast, so it’s hard to see, but it’s the step before the agent turns.
  - Joseph Bloom 22 Oct 2023 20:18 UTC
    1 point
    0
    Parent
    Thanks Jay! (much better answer!)
- Joseph Bloom 22 Oct 2023 20:17 UTC
  1 point
  0
  Parent
  The first frame, apologies. This is a detail of how we number trajectories that I’ve tried to avoid dealing with in this post. We left-pad in a context window of 10 timesteps, so the first observation frame is S5. I’ve updated the text not to refer to S5.
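
The left-padding described in this thread can be illustrated with a small sketch. It is not the MemoryDT code: the function names `left_pad_states` and `initial_state_slot` are hypothetical, and the context length of nine follows Jay's description (the thread also mentions ten, so treat the constant as an assumption). It only shows why, after four steps, the initial state lands in slot S5.

```python
from typing import List, Optional

CONTEXT_LEN = 9  # assumption: nine timesteps, following Jay's description


def left_pad_states(states: List[str], context_len: int = CONTEXT_LEN) -> List[Optional[str]]:
    """Left-pad observed states with None so the newest state sits in the last slot."""
    recent = states[-context_len:]  # keep at most context_len most recent states
    return [None] * (context_len - len(recent)) + recent


def initial_state_slot(steps_taken: int, context_len: int = CONTEXT_LEN) -> Optional[int]:
    """Return k such that S_k holds the initial state, or None if it has scrolled out."""
    observed = steps_taken + 1  # the initial state plus one state per step taken
    if observed > context_len:
        return None
    return context_len - observed + 1


# Hypothetical labels for the five states seen after four steps.
states = ["initial (instruction visible)", "step 1", "step 2", "step 3", "step 4"]
context = left_pad_states(states)
for k, s in enumerate(context, start=1):
    print(f"S{k}: {s if s is not None else '<blank>'}")
```

Under these assumptions, `initial_state_slot(4)` gives 5: four blank slots are followed by the initial state at S5, and the most recent state sits at S9.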