To give more context: the visualized maze is not where any of the training data comes from, and the agent is not trained on that fixed maze. Instead, it is trained on a curriculum of randomly generated levels in which the cheese (in purple) is placed in the top-right 5x5 corner of the maze. IIUC, the level shown is a fixed validation-set seed that Uli used to visualize the checkpoint policies.