It is interesting to note the limitations of the system. From the paper:
6.4.5 | Failed Hand-authored Tasks
Gap tasks. Similar to the task in Figure 21, in this task there is an unreachable object which the agent is tasked with being near. The object is unreachable due to the existence of a chasm between the agent and object, with no escape route (once agent falls in the chasm, it is stuck). This task requires the agent to build a ramp to navigate over to reach the object. It is worth noting that during training no such inescapable regions exist. Our agents fall into the chasm, and as a result get trapped. It suggests that agents assume that they cannot get trapped.
Multiple ramp-building tasks. Whilst some tasks do show successful ramp building (Figure 21), some hand-authored tasks require multiple ramps to be built to navigate up multiple floors which are inaccessible. In these tasks the agent fails.
Following task. One hand-authored task is designed such that the co-player’s goal is to be near the agent, whilst the agent’s goal is to place the opponent on a specific floor. This is very similar to the test tasks that are impossible even for a human, however in this task the co-player policy acts in a way which follows the agent’s player. The agent fails to lead the co-player to the target floor, lacking the theory-of-mind to manipulate the co-player’s movements. Since an agent does not perceive the goal of the co-player, the only way to succeed in this task would be to experiment with the co-player’s behaviour, which our agent does not do.
Yep! Thanks! I’m especially keen to see whether future iterations of this system are able to succeed at these tasks.