Can you expand more on this or recommend some reading:
“In the beginning, when the agent knows nothing about the real maze, its simulated mazes will be poor imitations of the real thing, as it lacks data on what mazes are supposed to look like. So the experience it gains from these simulated mazes is worthless.
But slowly, as the robot drives through the real maze and collects increasing ground-truth data about the maze, it can improve its simulated mazes until they become reasonably accurate approximations of the real thing. Now when the algorithm solves a thousand simulated mazes, it gains the experience of having solved a thousand physical mazes, even though in reality it may have only solved a handful.”
I think there are two things being talked about here, the ability to solve mazes and knowledge of what a maze looks like.
If the internal representation of the maze is something generic like a graph or tree, then you can easily generate a whole bunch of fake mazes, since here a random maze is just a random tree, and a random tree is easy to make. The observations the robot makes about the mazes it does encounter can then inform what kind of tree to generate.
For example, you would be unlikely to observe a 2D maze with four left turns in a row, so when generating your random tree you wouldn't generate one with four left branches in a row.
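To make that concrete, here's a minimal sketch of constrained random maze generation, treating a maze as a nested tree of turns. The no-four-lefts-in-a-row rule stands in for any structural regularity the robot might learn from real observations; all names here are illustrative, not from any particular library or book.

```python
import random

def random_maze_tree(depth, max_run_of_lefts=3, lefts_so_far=0):
    """Generate a random maze as a nested dict of direction -> subtree.

    `max_run_of_lefts` encodes an observed regularity of real mazes:
    you never see too many left turns in a row, so we never generate them.
    """
    if depth == 0:
        return None  # leaf: a dead end or the exit
    node = {}
    for direction in ("left", "right", "straight"):
        # rule out structure we'd never observe in a real maze
        if direction == "left" and lefts_so_far >= max_run_of_lefts:
            continue
        if random.random() < 0.5:  # each branch exists half the time
            run = lefts_so_far + 1 if direction == "left" else 0
            node[direction] = random_maze_tree(depth - 1, max_run_of_lefts, run)
    return node

maze = random_maze_tree(depth=5)
```

As the robot gathers more ground-truth data, the hard-coded constraint would be replaced by learned statistics over branch shapes, but the generate-then-filter structure stays the same.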
The generation of correct-looking mazes is the "have I got a good understanding of the problem" part, and the simulation of lots of maze-solving episodes is the "given this understanding, how well can I solve the problem" part.
Yes, this is the idea! My example here is a highly oversimplified description of rollout algorithms, a building block of Monte Carlo Tree Search, which you can read more about in Chapter 8.10 of the book.
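For a flavor of what a rollout looks like in code, here's a minimal sketch: estimate how good a state is by averaging the cost of many cheap simulated episodes run under a uniform-random policy. The adjacency-list maze encoding and the function names are illustrative assumptions, not taken from the book.

```python
import random

def rollout(maze, state, goal, max_steps=100):
    """One simulated episode: wander randomly until the goal (or give up).
    Returns the number of steps taken; `max_steps` is the failure cost."""
    for step in range(max_steps):
        if state == goal:
            return step
        state = random.choice(maze[state])  # uniform-random rollout policy
    return max_steps

def rollout_value(maze, state, goal, n_rollouts=1000):
    """Average cost over many simulated episodes -- the 'thousand simulated
    mazes' worth of experience, gathered without touching the real maze."""
    return sum(rollout(maze, state, goal) for _ in range(n_rollouts)) / n_rollouts

# A toy maze encoded as an adjacency list (a stand-in for the learned model).
toy_maze = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
print(rollout_value(toy_maze, "A", "D"))
```

In full MCTS the simulated episodes also guide which branch of the search tree to expand next, but the core trick is the same: simulated experience is cheap, so you can afford thousands of episodes per real decision.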
Really appreciate the no math summary!