Yes, it at least tries to learn the model used in constructing the training data (having to specify a good model is definitely an issue it shares with IRL).
I thought the point of the virtual environment was to provide a place to empirically test a bunch of possible approaches to value learning.
An analogy might be how OpenAI trained a robot hand controller in a set of simulations with diverse physical parameters. The controller learned the general skill of operating in a wide variety of situations, so it could then be used directly in the real world.
That’s an excellent analogy.