Really interesting post. I think proper environment creation is one of the most important questions, if not the most important, on the RL-based path to AGI.
You made the point that, contrary to the expectations of some, environments like Go or StarCraft are not sufficient to create the kind of flexible, adaptive AGI we're looking for. I wonder whether success in creating such AGI depends primarily on the complexity of the environment. That is, even though environments like StarCraft are quite complex and require some of the abstract reasoning we'd expect of AGI, their complexity is nowhere near that of the real world in which we want those AGIs to perform. I wonder too whether increasing environmental complexity provides some inherent regularisation, i.e. it's harder to fall into very narrow solutions when the state space of the environment is very large.
If that is the case, the question that naturally follows is: how do we create environments that mimic the complexity of the actual world? A full simulation obviously isn't feasible, but I wonder if it would be possible to create highly complex world models using neural techniques.
This would be computationally demanding, but what would happen if one were to train an RL agent whose environment was provided by a GPT-3-esque world model? For instance, imagine an RL agent interacting with AI Dungeon (the popular GPT-3-based D&D dungeon master). I'm not certain what the utility function would be; maybe maximizing gold, XP, level, or similar? Certainly an agent that could "win at" D&D would be closer to an AGI than anything made to date. Similarly, I could imagine a future version of GPT that modeled video frames (i.e. predicted the next frame from the previous frames). An RL agent trained to produce some given frame as a desired state would surely be able to solve problems in the real world, no? (The actual implementation of a video-based GPT would be incredibly expensive computationally, but not completely out of the question.) Are there merits to this approach? A rough sketch of the text-world version follows below.
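To make the idea concrete, here's a minimal sketch of what such a setup might look like: an RL environment whose transition function is simply a text world model continuing the transcript after the agent's action. Everything here is an assumption for illustration — the `generate` interface, the gold-parsing reward, and the termination check are all hypothetical and don't correspond to AI Dungeon's actual API.

```python
import re
from typing import Callable

class TextWorldEnv:
    """A toy RL environment whose dynamics come from a text world model.

    `generate` stands in for any GPT-3-style model: given the transcript
    so far (including the agent's latest action), it returns the next
    chunk of narration. Purely a hypothetical interface.
    """

    def __init__(self, generate: Callable[[str], str], prompt: str):
        self.generate = generate
        self.prompt = prompt
        self.transcript = prompt

    def reset(self) -> str:
        self.transcript = self.prompt
        return self.transcript

    def step(self, action: str) -> tuple[str, float, bool]:
        # Append the agent's action to the transcript, then let the
        # world model narrate what happens next.
        self.transcript += f"\n> {action}\n"
        narration = self.generate(self.transcript)
        self.transcript += narration
        # Hypothetical reward: parse gold mentions out of the narration.
        gold = sum(int(g) for g in re.findall(r"(\d+)\s+gold", narration))
        done = "you have died" in narration.lower()
        return self.transcript, float(gold), done

# Dummy world model so the sketch runs without a real language model.
def dummy_generate(transcript: str) -> str:
    return "You open the chest and find 10 gold."

env = TextWorldEnv(dummy_generate, prompt="You stand at a dungeon entrance.")
obs = env.reset()
obs, reward, done = env.step("open the chest")
print(reward)  # 10.0
```

The reward is clearly the weak point, as hinted above: parsing "gold" out of free text is trivially gameable, and the agent would likely learn to steer the world model into narrating gold rather than "earning" it in any meaningful sense.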