This is certainly interesting! To put things in proportion though, here are some limitations that I see, after skimming the paper and watching the video:
The virtual laws of physics are always the same. So, the sense in which this agent is “generally capable” is only via the geometry and the formal specification of the goal. Which is still interesting to be sure! But it is not as big a deal as it would be if it did zero-shot learning of physics (which would be an enormous deal IMO).
The formal specification is limited to propositional calculus. This allows for a combinatorial explosion of possible goals, but there’s still some sense in which it is “narrow”. It would be a bigger deal if it used some more expressive logical language.
For some tasks it looks like the agent is just trying vaguely relevant things at random until it achieves the goal. So, it is able to recognize the goal has been achieved, but less able to come up with efficient plans for achieving it. While “trying stuff until something sticks” is definitely a strategy I can relate to, it is not as impressive as planning in advance. Notice that just recognizing the goal is relatively easy: modulo the transformation from 2D imagery to a 3D model (which is certainly non-trivial but not a novel capability), you don’t need AI to do it at all (indeed the environment obviously computes the reward via handcrafted code).
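To make the "you don't need AI to recognize the goal" point concrete, here is a minimal sketch of a handcrafted reward check for a propositional goal. Everything in it is an assumption for illustration: the 2D positions, the `near` predicate, the object names, and the choice to write the goal as a disjunction of conjunctions. It is not the environment's actual code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical world state: object name -> (x, y) position.
State = Dict[str, Tuple[float, float]]
Predicate = Callable[[State], bool]


def near(a: str, b: str, radius: float = 1.0) -> Predicate:
    """Atomic predicate: true when objects a and b are within `radius`."""
    def check(state: State) -> bool:
        (ax, ay), (bx, by) = state[a], state[b]
        return (ax - bx) ** 2 + (ay - by) ** 2 <= radius ** 2
    return check


def negate(p: Predicate) -> Predicate:
    """Propositional negation of a predicate."""
    return lambda state: not p(state)


def reward(goal: List[List[Predicate]], state: State) -> float:
    """Goal is a disjunction (outer list) of conjunctions (inner lists)."""
    satisfied = any(all(p(state) for p in clause) for clause in goal)
    return 1.0 if satisfied else 0.0


# "agent near sphere" OR ("cube near pyramid" AND NOT "agent near cube")
goal = [
    [near("agent", "yellow_sphere")],
    [near("purple_cube", "black_pyramid"), negate(near("agent", "purple_cube"))],
]
state = {
    "agent": (0.0, 0.0),
    "yellow_sphere": (5.0, 5.0),
    "purple_cube": (3.0, 3.0),
    "black_pyramid": (3.5, 3.0),
}
```

Given the full state, `reward(goal, state)` is a few lines of boolean logic. The hard parts for the agent are perception from pixels and finding a plan that makes the formula true.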
Thanks! This is exactly the sort of thoughtful commentary I was hoping to get when I made this linkpost.
--I don’t see what the big deal is about laws of physics. Humans and all their ancestors evolved in a world with the same laws of physics; we didn’t have to generalize to different worlds with different laws. Also, I don’t think “be superhuman at figuring out the true laws of physics” is on the shortest path to AIs being dangerous. Also, I don’t think AIs need to control robots or whatnot in the real world to be dangerous, so they don’t even need to be able to understand the true laws of physics, even on a basic level.
--I agree it would be a bigger deal if they could use e.g. first-order logic, but not that much of a bigger deal? Put it this way: wanna bet about what would happen if they retrained these agents, but with 10x bigger brains and for 10x longer, in an expanded environment that supported first-order logic? I’d bet that we’d get agents that perform decently well at first-order logic goals.
--Yeah, these agents don’t seem smart exactly; they seem to be following pretty simple general strategies… but they seem human-like and on a path to smartness, i.e. I can easily imagine them getting smoothly better and better as we make them bigger and train them for longer on more varied environments. I think of these guys as the GPT-1 of agent AGI.
I don’t see what the big deal is about laws of physics. Humans and all their ancestors evolved in a world with the same laws of physics; we didn’t have to generalize to different worlds with different laws. Also, I don’t think “be superhuman at figuring out the true laws of physics” is on the shortest path to AIs being dangerous. Also, I don’t think AIs need to control robots or whatnot in the real world to be dangerous, so they don’t even need to be able to understand the true laws of physics, even on a basic level.
The entire novelty of this work revolves around zero-shot / few-shot performance: the ability to learn new tasks which don’t come with astronomical amounts of training data. To evaluate the extent to which this goal has been achieved, we need to look at what was actually new about the tasks vs. what was repeated in the training data a zillion times. So, my point was, the laws of physics do not contribute to this aspect.
Moreover, although the laws of physics are fixed, we didn’t evolve to know all of physics. Lots of intuition about 3D geometry and mechanics: definitely. But there are many, many things about the world we had to learn. A bronze age blacksmith possessed sophisticated knowledge about the properties of materials and their interaction that did not come from their genes, not to mention a modern rocket scientist. (Ofc, the communication of knowledge means that each of them benefits from training data acquired by other people and previous generations, and yet.) And, learning is equivalent to performing well on a distribution of different worlds.
Finally, an AI doesn’t need to control robots to be dangerous but it does need to create sophisticated models of the world and the laws which govern it. That doesn’t necessarily mean being good at the precise thing we call “physics” (e.g. figuring out quantum gravity), but it is a sort of “physics” broadly construed (so, including any area of science and/or human behavior and/or dynamics of human societies etc.)
I agree it would be a bigger deal if they could use e.g. first-order logic, but not that much of a bigger deal? Put it this way: wanna bet about what would happen if they retrained these agents, but with 10x bigger brains and for 10x longer, in an expanded environment that supported first-order logic?
I might be tempted to take some such bet, but it seems hard to operationalize. It is also hard to test unless DeepMind happens to perform this exact experiment.
What really impressed me were the generalized strategies the agent applied to multiple situations/goals. E.g., “randomly move things around until something works” sounds simple, but learning to contextually apply that strategy
to the appropriate objects,
in scenarios where you don’t have a better idea of what to do, and
immediately stopping when you find something that works
is fairly difficult for deep agents to learn. I think of this work as giving the RL agents a toolbox of strategies that can be flexibly applied to different scenarios.
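For concreteness, the three-part strategy above can be written down as a hand-coded loop. This is only a toy sketch of the behavior, not the agent's learned policy: the function name `try_until_works`, the callbacks, and the object names are all hypothetical.

```python
import random
from typing import Callable, List, Optional


def try_until_works(
    candidates: List[str],
    attempt: Callable[[str], None],
    goal_reached: Callable[[], bool],
    rng: random.Random,
    max_steps: int = 100,
) -> Optional[int]:
    """Randomly act on relevant objects; stop as soon as the goal holds.

    Returns the number of attempts made, or None if the budget ran out.
    """
    if goal_reached():
        return 0  # nothing to do, so don't disturb the world
    for step in range(1, max_steps + 1):
        # Only act on objects judged relevant to the goal.
        attempt(rng.choice(candidates))
        # Stop immediately once something works.
        if goal_reached():
            return step
    return None


# Toy usage: the goal is reached once the (hypothetical) lever is touched.
touched = set()
steps = try_until_works(
    candidates=["lever", "cube", "ramp"],
    attempt=touched.add,
    goal_reached=lambda: "lever" in touched,
    rng=random.Random(0),
    max_steps=1000,
)
```

The loop itself is trivial. What the agent has to learn is everything the arguments hide: which objects count as candidates, when no better plan is available, and how to tell that the goal now holds.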
I suspect that finetuning agents trained in XLand in other physical environments will give good results because the XLand agents already know how to use relatively advanced strategies. Learning to apply the XLand strategies to the new physical environments will probably be easier than starting from scratch in the new environment.