Broadly agree with this post, though I’ll nitpick the inclusion of robotics here. I don’t think it’s progressing nearly as fast as ML, and it seems fairly uncontroversial that we’re not nearly as close to human-level motor control as we are to (say) human-level writing. I only bring this up because a decent chunk of bad reasoning (usually underestimation) I see around AGI risk comes from skepticism about robotics progress, which is mostly irrelevant in my model.
I’m not sure why some skepticism based on the lack of progress in robotics would be unjustified.
Robots require reliability, because otherwise you destroy hardware and other material. Even in areas where we have had enormous progress (LLMs, diffusion models), we do not have the kind of reliability that would let you broadly trust their output without supervision. That lack of reliability seems indicative of some fundamental things yet to be learned.
The skepticism I object to has less to do with the idea that ML systems are not robust enough to operate robots, and more to do with people rationalizing from the intuitive feeling that “robots are not scary enough to justify considering AGI a credible threat” (whether they voice this intuition or not).
I agree that having highly capable robots which run on ML would be evidence for AGI arriving soon, and thus that the lack of such robots is evidence in the opposite direction.
That said, because the main threat from AGI that I am concerned about comes from reasoning and planning capabilities, I think it can be somewhat of a red herring. I’m not saying we shouldn’t update on the lack of competent robots, but I am saying that we shouldn’t flippantly use the intuition, “that robot can’t do all sorts of human tasks, I guess machines aren’t that smart and this isn’t a big deal yet”.
I am not trying to imply that this is the reasoning you are employing, but it is a type of reasoning I have seen in the wild. If anything, the lack of robustness in current ML systems might actually be more concerning overall, though I am uncertain about this.
Good point, and I agree progress has been slower in robotics compared to the other areas.
I just edited the post to add better examples (DayDreamer, VideoDex and RT-1) of recent robotics advances that are much more impressive than the only one originally cited (Boston Dynamics), thanks to Alexander Kruel who suggested them on Twitter.
Do you have a hypothesis why? Robotic tasks add obvious, tangible value, so you would expect significant investment in robotics driven by state-of-the-art AI models. Yet no one appears to be both seriously trying and well funded.
IDK what the previous post had in mind, but one possibility is that an AGI with superhuman social and human manipulation capabilities wouldn’t strictly need advanced robotics to take arbitrary physical actions in the world.
This is something I frequently get hung up on: if the AGI is highly intelligent and socially manipulative, but lacks good motor skills/advanced robotics, doesn’t that imply that it also lacks an important spatial sense necessary to understand, manipulate, or design physical objects? Even if it could manipulate humans into taking arbitrarily precise physical actions, it would need pretty good spatial reasoning to know what the expected outcome of those actions is.
I guess the AGI could just solve the problem of human alignment, so our superior motor and engineering skills don’t carelessly bring it to harm.
There are robotics transformers and general-purpose models like Gato that can control robots.
If AGI is extremely close, the reason is criticality. All the pieces for an AGI system with general capabilities, including working memory, robotics control, perception, and “scratch” mind spaces (including some that can model 3D relationships), exist in separate papers.
Normally it would take humans years, likely a decade, of methodical work building more complex integrated systems, but current AI may be good enough to bootstrap there in a short time, assuming a very large robotics hardware and compute budget.
Biology perspective here… motor coordination is fiendishly difficult, but humans are unaware of this, because we have no explicit, conscious knowledge of what is going on there. We have a conscious resolution of something like “throw the ball at that target”, “reach the high object”, “push over the heavy thing”, “stay balanced on the wobbly thing”, and it feels like that is it, because the very advanced system that gets it done is unconscious. It partly utilises parts of the brain that do not make their contents explicit and conscious, and partly utilises embodied cognition and bodies carefully evolved for task solving; it involves incredibly quick coordination between surprisingly complicated and fine-tuned systems.
On the other hand, when we solve intellectual problems, like playing chess, or doing math, or speaking in language, a large amount of the information needed to solve the problem is consciously available, and consciously directed. As such, we know far more about these challenges.
This leads us to systematically overestimate how difficult it is to do things like play chess; it isn’t that difficult, and we know so much about how it works that implementing it in another system is not so hard. And it leads us to underestimate how difficult motor coordination is, because we are not explicitly aware of its complexity, which also makes it very difficult to code into another system, especially one that does not run on wetware.
The way we designed computers at first was also strongly influenced by our understanding of our conscious mind, and not by the way wetware evolved to handle its oldest problems, because again, we understood the former better, and it is easier to explicitly encode. So we built systems that were inherently better at the stuff that evolved later in humans, and neglected the stuff we considered basic, which was actually the result of a hell of a long biological evolution.
Which is why, comparatively, our robots still suck at motor coordination, while problems deemed super hard, like chess, were beaten a long time ago, and problems long considered inconceivable to solve, like speech, are being solved right now.
With the unfortunate implication that we were hoping for AI to replace menial labour, and we are instead finding that it is replacing intellectual labour, like coding and creative writing.
While I commend the effort you put into this analysis, I do not think the above is actually remotely correct.
The history of AI has been one of very early use of AI for control systems, including more than 18 years of visible work on autonomous cars (counting from the 2005 DARPA Grand Challenge).
Easy, tractable results came from this. RL to control a machine has turned out to be extremely easy, and it works very well (see all the 2014-era DL papers that used Atari games as the initial challenge). The issue has been that the required accuracy for a real machine is 99.9%+, with a domain-specific number of 9s required after that.
Making a complete system that reliable has been difficult. You can use the current Cruise stalls as an example: they solved the embedded control problem very well, but the overall system infrastructure is limiting (the cars aren’t running people over, but they often experience some infrastructure problem with the remote systems).
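To be concrete about how little code the toy version of this takes nowadays, here is a minimal sketch; it assumes the gymnasium and stable-baselines3 libraries, and CartPole is my stand-in for “machine control” at toy scale, not anything from the papers mentioned here:

```python
# Minimal sketch: learn a controller for a classic balance task.
# CartPole-v1 is a toy stand-in for machine control, not a production system.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)  # minutes on a laptop

# Roll out the learned policy once.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```

Getting from something like this to the 99.9%+ reliability a real machine needs is where the actual difficulty lives.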
Comparatively, the problem of “RL controlling a machine” is very close to being solved; as an illustrative example, it is at 99.99% accuracy and needs to be at 99.99999%. Chatbots, by contrast, are more like 80% accurate.
They constantly make glaring, overt errors, including outright lying (“hallucinating”), which, ironically, is something machine control systems don’t do.
And useful chatbots became possible only about 3-5 years ago; it turns out they take enormous amounts of compute and data, OOMs more than RL systems use, and their current accuracy is low.
Summary: I would argue it’s more of a human perception thing. We think motion control and real-world perception are easy, and are not impressed by 99.99% accurate AI systems; we think higher-level cognition is very hard, and are very impressed when we use 80% accurate AI systems.
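To put rough numbers on that perception gap, using the same illustrative accuracy figures as above (these are for intuition, not measurements):

```python
# Expected failures per million actions at each illustrative accuracy level.
for name, accuracy in [
    ("chatbot", 0.80),
    ("current machine control", 0.9999),
    ("required for deployment", 0.9999999),
]:
    failures = (1 - accuracy) * 1_000_000
    print(f"{name}: ~{failures:,.1f} failures per million actions")

# chatbot: ~200,000.0 failures per million actions
# current machine control: ~100.0 failures per million actions
# required for deployment: ~0.1 failures per million actions
```

On this framing, the gap between a demo-grade controller and a deployable one is about three orders of magnitude in failure rate, while the chatbot everyone is impressed by fails thousands of times more often than either.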
Mh. I do appreciate the correction, and you do seem to have knowledge here that I do not, but I am not convinced.
Right now, chatbots can perform at levels comparable to humans on writing-related tasks that humans actually do. Sure, they hallucinate, they get confused, their spatial reasoning is weak, their theory of mind is weak, etc., but they pass exams with decent grades, write essays that get into newspapers and universities and magazines, pass the Turing test, write a cover letter and correct a CV, etc. Your mileage will vary with whether they outperform a human or act like a pretty shitty human who is transparently an AI, but they are doing comparable things. And notably, the same system is doing all of these things: writing dialogues, writing code, giving advice, generating news articles.
Can you show me a robot that is capable of playing in a football and basketball match? And then dancing a tango with a partner in a crowded room? I am not saying perfectly. It is welcome to be a shitty player who sometimes trips or misses the ball. 80% accuracy, if you like. Our chatbots can be beaten by 9-year-old kids at some tasks, so fair enough, let the robot play football and dance with nine-year-olds, compete with nine-year-olds. But I want it running, bipedal, across a rough field, kicking a ball into the goal (or at least in the approximate direction, like a kid would) with one of the two legs it is running on, while evading players who are trying to snatch the ball away, and without causing anyone severe injury. I want the same robot responding to pressure cues from the dance partner, navigating them around other dancing couples, to the rhythm of the music, holding them enough to give them support without holding them so hard they cause injury. I want the same robot walking into a novel building and helping with tidying up and cleaning it, identifying stains and chemical bottles, selecting cleaning tools, and scrubbing hard enough to get the dirt off without damaging the underlying material, while then coating the surface evenly with disinfectant. Correct me if I am wrong: there is so much cool stuff happening in this field so quickly, and a lot of it is simply not remotely my area of expertise. But I am under the impression that we do not have robots that are remotely capable of this.
This is the crazy shit that sensory-motor coordination does. Holding objects hard enough that they do not slip, but without crushing them. Catching flying projectiles, and throwing them at targets, even though they are novel projectiles we have never handled before, and even when the targets are moving. Keeping our balance while bipedal, on uneven and moving ground, and while balancing heavy objects or supporting another person. Staying standing when someone is actively trying to trip you. Entering a novel, messy space, getting oriented, identifying its contents, even if it contains objects we have never seen in this form. Balancing on one leg. Chasing someone through the jungle. I am familiar with projects that have targeted these problems in isolation—heck, I saw the first robot that was capable of playing Jenga, like… nearly two decades ago? But all of this shit in coordination, within a shifting and novel environment?
In comparison, deploying a robot on a clearly marked road with clearly repeating signs, or in the air, is choosing ridiculously easy problems. It is akin to programming software that does not have flexible conversations with you, but is capable of responding to a fixed set of specific prompts with specific responses, clustering all other prompts into the existing categories or returning an error.
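Something like this, in toy form (all prompts, categories, and responses here are made up for illustration):

```python
# Toy sketch of the "fixed prompts, fixed responses" software in the analogy.
CANNED = {
    "hours": "We are open 9am-5pm, Monday to Friday.",
    "returns": "You can return items within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def respond(prompt: str) -> str:
    # Cluster the prompt into an existing category by crude keyword match;
    # anything that doesn't fit becomes an error, as described above.
    for keyword, answer in CANNED.items():
        if keyword in prompt.lower():
            return answer
    return "Error: request not understood."

print(respond("What are your hours?"))  # canned response
print(respond("Do you like jazz?"))     # error fallback
```

Rigid, but reliable within its fixed scope, which is exactly the trade-off the road and air domains buy you.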
Part of it is not the difficulty of the task: many of the tasks you give as examples require very expensive, hand-built (ironically) robotics hardware to even attempt. There are mere hundreds of instances of that hardware, and they cost hundreds of thousands of dollars each.
There is insufficient scale. Think of all the AI hype and weak results before labs had clusters of 2048 A100s and trillion-token text databases. Scale counts for everything. If chemists in 1880 had figured out how to release energy through fission, but didn’t have enough equipment and money to get weapons-grade fissionables until 1944, imagine how bored we would have been with nuclear bomb hype. Nature does not care whether you know the answer, only whether you have more than a kilogram of refined fissionables; otherwise nothing interesting will happen.
The thing about your examples is that machines are trivially superhuman at all those tasks. Sure, not at the full set combined, but that’s from lack of trying: nobody has built anything with the necessary scale.
I am sure you have seen the demonstrations of a ball bearing kept balanced on a rail by an electric motor, or a double pendulum stabilized by a robot, or quadcopters remaining in flight with one rotor disabled, using a control algorithm that dynamically adjusts the flight after the damage.
All easy RL problems, all completely impossible for human beings. (we react too slowly)
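For intuition about why reaction speed settles this, the core of those demos is a feedback loop running orders of magnitude faster than a human can react. A toy sketch with a made-up plant and made-up gains (not any real system’s controller):

```python
# Toy PD control loop for an inverted-pendulum-style balance task at 1 kHz.
# Plant dynamics and gains are invented for illustration. Human reaction
# time is roughly 150-250 ms; this loop issues a correction every 1 ms.
import random

KP, KD = 40.0, 5.0   # illustrative proportional/derivative gains
DT = 0.001           # 1 ms timestep (1 kHz control rate)

angle, velocity = 0.05, 0.0  # small initial tilt (radians), at rest
for _ in range(2000):        # simulate 2 seconds
    disturbance = random.gauss(0.0, 0.5)          # random pushes
    control = -(KP * angle + KD * velocity)       # PD control law
    accel = 9.8 * angle + control + disturbance   # toy linearized dynamics
    velocity += accel * DT
    angle += velocity * DT

print(f"final angle after 2 s: {angle:.4f} rad")  # stays near zero
```

A human closing the same loop by hand would be applying each correction a couple of hundred cycles too late.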
The majority of what you mention are straightforward reinforcement learning problems and solvable with a general method. Most robotics manipulation tasks fall into this space.
Note that there is no economic incentive to solve many of the tasks you mention, so they won’t be solved. But general manufacturing robotics, where you can empty a bin of random parts in front of the machine(s) and they assemble as many fully built products of the design you provided as the parts pile allows? Very solvable, and the recent Google AI papers show it’s relatively easy. (I say easy because the solutions are not very complex in source code, and relatively small numbers of people are working on them.)
I assume that, at least for now, everyone will use nice, precise industrial robot arms, with overhead cameras and lidars mounted in optimal places to view the workspace; there is no economic benefit to ‘embodiment’ or to a robot janitor entering a building like you describe. Dancing with a partner is too risky.
But it’s not a problem of motion control or sensing, machinery is superhuman in all these ways. It’s a waste of components and compute to give a machine 2 legs or that many DOF. Nobody is going to do that for a while.
3 days later...
https://palm-e.github.io/ https://www.lesswrong.com/posts/sMZRKnwZDDy2sAX7K/google-s-palm-e-an-embodied-multimodal-language-model
from the paper: “Data efficiency. Compared to available massive language or vision-language datasets, robotics data is significantly less abundant”
As I was saying, the reason robotics wasn’t as successful as the other tasks is scale, and Google seems to hold this opinion.
I think you might find this interesting: https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html?m=1
Neat paper, though one major limitation is that they trained on real data from just 2 micro kitchens.
To get to very high robotic reliability, they would need a simulation of many variations of the robot’s operating environment, and a robot with a second arm and more dexterous grippers.
Basically, the paper was not a serious attempt to reach production-level reliability, just tinkering with a better technique.