Biology perspective here… motor coordination is fiendishly difficult, but humans are unaware of this because we have no explicit, conscious knowledge of what is going on there. We consciously form an intention like “throw the ball at that target”, “reach the high object”, “push over the heavy thing” or “stay balanced on the wobbly thing”, and it feels like that is all there is. That is because the very advanced system that gets it done is unconscious. It relies partly on brain regions that do not make their contents explicit and conscious, and partly on embodied cognition and bodies carefully evolved for solving these tasks, and it involves incredibly quick coordination between surprisingly complicated, finely tuned systems.
On the other hand, when we solve intellectual problems, like playing chess, doing math or using language, a large amount of the information needed to solve the problem is consciously available and consciously directed. As a result, we know far more about these challenges.
This leads us to systematically overestimate how difficult things like chess are (they aren’t that difficult, and we know so much about how they work that implementing them in another system is not so hard), and to underestimate how difficult motor coordination is, because we are not explicitly aware of its complexity. That lack of explicit knowledge also makes it very difficult to code into another system, especially one that does not run on wetware.
The way we first designed computers was also strongly influenced by our understanding of our conscious mind, not by the way wetware evolved to handle the earliest problems, because, again, we understood the former better and it is easier to encode explicitly. So we built systems that were inherently better at the things that evolved later in humans, and neglected the things we considered basic, which were actually the product of a hell of a long biological evolution.
Which is why, comparatively, our robots still suck at motor coordination, while problems deemed super hard, like chess, were beaten a long time ago, and problems that were still considered impossible to solve, like speech, are being solved right now.
This has the unfortunate implication that we were hoping AI would replace menial labour, and instead we are finding that it is replacing intellectual labour, like coding and creative writing.
While I commend your effort put into analysis, I do not think the above is actually remotely correct.
The history of AI has been one of very early use of AI for control systems, including more than 18 years of visible work on autonomous cars (counting from the 2005 DARPA Grand Challenge).
Easy, tractable results came from this. Using RL to control a machine has turned out to be extremely easy, and it works very well (see all the 2014-era deep learning papers that used Atari games as the initial challenge). The issue has been that a real machine needs 99.9%+ accuracy, with a domain-specific number of additional 9s on top of that. Making a complete system that reliable has been difficult. The current Cruise stalls are an example: they solved the embedded control problem very well, but the overall system infrastructure is limiting (the cars aren’t running people over, but they often hit some infrastructure problem with the remote systems).
By comparison, “RL controlling a machine” is very close to solved: to use illustrative numbers, it is at 99.99% accuracy and needs to reach 99.99999%, whereas chatbots are more like 80% accurate.
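The gap between 99.99% and 99.99999% sounds tiny, so a rough sketch of why the extra 9s matter may help. It assumes, purely for illustration, a control loop that makes 10 independent decisions per second for one hour (36,000 decisions), each succeeding with the same probability; real failures are neither independent nor equally costly, and the workload numbers are hypothetical.

```python
# Illustrative only: treats each decision as an independent trial with the same
# success probability, which real systems do not strictly satisfy.

def p_no_failure(per_step_accuracy: float, steps: int) -> float:
    """Probability that every one of `steps` independent decisions succeeds."""
    return per_step_accuracy ** steps


def expected_failures(per_step_accuracy: float, steps: int) -> float:
    """Expected number of failed decisions over `steps` decisions."""
    return (1.0 - per_step_accuracy) * steps


# Hypothetical workload: 10 decisions per second for one hour of operation.
STEPS = 10 * 60 * 60  # 36,000 decisions

for acc in (0.9999, 0.9999999):
    print(f"accuracy={acc}: P(no failure in an hour)={p_no_failure(acc, STEPS):.3g}, "
          f"expected failures={expected_failures(acc, STEPS):.3g}")
```

Under these assumptions, 99.99% per decision gives only about a 3% chance of an error-free hour (about 3.6 expected failures), while 99.99999% gives about 99.6%. Each extra 9 moves the system into a different regime rather than adding a rounding difference.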
They constantly make glaring, overt errors, including outright lying (‘hallucinating’), something that, ironically, machine control systems don’t do.
And useful chatbots became possible only about 3 to 5 years ago. They turn out to take enormous amounts of compute and data, orders of magnitude (OOMs) more than RL systems use, and their current accuracy is low.
Summary: I would argue it’s more a human perception thing. We think motion control and real-world perception are easy, so we are not impressed by 99.99%-accurate AI systems; and we think higher-level cognition is very hard, so we are very impressed when we use 80%-accurate ones.
Mh. I do appreciate the correction, and you do seem to have knowledge here that I do not, but I am not convinced.
Right now, chatbots can perform at levels comparable to humans on writing-related tasks that humans actually do. Sure, they hallucinate, they get confused, their spatial reasoning is weak, their theory of mind is weak, etc., but they pass exams with decent grades, write essays that get into newspapers, universities and magazines, pass the Turing test, write a cover letter and correct a CV, etc. Your mileage will vary as to whether they outperform a human or act like a pretty shitty human who is transparently an AI, but they are doing comparable things. And notably, the same system is doing all of these things: writing dialogues, writing code, giving advice, generating news articles.
Can you show me a robot that is capable of playing in a football match and a basketball match? And then dancing a tango with a partner in a crowded room? I am not saying perfectly. It is welcome to be a shitty player who sometimes trips or misses the ball. 80% accuracy, if you like. Our chatbots can be beaten by 9-year-old kids at some tasks, so fair enough: let the robot play football and dance with nine-year-olds, and compete with nine-year-olds. But I want it running, bipedal, across a rough field, kicking a ball into the goal (or at least in roughly the right direction, like a kid would) with one of the two legs it is running on, while evading players who are trying to snatch the ball away, and without causing anyone severe injury. I want the same robot responding to pressure cues from its dance partner, steering them around other dancing couples to the rhythm of the music, and holding them firmly enough to support them without holding them so hard that it injures them. I want the same robot walking into a novel building and helping to tidy and clean it: identifying stains and chemical bottles, selecting cleaning tools, scrubbing hard enough to get the dirt off without damaging the underlying material, and then coating the surface evenly with disinfectant. Correct me if I am wrong; there is so much cool stuff happening in this field so quickly, and a lot of it is simply not remotely my area of expertise. But I am under the impression that we do not have robots that are remotely capable of this.
This is the crazy shit that sensory-motor coordination does. Holding objects hard enough that they do not slip, but without crushing them. Catching flying projectiles and throwing them at targets, even novel projectiles we have never handled before, and even when the targets are moving. Keeping our balance while bipedal, on uneven and moving ground, while balancing heavy objects or supporting another person. Staying upright when someone is actively trying to trip you. Entering a novel, messy space, getting oriented and identifying its contents, even objects we have never seen in this form. Balancing on one leg. Chasing someone through the jungle. I am familiar with projects that have targeted these problems in isolation (heck, I saw the first robot capable of playing Jenga, like… nearly two decades ago?). But all of this shit in coordination, within a shifting and novel environment?
In comparison, deploying a robot on a clearly marked road with clearly repeating signs, or in the air, means choosing ridiculously easy problems. It is akin to writing software that cannot hold a flexible conversation with you, but can respond to a fixed set of specific prompts with specific responses, sorting all other prompts into the existing categories or an error.
Part of the problem is not the difficulty of the tasks: many of the tasks you give as examples require very expensive, (ironically) hand-built robotics hardware even to attempt them. There are mere hundreds of instances of that hardware, and they cost hundreds of thousands of dollars each.
There is insufficient scale. Think of all the AI hype and weak results before labs had clusters of 2048 A100s and trillion-token text datasets. Scale counts for everything. Suppose that in 1880 chemists had figured out how to release energy through fission, but didn’t have enough equipment and money to produce weapons-grade fissionables until 1944: imagine how bored we would have been with nuclear bomb hype. Nature does not care whether you know the answer, only whether you have more than a kilogram of refined fissionables; otherwise nothing interesting will happen.
The thing about your examples is that machines are trivially superhuman at each of those tasks individually. Sure, not at the full set combined, but that’s from lack of trying: nobody has built anything at the necessary scale.
I am sure you have seen the demonstrations of a ball bearing on a rail kept balanced by an electric motor, a double pendulum stabilized by a robot, or quadcopters that stay in flight with a propeller damaged, using a control algorithm that dynamically adjusts flight after the damage.
All easy RL problems, and all completely impossible for human beings (we react too slowly).
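Demos like the ball on a rail are commonly built with classical feedback control rather than a learned policy. As a minimal sketch (not the controller from any particular demo), here is a proportional-derivative (PD) loop on an idealized, linearized ball-and-beam model. The gains, the 1 kHz update rate, the tilt limit, and the assumption that the motor sets the beam angle instantly are all illustrative choices.

```python
# Hypothetical parameters; idealized linear model, not any specific demo rig.
G = 9.81
K_BALL = 5.0 / 7.0 * G   # linearized x'' = K_BALL * theta for a solid ball rolling without slipping
MAX_TILT = 0.2           # rad, assumed actuator limit


def pd_tilt(x: float, v: float, kp: float = 0.5, kd: float = 0.3) -> float:
    """PD law: tilt the beam against the ball's position and velocity, clamped to the limit."""
    theta = -(kp * x + kd * v)
    return max(-MAX_TILT, min(MAX_TILT, theta))


def simulate(x0: float, seconds: float = 10.0, dt: float = 0.001) -> float:
    """Return the ball position (metres) after `seconds`, starting at rest at x0."""
    x, v = x0, 0.0
    for _ in range(int(seconds / dt)):
        theta = pd_tilt(x, v)   # controller runs every tick (1 kHz)
        a = K_BALL * theta      # ball acceleration along the beam
        v += a * dt             # semi-implicit Euler: update velocity first,
        x += v * dt             # then position using the new velocity
    return x


print(simulate(0.2))
```

Starting 20 cm off-centre, the ball settles to within a fraction of a millimetre of the centre in the simulated ten seconds. The control law is two lines; the machine’s advantage is that it applies it every millisecond without fatigue.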
The majority of what you mention are straightforward reinforcement learning problems and solvable with a general method. Most robotics manipulation tasks fall into this space.
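To make “a general method” concrete, here is a minimal sketch of tabular Q-learning, one of the simplest general-purpose RL algorithms. It is swapped in for illustration, since the comment does not say which method it means. The task (a hypothetical seven-position rail where the agent must learn to move to the centre), the rewards and the hyperparameters are all assumptions, and real manipulation tasks need function approximation and far more data.

```python
import random

N_STATES = 7        # positions 0..6 on a hypothetical rail
TARGET = 3          # centre position the agent must reach
ACTIONS = (-1, +1)  # move left or move right


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Deterministic toy dynamics: move one cell, clamped to the rail ends."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == TARGET
    return nxt, (1.0 if done else 0.0), done


def train(episodes: int = 2000, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    """Epsilon-greedy tabular Q-learning; nothing here is specific to the rail task."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.choice([s for s in range(N_STATES) if s != TARGET])
        for _ in range(50):  # cap episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                a = 0 if q[state][0] >= q[state][1] else 1  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: no future value once the episode ends.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q: list[list[float]]) -> list[str]:
    """'L' or 'R' per position; '-' marks the target."""
    return ["-" if s == TARGET else ("L" if q[s][0] > q[s][1] else "R")
            for s in range(N_STATES)]


print("".join(greedy_policy(train())))
```

Only `step` knows anything about the task; the learning loop would run unchanged on any small discrete environment, which is the sense in which such methods are “general”.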
Note that there is no economic incentive to solve many of the tasks you mention, so they won’t be solved. But general manufacturing robotics, where you can empty a bin of random parts in front of the machine(s) and they assemble as many fully built products of the design you provided as the parts pile allows? Very solvable, and the recent Google AI papers show it’s relatively easy. (I say easy because the solutions are not very complex in source code, and relatively few people are working on them.)
I assume that, at least for now, everyone will use nice, precise industrial robot arms, with overhead cameras and lidars mounted in optimal places to view the workspace. There is no economic benefit to ‘embodiment’ or to a robot janitor entering a building like you describe. Dancing with a partner is too risky.
But it’s not a problem of motion control or sensing; machinery is superhuman in all these ways. It’s a waste of components and compute to give a machine two legs or that many degrees of freedom (DOF). Nobody is going to do that for a while.
3 days later...
https://palm-e.github.io/ https://www.lesswrong.com/posts/sMZRKnwZDDy2sAX7K/google-s-palm-e-an-embodied-multimodal-language-model
From the paper: “Data efficiency. Compared to available massive language or vision-language datasets, robotics data is significantly less abundant”
As I was saying, the reason robotics hasn’t been as successful as the other tasks is scale, and Google seems to hold this opinion.