I can hail a robotaxi. So can anyone living in San Francisco, Phoenix, Beijing, Shanghai, Guangzhou, Shenzhen, Chongqing or Wuhan. The barriers to wider rollout are political and regulatory, not technological.
Waymo cars, and I believe Apollo and Cruise as well, are “level 4” autonomous vehicles, i.e. there is no human involved in driving them whatsoever. There is a “human available to provide assistance” in roughly the same sense that a member of the American Automobile Association has an offsite human available to provide assistance in case of crashes, flat tires, etc.
I don’t see any reason to think AGI is imminent but this particular argument against it doesn’t go through. Robotaxi tech is very good and improving swiftly.
Cruise has remote human operators who intervene every 2.5 to 5 miles, according to a recent NYT article. Dunno about Waymo or Apollo. That does not sound like “roughly the same sense as AAA.”
Even Teslas still have human interventions in something like half of rides. (So, way better than Cruise but still far from good enough)
Kyle Vogt responded to the New York Times article. He claims 2.5 to 5 miles is the rate at which Cruise vehicles request help from remote operators, not the rate at which they actually get help. Vogt doesn’t say what that actual rate is.
I’m a bit sus. If that number were so much better than the 2.5-5 mile figure cited by the Times, why wouldn’t he come out and say it?
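For concreteness, here is the back-of-envelope arithmetic behind that skepticism; a minimal sketch where the 2.5-5 mile figure is the one the Times reported and the assist fractions are purely hypothetical placeholders, since Vogt didn’t give one.

```python
# How many remote assists a rider would see on a 5-mile trip, under the
# NYT-reported request rate and several hypothetical assist fractions.
ride_miles = 5.0
miles_per_request = 2.5  # the worse end of the reported 2.5-5 mile range

requests_per_ride = ride_miles / miles_per_request
for assist_fraction in (1.0, 0.25, 0.05):  # hypothetical: how often a request turns into real help
    assists = requests_per_ride * assist_fraction
    print(f"assist fraction {assist_fraction:.0%}: ~{assists:.1f} assists on a {ride_miles:.0f}-mile ride")
```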
There is probably a big difference between “doing something reliably well” and “doing something, hit and miss”.
I wonder if this will also be true for the first AGIs. Maybe they will greatly surpass humans across all areas, yet occasionally, at random, do something utterly stupid in any of them. The probability of doing the stupid thing will slowly decrease, while the ability to do something superhumanly awesome will already be there.
That alone sounds scary, even ignoring all the usual worries about alignment.
Without intending to, I wrote a response to your question a few months ago: AGI is easier than robotaxis
Thanks. Your post makes point #3 from my post, and it makes two additional points I’ll call #5 and #6:
5. Onboard compute for Teslas, which is a constraint on model size, is tightly limited, whereas LLMs that live in the cloud don’t have to worry nearly as much about the physical space they take up, the cost of the hardware, or their power consumption.
6. Self-driving cars don’t get to learn through trial-and-error and become gradually more reliable, whereas LLMs do.
Re: (5), I wonder why the economics of, say, making a ChatGPT Plus subscription profitable wouldn’t constrain inference compute for GPT-4 just as much as for a Tesla.
Re: (6), Tesla customers acting as safety drivers for the “Full Self-Driving Capability” software seems like it contradicts this point.
Curious to hear your thoughts.
Nice.
Re (5): That might be plausible a priori, but a posteriori it seems that people are willing to pay for GPT-4 despite it being way bigger and more expensive than a Tesla HW3 or HW4 computer can handle. Moreover, you can make an AI system that is bigger still, train it, and hope that it’ll pay for itself later (this worked for GPT-3 and GPT-4); you physically can’t put a bigger AI on your fleet of Teslas, the hardware doesn’t support it, and the idea of rebuilding the cars to have 10-100x bigger onboard computers is laughable.
To emphasize point 5 more, think about all the progress in the field of AI that has come from scaling. Think about how dumb GPT-4 would be if it were only the size of GPT-2 (go to r/LocalLLaMA, scroll around, maybe download some 2B-parameter models and play around with them). Scaling is arguably the biggest source of progress in AI these days… and robotaxis are mostly unable to benefit from it. (Well, they can and do scale data, but they can’t scale parameters very much.)
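If you want to feel that size difference directly, here is a minimal sketch; it assumes the Hugging Face `transformers` package and the public ~124M-parameter "gpt2" checkpoint (even smaller than the 2B models mentioned above), and is not anything Tesla- or robotaxi-specific.

```python
# Generate text with a GPT-2-sized model to see how far small models are
# from frontier ones. Requires `pip install transformers torch`.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
out = generator(
    "The hardest part of deploying robotaxis is",
    max_new_tokens=40,
    do_sample=True,
    temperature=0.8,
)
print(out[0]["generated_text"])  # typically fluent-ish but shallow and often nonsensical
```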
Re (6): It only partially contradicts the point; IMO the point still stands (though not as strongly as it would without that data!). The data Tesla gets from customers is mostly of the form “AI did X, customer intervened, no crash occurred,” with a smattering of “customer was driving, and a crash occurred.” There are precious few datapoints of “AI did X, causing crash”; I’m not even sure there are 100 of them. Now obviously “customer intervened” is a proxy for “AI did something dangerous,” but only a very poor proxy, since customers intervene all the time when they get nervous and the vast majority of interventions are unnecessary. Customer-crash data is useful but still pretty rare, and anyhow learning from others’ mistakes just isn’t as good as learning from your own.
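To illustrate how noisy that proxy is, a toy calculation with made-up numbers (the 5% figure is purely illustrative, not Tesla data):

```python
# If the vast majority of interventions are just nervous drivers, the raw
# "customer intervened" label mostly teaches what makes drivers nervous.
interventions = 10_000
truly_dangerous_fraction = 0.05  # hypothetical share of interventions where the AI really erred

true_positives = int(interventions * truly_dangerous_fraction)
print(f"{true_positives} of {interventions} intervention labels reflect genuinely dangerous driving;")
print(f"label precision is only {truly_dangerous_fraction:.0%}.")
```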
I think the main hope here is to learn to avoid crashes in simulation and then transfer/generalize to reality. That’s what Tesla is trying, and the other companies too, I think. If it works, great, data problem solved. But sim-to-real is tricky and not a fully solved problem, I think (though definitely partially solved? It seems to at least somewhat work in a bunch of domains).
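For what it’s worth, the usual recipe behind that sim-to-real hope is domain randomization: vary the simulator’s parameters every episode so the policy can’t overfit to one (inevitably wrong) physics model. A minimal sketch; the parameter names, ranges, and stub simulator/learner are all hypothetical, not any company’s actual pipeline.

```python
import random

def sample_sim_params():
    # Randomize the simulated world each episode (hypothetical parameters and ranges).
    return {
        "tire_friction": random.uniform(0.5, 1.2),
        "vehicle_mass_kg": random.uniform(1600, 2400),
        "sensor_noise_std": random.uniform(0.0, 0.3),
        "pedestrian_speed_mps": random.uniform(0.5, 2.5),
    }

def run_episode(params):
    # Stub standing in for a full simulator rollout under `params`.
    return [("observation", "action", "reward")]

def update_policy(trajectory):
    # Stub standing in for a learning step on the driving policy.
    pass

def train(num_episodes=10_000):
    for _ in range(num_episodes):
        params = sample_sim_params()      # a fresh "world" every episode
        trajectory = run_episode(params)  # roll out the current policy in it
        update_policy(trajectory)         # learn from the result

train(num_episodes=10)
```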
Overall I think (5) is my main argument, I think (6) and (3) are weaker.
I wrote about this previously here. I think you have to break it down by company; the answer for why they’re not globally available is different for the different companies.
For Waymo, they have self-driving taxis in SF and Phoenix without safety drivers. They use LIDAR, so instead of the cognitive task of driving as a human would solve it, they have substituted the easier task “driving, but your eyes are laser rangefinders”. The reason they haven’t scaled to cover every city, or at least more cities, is unclear to me; the obvious possibilities are that the LIDAR sensors and onboard computers are impractically expensive, that they have a surprisingly high manual-override rate and there’s a big unscalable call center somewhere, or that they’re being cowardly and trying to maintain zero fatalities forever (at scales where a comparable fleet of human-driven taxis would definitely have some fatalities). In any case, I don’t think the software/neural nets are likely to be the bottleneck.
For Tesla, until recently, they were using surprisingly-low-resolution cameras. So instead of the cognitive task of driving as a human would solve it, they substituted the harder task “driving with a vision impairment and no glasses”. They did upgrade the cameras within the past year, but it’s hard to tell how much of the customer feedback represents the current hardware version vs. past versions; sites like FSDBeta Community Tracker don’t really distinguish. It also seems likely that their onboard GPUs are underpowered relative to the task.
As for Cruise, Comma.ai, and others—well, distance-to-AGI is measured only from the market leader, and just as GPT-4, Claude and Bard have a long tail of inferior models by other orgs trailing behind them, you also expect a long tail of self-driving systems with worse disengagement rates than the leaders.
Insufficient onboard processing power? Tesla’s HW3 computer is about 70 Tflops, ~0.1% of the estimated 100 Pops of a human brain, so approximately equivalent to a mouse brain. Social and predator mammals that have to model and predict conspecific and prey behaviors have brains that generally start at about 2% of human for cats and go up to 10% for wolves.
I posit that driving adequately requires modelling, interpreting and anticipating other road users’ behaviors to deal with a substantial number of problematic and dangerous situations: noticing erratic or non-rule-compliant behavior in pedestrians and other road users and adjusting the response, or negotiating rule-breaking at intersections or around obstructions with other road users. That is orders of magnitude harder than strict physics and road-rule adherence. This is the inevitable edge-case issue: there are simply too many odd possibilities to train for, and greater innate comprehension of human behaviors is needed.
Perhaps phone-a-smarter-AI-friend escalation of the outside-of-context problems encountered can deal with much of this, but I think cars need a lot more on-board smarts to do a satisfactory job. A cat-to-small-dog-level H100 at ~4 Pflops FP8 might eventually be up to the task.
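A quick check of the ratios being invoked in this comment (all figures are the rough estimates above, not measured specs):

```python
# Compare onboard compute to the ~100 Pops human-brain estimate used above.
human_brain_ops = 100e15  # ~100 Pops, the working estimate in this comment
hw3_flops = 70e12         # Tesla HW3, ~70 Tflops
h100_fp8_flops = 4e15     # H100 at ~4 Pflops FP8, as assumed above

print(f"HW3 / brain:  {hw3_flops / human_brain_ops:.2%}")      # ~0.07%, mouse-ish
print(f"H100 / brain: {h100_fp8_flops / human_brain_ops:.1%}")  # ~4%, between cat (~2%) and wolf (~10%)
```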
Elon’s recent announcements about end-to-end neural-net processing in the V12.xx FSD release being the savior and solution appear not to have panned out; there have been months of silence since the initial announcement. That may have been their last toss of the die for the HW3 hardware.
Their next-iteration HW4 chip being fitted to new cars is apparently ~8x faster, at 600 Tflops. [Edit: the source I found appears unreliable; specs are not yet public.] They really wouldn’t want to do that if they didn’t think it was necessary.
Helpful comment that gives lots to think about. Thanks!
As usual, you fall into the trap of neglecting the engineering and social-organisation problems and the time required to solve them. We don’t need AGI for autonomous cars; it will just take time.
What do you mean?
I mean that AGI and autonomous cars are orthogonal problems, especially because autonomous cars require solving engineering issues (which have been discussed by other commenters) that are different from the software issues. It’s quite usual here on LessWrong to handwave the engineering away once the theoretical problem is solved.
My argument (which I’m not all that sure I believe myself):
You need computation done “on-board,” not in the cloud, and that requirement drives prices to a level an order of magnitude too high to be feasible.
I think the use of the term “AGI” without a specific definition is causing an issue here. IMHO the crux of the matter is the difference between progress in average performance vs. worst-case performance. We are making amazing progress in the former but struggling with the latter (LLM hallucinations, etc.), and robotaxis require almost-perfect performance.
That raises the question: do AGIs not require almost-perfect performance?
Tasks that are about using creativity to solve problems are fine with performance that isn’t perfect. Scientists make a lot of mistakes, but the things they get right produce a lot of value.