1) Is it because of regulations?
2) Is it because robustness in the real world (or just robustness in general) turns out to be very hard for current AI systems, and robustness is much more important for self-driving cars than for other areas where we have seen more rapid AI progress?
3) Is it because no one is trying very hard? I.e. AI companies are spending much less compute on self driving car AIs compared to language models and image generators. If this is the case, why? Do they not expect to see a lot of profit from self driving cars?
4) Or is it some other reason, or some combination of the above?
I’m mostly interested in learning to what extent 2 is the cause, since this has implications for AI forecasting, both for timelines and for what trajectories to expect.
It’s mostly that the costs of self-driving failures are much higher than the costs of language-model or image-generator failures: when GPT-3 says nonsense, you can just shake your head and rerun it; you can’t really do that in your Tesla. If anything, I think that much more effort is going into self-driving. It’s just that the statistical tails are really fat here, and there’s a large number of unique situations that you have to get right before deploying at scale. In short, it’s mostly 2).
On top of this, I’d also add that it seems like deep learning should be able to solve the problem anyway, except that in self-driving cars you have limited hardware and strict latency requirements. You might blame this on 3), or just on practical realities, but it means that you can’t throw money at the problem to the same degree you can with LLMs.
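To make the latency point concrete, here is a back-of-the-envelope sketch; the latency figures below are made-up round numbers, not measurements of any real system. It just computes how far the car travels while one perception/planning cycle is still running.

```python
# Rough illustration: distance covered while the driving stack is still "thinking".
# The latencies below are assumed round numbers, not real system specs.

def distance_during_latency(speed_kmh: float, latency_ms: float) -> float:
    """Metres travelled during one end-to-end inference cycle."""
    speed_ms = speed_kmh / 3.6              # km/h -> m/s
    return speed_ms * (latency_ms / 1000.0)

for latency_ms in (50, 100, 250):           # hypothetical end-to-end latencies
    d = distance_during_latency(100, latency_ms)
    print(f"{latency_ms:>3} ms at 100 km/h -> {d:.1f} m travelled blind")
```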
Are you saying that the main bottleneck is iterative testing, because we can’t just let a self-driving car loose and see what it will do?
Or are you saying the main bottleneck is that self-driving cars have much higher robustness requirements in deployment, which means that the problem itself is much harder?
Or both, in which case, which one do you think is more important?
I suspect that testing is one of the more important bottlenecks.
I suspect that some current systems are safe enough if their caution is dialed up to where they’re annoyingly slow 2% of the time, and that leaves them not quite reliable enough at reaching a destination to be competitive.
I don’t think iterative testing is more of a bottleneck here than it is for language/image models; all of them need iterative testing to gather the massive amounts of data required to cover all the edge cases you could care about. Your second statement, about robustness, seems correct to me: I don’t think self-driving is especially harder or easier than language modelling in some absolute way, it’s just that the bar to deployment is much higher for cars because mistakes cost a lot more. If you wanted language models to be 99.999% robust to weird bugs, that would also be ridiculously hard. But if you want to hear the opinion of people right next to the problem, here’s Andrej Karpathy, the (former) director of self-driving at Tesla, explaining why self-driving is hard.
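To put some numbers on how high that deployment bar is (these are entirely made-up illustrative figures, not real crash statistics): a per-decision error rate that would be perfectly fine for a chatbot still leaves a car orders of magnitude short of a competent human.

```python
# Illustrative arithmetic only; both numbers are assumptions, not real statistics.
human_miles_per_incident = 100_000   # assumed human baseline, for illustration
ai_error_rate_per_mile = 1e-3        # assumed: one mistake per 1,000 miles (99.9% "per-mile accuracy")

ai_miles_per_incident = 1 / ai_error_rate_per_mile
gap = human_miles_per_incident / ai_miles_per_incident
print(f"AI: one incident per {ai_miles_per_incident:,.0f} miles; "
      f"{gap:.0f}x short of the assumed human baseline")
```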
It’s a combination of 1 and 2. Which is to say, regulations require a high level of safety, and we don’t have that yet because of point 2.
Robustness is very hard! And in particular perception is very hard when you don’t have a really solid world model, because when the AI sees something new it can react or fail to react to it in surprising ways.
The car companies are working on this by putting the cars through many millions of miles of simulated content so that they have seen most things, but that last-mile (heh) problem is still very hard with today’s technology.
You can think of this as an example of why alignment is hard. Perfect driving in a simulated environment doesn’t equal perfect driving in the real world, and while perfection isn’t the goal (being superhuman is the goal), our robust perceptual and modeling systems are very hard to beat, at least for now.
Have you ever driven subconsciously on system 1 autopilot while your conscious system 2 is thinking about something else? But then any hint of danger or anything out of the ordinary shifts your complete attention back onto driving. Imagine how unsafe and non-robust human drivers would be without that occasional full conscious system 2 focus.
The analogy isn’t perfect, but current DL systems are more like system 1, which doesn’t scale well to edge-case scenarios. There are some rare situations that require complex chain-of-thought reasoning over holistic world-model knowledge about human and/or animal behavior, common-sense few/zero-shot reasoning from limited past experience, etc.
So the fat tail situations may be AGI-complete in terms of difficulty, and the costs of failure are also enormous—so mostly option 2.
Adding a detail to the other comments saying “robustness is hard” – I heard from someone working at a self-driving car company that right now, you basically need training data to cover every edge case individually. I.e. the car can drive, and it knows when to brake normally, but it doesn’t know about school buses and how those are subtly different. Or it doesn’t know about snow, etc.
So you end up getting specific whitelisted streets that autonomous vehicles are allowed to drive on, and need to slowly expand the whitelisted area with exhaustive data and testing.
(But, there are totally specific neighborhoods in SF where self-driving cars are legal and in-use)
My current hypothesis is:
Cheap practical sensors (cameras and, perhaps, radars) more or less require (aligned) AGI for safe operation
Better 3d sensors (lidars), which could, in theory, enable safe driving with existing control theory approaches, are still expensive, impaired by weather and, possibly, interference from other cars with similar sensors, i.e. impractical
No references, but can expand on reasoning if needed
I don’t think that self-driving cars are an AGI-complete problem, but I also have not thought a lot about this question. I would appreciate hearing your reasoning for why you think this is the case. Or maybe I misunderstood you? In which case I’d appreciate a clarification.
What I meant is self driving *safely* (i.e. at least somewhat safer than humans do currently, including all the edge cases) might be an AGI-complete problem, since:
We know it’s possible for humans
We don’t really know how to provide safety guarantees in the sense of conventional high-safety systems for current NN architectures
Driving safely with cameras likely requires having considerable insight into a lot of societal/game-theoretic issues related to infrastructure and other driver behaviors (e.g. in some cases drivers need to guess a reasonable intent behind incomplete infrastructure or other driver actions, where determining what’s reasonable is the difficult part)
In contrast to this, if we have precise and reliable enough 3D sensors, we can relegate safety to normal physics-based non-NN controllers and safety programming techniques, which we already know how to work with. The problems with such sensors are currently cost and weather resistance.
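For what I mean by a physics-based non-NN safety check, here’s a minimal sketch, assuming a trustworthy range measurement (e.g. from lidar); the reaction time, deceleration, and margin values are illustrative assumptions, not anyone’s actual safety parameters.

```python
# Minimal sketch of a physics-based safety check: brake if the measured gap to the
# obstacle no longer exceeds the worst-case stopping distance plus a margin.
# All parameter values are illustrative assumptions.

def must_brake(gap_m: float, speed_ms: float,
               reaction_s: float = 0.5,      # assumed system reaction time
               max_decel_ms2: float = 6.0,   # assumed achievable deceleration
               margin_m: float = 2.0) -> bool:
    """True if the car can no longer guarantee stopping short of the obstacle."""
    stopping_dist = speed_ms * reaction_s + speed_ms**2 / (2 * max_decel_ms2)
    return gap_m <= stopping_dist + margin_m

# At 20 m/s (72 km/h) with a 45 m gap: 20*0.5 + 400/12 ≈ 43.3 m, plus margin -> brake.
print(must_brake(gap_m=45.0, speed_ms=20.0))  # True
```

The point is that this kind of logic is auditable with conventional safety-engineering techniques, provided the sensor input can actually be trusted.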
I don’t think computer vision has progressed enough to give a good, robust 3D representation of the world (from cameras).
#1 seems like a bit of drag on progress, but to the extent that many regulations also limit liability (or at least reduce charges of negligence), it may actually be net-positive.
#2 IMO is the primary issue.
For the successful applications of AI (text generation, chatbots, image manipulation, etc.), it just doesn’t matter if a significant fraction of decisions are wrong: you just try again. There are always human supervisors between the generator and the use. In driving, a mistake hurts someone, or annoys the passenger enough that they’d prefer to pay for a human driver.
#2.5 (an extension of robustness into the physical world) is that operations are VASTLY harder, and the economics are very different too. Pure-information AI is expensive to develop and train, but not that expensive to run. And it’s hard to physically interfere with or steal a chatbot. Driving is expensive to actually execute: it requires expensive sensors and maintenance, so the variable cost to deliver it is much higher, which makes it harder to deploy usefully (profitably). And it can be stolen or vandalized, adding to the difficulty of operations. And don’t forget the latency and mobility requirements, which make it WAY harder to use shared (cloud) resources for multiple instances.
I like to approach this question by thinking of incentives.
I would add two more elements to your three listed reasons: (1) mature alternatives, and (2) risk aversion, which causes general adoption of new technologies to take a long time.
While the promise of self-driving cars could bring lots of benefits, the current alternatives are so advanced and refined that there is no huge urgency to switch yet.
People and governments resist new technologies and need lots of assurances before something new (driverless cars, etc.) is accepted.
Consider an imperfect example/analogy: Covid vaccines happened way faster than anything like them in history because of the extreme incentives involved—the world had ground to a halt and many thousands of people were dying daily.