fully self-driving cars and AI doctors, then we will have already had to solve more limited versions of AI safety
Why? Fully self-driving cars have clear goals of how they should behave. The concern about a self-driving car failing to recognize an object because it is snowing seems to me irrelevant to the topics we often discuss as AI safety problems, like value alignment.
Value alignment is a more important subject with AI doctors, but still quite different than with self-modifying agents.
Fully self-driving cars have clear goals of how they should behave.
If you can write down all the goals of a self-driving car in Python, then I expect there are quite a few companies which would very much like to hire you.
It’s not failing to recognize an object because it snows that’s the problem; it’s deciding what to do when it’s snowing and there’s an unrecognized object. There will always be confusing things all over the place. Even if we had perfect information about the environment, there will still be things in the world which just aren’t categorized by the programmed/learned ontology—there are lots of unusual things in the world. If the car always responds to anything novel by braking, then it’s going to be a slow and frustrating ride very often.
The things-we-want-a-car-to-do are complicated—much like the things-we-want in general. There’s a very wide tail of edge cases, and it’s the edge cases that make the problem hard.
It’s not failing to recognize an object because it snows that’s the problem; it’s deciding what to do when it’s snowing and there’s an unrecognized object.
The Tesla and Uber accidents that resulted in deaths were both about not recognizing the object correctly.
Really? An accident where the system noticed something unusual, and then just froze up and waited for a second, is attributed to “not recognizing the object correctly” rather than “not deciding what to do about an unrecognized object correctly”? I mean, sure, recognizing the object would have been a sufficient condition to avoid the accident… but that’s not the real problem here.
There will always be unrecognized objects. A self-driving car which cannot correctly handle unrecognized objects is not safe, and the Uber accident is a great example of that.
I don’t see people complaining about their Teslas braking too often and being too slow/frustrating.
If the object hadn’t been a person but anything else that moves, the Uber still shouldn’t have crashed into it.
You just need to recognize that there’s an object with mass that happens to move into the lane. I don’t see how that’s a task that needs advanced safety concepts.
I agree that it shouldn’t need advanced safety concepts (i.e. the sort of things on the alignment forum). The things-we-want-a-car-to-do are complicated, but not as complicated as the things-we-want in general. Self-driving cars are not an alignment-complete problem.
But it’s still the case that “don’t crash into things” is a more complicated problem than it seems on the surface. “recognize that there’s an object with mass that happens to move into the lane” isn’t enough; we also need to notice objects which are going to move into the lane; we need trajectory tracking and forecasting. And we need the trajectory-tracker to be robust to the object classification changing (or just being wrong altogether), or sometimes confusing which object is which across timesteps, or reflections or pictures or moving lights, or missing/unreliable lane markers, or things in the world zig-zagging around on strange trajectories, or etc. It’s a task which requires a lot of generalizability and handling of strange things.
That’s the sense in which we need to solve “more limited” versions of AI safety in order to build self-driving cars. We need to be able to engineer reliable AI systems—systems which don’t rely on the real world never being weird in order to work.
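To make the "trajectory tracking that is robust to the classification changing" point concrete, here is a minimal sketch, not anyone's production code. It assumes a hypothetical `Observation` record with positions in the car's own frame, made-up defaults for lane width and time horizon, and a plain constant-velocity extrapolation. The class label is carried along but deliberately ignored, so a label that flips between frames cannot change the forecast.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    t: float     # timestamp, seconds
    x: float     # lateral offset from the lane center, meters
    y: float     # distance ahead of the car, meters
    label: str   # classifier output; deliberately unused below


def will_enter_lane(track, lane_half_width=1.8, horizon=3.0, step=0.1):
    """Constant-velocity forecast: True if the object is predicted to be
    inside the lane, and still ahead of the car, within `horizon` seconds."""
    if len(track) < 2:
        # No velocity estimate possible; fall back to the current position.
        return bool(track) and abs(track[-1].x) <= lane_half_width
    first, last = track[0], track[-1]
    dt = last.t - first.t
    if dt <= 0:
        return abs(last.x) <= lane_half_width
    # Average velocity over the whole track (illustrative; a real tracker
    # would filter noise and handle identity swaps between objects).
    vx = (last.x - first.x) / dt
    vy = (last.y - first.y) / dt
    for i in range(round(horizon / step) + 1):
        tau = i * step
        x = last.x + vx * tau
        y = last.y + vy * tau
        if abs(x) <= lane_half_width and y >= 0:
            return True
    return False


# The label changes on every frame, but the forecast only uses positions.
track = [
    Observation(0.0, 5.0, 20.0, "unknown"),
    Observation(0.5, 4.5, 19.0, "bicycle"),
    Observation(1.0, 4.0, 18.0, "vehicle"),
]
print(will_enter_lane(track))  # True for these made-up numbers
```

Even this toy version shows where the hard part lives: it says nothing about reflections, mixed-up object identities, or things that zig-zag, which are exactly the cases listed above.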
A self-driving car which cannot correctly handle unrecognized objects is not safe
But so what? People are not safe; they have slower reaction time than machines, especially when intoxicated. For every example of a self-driving car causing an accident due to object recognition failure, I can point to a person causing an accident due to reaction time failure or attention failure. Why give preference to human failure modes?
You can always come up with arbitrarily contrived edge cases in which a narrow AI requires robust, AGI-like value alignment to behave correctly (e.g. this ridiculous trolley problem), and thereby reduce any real-world narrow AI application to an AGI problem. Thing is, one day China is going to say “Fuck it, we need to get ahead on this AI issue” and just let loose existing self-driving cars onto their streets; the rest will get sorted out by the insurance market and incremental tech improvements. That’s my prediction of how we’ll transition into self-driving.
But so what? People are not safe; they have slower reaction time than machines, especially when intoxicated.
To clarify, what I intend to claim is that self-driving cars will not be able to achieve safety comparable to (sober) humans without correctly handling unrecognized objects and other unusual situations. Weird stuff occurs often enough in the real world that handling it will be necessary even for a human-like level of safety.
Is there evidence for this claim? I’ve only ever seen evidence to the contrary.
The very first thing in that link:
For more than a year now, Tesla has been releasing Autopilot safety numbers to show that autopilot is safer than a human driver in average driving conditions. [...] Autopilot is primarily used on highways, which have fewer accidents than surface streets because driving conditions are much simpler.
(emphasis added).
The evidence is: if it were safer in conditions other than the easiest possible conditions, then Tesla would be shouting that fact from the rooftops. Instead, they’re advertising only very limited data about how safe they are in the easy case.
More generally, if unusual situations are the main barrier to full self-driving cars, then we’d expect to see lots of automatic safety features for handling easy cases—automatic braking, cruise control, etc. We do see that, and there’s plenty of evidence that they work great for the easy cases they’re designed for. Tesla’s autopilot is an example of that. But that doesn’t give us self-driving cars; it doesn’t let us take the human out of the loop entirely, and taking the human out of the loop is where the large majority of the value is.
Now, if Tesla (or any self-driving car group) published data showing that autopilot is safer than (sober) humans even in the conditions where most accidents occur, then that would be the sort of thing which would let us take the human out of the loop. That’s the kind of safety we need to actually get the majority of the value from self-driving. I do not see evidence of that, and in this case absence of evidence is pretty strong evidence of absence—because there are companies/groups who would want to share that evidence if they had it.
Elon Musk said a while ago that a fair standard for allowing self-driving cars would be for them to be 10x safer.
Publishing a study saying that Tesla Autopilot is 10% safer than regular driving wouldn’t be very valuable, and there’s huge measurement uncertainty when you have to define what “conditions where most accidents occur” means.
I would expect us to get that kind of data only once there’s a crash and the automaker wants to convince a jury that the car shouldn’t be blamed.
What makes driving on surface streets so different from driving on highways that current state-of-the-art ML techniques wouldn’t be able to handle it with slightly more data and compute?
Unlike natural language processing, AI doctors or household robots, driving seems like a very limited non-AGI-complete task to me because a self-driving car never truly interacts with humans or objects beyond avoiding hitting them.
we also need to notice objects which are going to move into the lane; we need trajectory tracking and forecasting. And we need the trajectory-tracker to be robust to the object classification changing (or just being wrong altogether), or sometimes confusing which object is which across timesteps, or reflections or pictures or moving lights, or missing/unreliable lane markers, or things in the world zig-zagging around on strange trajectories, or etc.
I would claim all of the above are also required for driving on the highway.
This is secondhand, but… two years ago I worked with a guy who had been on Tesla’s autopilot team. From the sound of it, they stayed in the lane mainly via some hand-coded image processing which looked for a yellow/white strip surrounded by darker color. For most highway driving, that turned out to be good enough.
I’m not sure how much state-of-the-art ML (i.e. deep learning) is even being used for self-driving. I’m sure it’s used for some subtasks, like object recognition, but my (several-years-out-of-date and secondhand) understanding is that current projects aren’t actually using it end-to-end; it’s just specific subcomponents. Slightly more data/compute doesn’t matter much when key limiting pieces aren’t actually using ML.
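For a sense of what "hand-coded image processing which looked for a yellow/white strip surrounded by darker color" could look like, here is a toy sketch. It is not Tesla's code, and the details are assumptions for illustration only: it scans a single row of grayscale pixel values, and the function name `find_lane_strips` and all thresholds are made up.

```python
def find_lane_strips(row, min_contrast=60, min_width=2, max_width=12):
    """Return (start, end) index pairs of bright runs in one grayscale row
    that have clearly darker pixels on both sides."""
    strips = []
    n = len(row)
    i = 1
    while i < n:
        # Rising edge: pixel i is much brighter than its left neighbor.
        if row[i] - row[i - 1] >= min_contrast:
            start = i
            j = i
            # Extend the run while brightness stays close to the first pixel.
            while j + 1 < n and abs(row[j + 1] - row[start]) < min_contrast:
                j += 1
            width = j - start + 1
            # Falling edge: the pixel right after the run is much darker.
            has_falling_edge = j + 1 < n and row[start] - row[j + 1] >= min_contrast
            if has_falling_edge and min_width <= width <= max_width:
                strips.append((start, j))
            i = j + 1
        else:
            i += 1
    return strips


# Dark asphalt with two bright painted strips (made-up pixel values).
row = [40, 42, 41, 200, 210, 205, 43, 40, 39, 41, 220, 215, 44, 40]
print(find_lane_strips(row))  # [(3, 5), (10, 11)]
```

A heuristic like this works when the paint is fresh and the road is dark, which is roughly the highway case. It has nothing to say about worn markings, glare, or snow.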
From the sound of it, they stayed in the lane mainly via some hand-coded image processing which looked for a yellow/white strip surrounded by darker color.
That is what I heard about other research groups, but it’s a bit surprising coming from Tesla. I’d imagine things have changed dramatically since then, considering that this video, albeit insufficient as any sort of safety validation, still demonstrates they’re way beyond just following lane markings. According to Musk, they’re pushing hard for end-to-end ML solutions. That would make sense given the custom hardware they’ve developed and the data leverage they have with their massive fleet, combined with over-the-air updates.
It’s certainly plausible that things have changed dramatically, although my default guess is that they haven’t—a pile of hacks can go a surprisingly long way, and the only tricky-looking spot I saw in that video was a short section just after 1:30. And Musk saying that they’re “pushing hard for end-to-end ML” is exactly the sort of thing I’d expect to hear if such a project was not actually finding any traction. I’m sure they’re trying to do it, but ML is finicky at the best of times, and I expect we’d hear it shouted from the rooftops if end-to-end self-driving ML was actually starting to work yet.
It would likely depend on whether or not self-driving cars and AI doctors need some form of reinforcement learning to work. If they do, and especially if they need to use online learning, then presumably they will need to at least partially solve issues like safe exploration, distributional shift, avoiding side effects, verification and validation of RL policies, etc. It also seems likely that they would need to solve versions of specification gaming to ensure that the RL agent doesn’t do weird things in edge cases because the reward function wasn’t perfectly specified. Whether or not such partial solutions would scale up to AGI is a different discussion, as I mentioned.