New article from Oren Etzioni
(Cross-posted from EA Forum.)
This just appeared in this week’s MIT Technology Review: Oren Etzioni, “How to know if AI is about to destroy civilization.” Etzioni is a noted skeptic of AI risk. Here are some things I jotted down:
Etzioni’s key points / arguments:
Warning signs that AGI is coming soon (like canaries in a coal mine, where if they start dying we should get worried)
Automatic formulation of learning problems
Fully self-driving cars
AI doctors
Limited versions of the Turing test (like Winograd Schemas)
If we get to the Turing test itself then it’ll be too late
[Note: I think if we get to practically deployed fully self-driving cars and AI doctors, then we will have already had to solve more limited versions of AI safety. It’s a separate debate whether those solutions would scale up to AGI safety though. We might also get the capabilities without actually being able to deploy them due to safety concerns.]
We are decades away from the versatile abilities of a 5 year old
Preparing anyway, even if the probability is very low, because the consequences would be extreme is Pascal’s Wager
[Note: This is a decision theory question, and I don’t think that’s his area of expertise. I’ve researched Pascal’s Wager extensively, and it’s not at all clear to me where to draw the line between low-probability, high-consequence scenarios that we should factor into our decisions, vs. very-low-probability, very-high-consequence scenarios that we should not. I’m not sure there is any principled way of drawing a line between those, which might be a problem if it turns out that AI risk is a borderline case.]
If and when a canary “collapses” we will have ample time to design off switches and identify red lines we don’t want AI to cross
“AI eschatology without empirical canaries is a distraction from addressing existing issues like how to regulate AI’s impact on employment or ensure that its use in criminal sentencing or credit scoring doesn’t discriminate against certain groups.”
Agrees with Andrew Ng that it’s too far off to worry about now
But he seems to agree with the following:
If we don’t end up doing anything about it then yes, superintelligence would be incredibly dangerous
If we get to human level AI then superintelligence will be very soon afterwards so it’ll be too late at that point
If it were a lot sooner (as other experts expect) then it sounds like he would agree with the alarmists
Even if it were more than a tiny probability, then again it sounds like he’d agree, because he wouldn’t consider it Pascal’s Wager
If there’s not ample time between “canaries collapsing” and AGI (as I think other experts expect) then we should be worried a lot sooner
If it wouldn’t distract from other issues like regulating AI’s impact on employment, it sounds like he might agree that it’s reasonable to put some effort into it (although this point is a little less clear)
See also Eliezer Yudkowsky, “There’s no fire alarm for Artificial General Intelligence”
Why? Fully self-driving cars have clear goals for how they should behave. The concern about a self-driving car failing to recognize an object because it’s snowing seems to me irrelevant to the topics we usually discuss as AI safety problems, like value alignment.
Value alignment is a more important subject with AI doctors, but that’s still quite different from self-modifying agents.
If you can write down all the goals of a self-driving car in Python, then I expect there’s quite a few companies which would very much like to hire you.
It’s not failing to recognize an object because it snows that’s the problem; it’s deciding what to do when it’s snowing and there’s an unrecognized object. There will always be confusing things all over the place. Even if we had perfect information about the environment, there will still be things in the world which just aren’t categorized by the programmed/learned ontology—there are lots of unusual things in the world. If the car always responds to anything novel by braking, then it’s going to be a slow and frustrating ride very often.
The things-we-want-a-car-to-do are complicated—much like the things-we-want in general. There’s a very wide tail of edge cases, and it’s the edge cases that make the problem hard.
The Tesla and Uber accidents that resulted in deaths were both about not recognizing the object correctly.
Really? An accident where the system noticed something unusual, and then just froze up and waited for a second, is attributed to “not recognizing the object correctly” rather than “not deciding what to do about an unrecognized object correctly”? I mean, sure, recognizing the object would have been a sufficient condition to avoid the accident… but that’s not the real problem here.
There will always be unrecognized objects. A self-driving car which cannot correctly handle unrecognized objects is not safe, and the Uber accident is a great example of that.
I don’t see people complaining about their Teslas braking too often and being too slow/frustrating.
Even if the object hadn’t been a person but anything else that moves, the Uber still shouldn’t have crashed into it.
You just need to recognize that there’s an object with mass that happens to move into the lane. I don’t see how that’s a task that needs advanced safety concepts.
I agree that it shouldn’t need advanced safety concepts (i.e. the sort of things on the alignment forum). The things-we-want-a-car-to-do are complicated, but not as complicated as the things-we-want in general. Self-driving cars are not an alignment-complete problem.
But it’s still the case that “don’t crash into things” is a more complicated problem than it seems on the surface. “recognize that there’s an object with mass that happens to move into the lane” isn’t enough; we also need to notice objects which are going to move into the lane; we need trajectory tracking and forecasting. And we need the trajectory-tracker to be robust to the object classification changing (or just being wrong altogether), or sometimes confusing which object is which across timesteps, or reflections or pictures or moving lights, or missing/unreliable lane markers, or things in the world zig-zagging around on strange trajectories, or etc. It’s a task which requires a lot of generalizability and handling of strange things.
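To make the trajectory-forecasting point concrete, here is a minimal sketch (not anything from the comment thread; the function names and lane geometry are made up for illustration) of a constant-velocity predictor that flags an object whose extrapolated path crosses the lane, independent of what the classifier thinks the object is:

```python
import numpy as np

def predict_positions(track, horizon=20):
    """Extrapolate an object's track with a constant-velocity model.

    `track` is an (N, 2) array of observed (x, y) positions at fixed
    time steps; returns the next `horizon` predicted positions.
    """
    track = np.asarray(track, dtype=float)
    velocity = track[-1] - track[-2]                # per-step displacement
    steps = np.arange(1, horizon + 1).reshape(-1, 1)
    return track[-1] + steps * velocity

def will_enter_lane(track, lane_x_min, lane_x_max):
    """Flag the object if any predicted position falls inside the lane,
    regardless of how (or whether) the object was classified."""
    predicted = predict_positions(track)
    return bool(np.any((predicted[:, 0] >= lane_x_min) &
                       (predicted[:, 0] <= lane_x_max)))

# An object drifting toward the lane from the left:
track = [(-5.0, 30.0), (-4.0, 29.0), (-3.0, 28.0)]
print(will_enter_lane(track, lane_x_min=-1.5, lane_x_max=1.5))  # True
```

Even this toy version shows why robustness is the hard part: it silently breaks if the tracker confuses which object is which across timesteps, since the "velocity" it computes would then be a difference between two different objects.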
That’s the sense in which we need to solve “more limited” versions of AI safety in order to build self-driving cars. We need to be able to engineer reliable AI systems—systems which don’t rely on the real world never being weird in order to work.
But so what? People are not safe; they have slower reaction time than machines, especially when intoxicated. For every example of a self-driving car causing an accident due to object recognition failure, I can point to a person causing an accident due to reaction time failure or attention failure. Why give preference to human failure modes?
You can always come up with arbitrarily contrived edge cases where a narrow AI requires robust value alignment like an AGI (e.g. this ridiculous trolley problem) to behave correctly, and thereby reduce any real-world narrow-AI application to an AGI problem. Thing is, one day China is going to say “Fuck it, we need to get ahead on this AI issue” and just let existing self-driving cars loose onto their streets; the rest gets sorted out by the insurance market and incremental tech improvements. That’s my prediction of how we’ll transition into self-driving.
To clarify, what I intend to claim is that self-driving cars will not be able to achieve safety comparable to (sober) humans without correctly handling unrecognized objects and other unusual situations. Weird stuff occurs often enough in the real world that handling it will be necessary even for a human-like level of safety.
Is there evidence for this claim? I’ve only ever seen evidence to the contrary
The very first thing in that link:
(emphasis added).
The evidence is: if it were safer in conditions other than the easiest possible conditions, then Tesla would be shouting that fact from the rooftops. Instead, they’re advertising only very limited data about how safe they are in the easy case.
More generally, if unusual situations are the main barrier to full self-driving cars, then we’d expect to see lots of automatic safety features for handling easy cases—automatic braking, cruise control, etc. We do see that, and there’s plenty of evidence that they work great for the easy cases they’re designed for. Tesla’s autopilot is an example of that. But that doesn’t give us self-driving cars; it doesn’t let us take the human out of the loop entirely, and taking the human out of the loop is where the large majority of the value is.
Now, if Tesla (or any self-driving car group) published data showing that autopilot is safer than (sober) humans even in the conditions where most accidents occur, then that would be the sort of thing which would let us take the human out of the loop. That’s the kind of safety we need to actually get the majority of the value from self-driving. I do not see evidence of that, and in this case absence of evidence is pretty strong evidence of absence—because there are companies/groups who would want to share that evidence if they had it.
Elon Musk said a while ago that a fair standard for allowing self-driving cars would be for them to be 10x safer.
Publishing a study that says Tesla’s autopilot is 10% safer than regular driving wouldn’t be very valuable, and there’s huge measurement uncertainty when you have to define what “conditions where most accidents occur” means.
I would expect us to get that kind of data only once there’s a crash and the automaker wants to convince a jury that the car shouldn’t be blamed.
What makes driving on surface streets so much different than driving on highways such that current state of the art ML techniques wouldn’t be able to handle it with slightly more data and compute?
Unlike natural language processing, AI doctors or household robots, driving seems like a very limited non-AGI-complete task to me because a self-driving car never truly interacts with humans or objects beyond avoiding hitting them.
I would claim all of the above are also required for driving on the highway.
This is secondhand, but… two years ago I worked with a guy who had been on Tesla’s autopilot team. From the sound of it, they stayed in the lane mainly via some hand-coded image processing which looked for a yellow/white strip surrounded by darker color. For most highway driving, that turned out to be good enough.
I’m not sure how much state-of-the-art ML techniques (i.e. deep learning) are even being used for self-driving. I’m sure they’re used for some subtasks, like object recognition, but my (several-years-out-of-date and secondhand) understanding is that current projects aren’t actually using it end-to-end; it’s just specific subcomponents. Slightly more data/compute don’t matter much when key limiting pieces aren’t actually using ML.
That’s what I heard about other research groups, but it’s a bit surprising coming from Tesla. I’d imagine things have changed dramatically since then, considering that this video, albeit insufficient as any sort of safety validation, still demonstrates they’re way beyond just following lane markings. According to Musk they’re pushing hard for end-to-end ML solutions. That would make sense given the custom hardware they’ve developed and the data leverage they have with their massive fleet, combined with over-the-air updates.
It’s certainly plausible that things have changed dramatically, although my default guess is that they haven’t—a pile of hacks can go a surprisingly long way, and the only tricky-looking spot I saw in that video was a short section just after 1:30. And Musk saying that they’re “pushing hard for end-to-end ML” is exactly the sort of thing I’d expect to hear if such a project was not actually finding any traction. I’m sure they’re trying to do it, but ML is finicky at the best of times, and I expect we’d hear it shouted from the rooftops if end-to-end self-driving ML was actually starting to work yet.
It would likely depend on whether or not self-driving cars and AI doctors need some form of reinforcement learning to work. If they do, and especially if they need to use online learning, then presumably they will need to at least partially solve issues like safe exploration, distributional shift, avoiding side effects, verification and validation of RL policies, etc. It also seems likely that they would need to solve versions of specification gaming to ensure that the RL agent doesn’t do weird things in edge cases because the reward function wasn’t perfectly specified. Whether or not such partial solutions would scale up to AGI is a different discussion, as I mentioned.
As I said offline to Aryeh, in my mind, this is another example of people agreeing on most of the object level questions. For example, Etzioni’s AI timelines overlap with most of the “alarmists,” but (I assume) he’s predicting the mean, not the worst case or 95% confidence interval for AI arrival. And yes, he disagrees with Eliezer on timelines, but so do most others in the alarmist camp—and he’s not far away from what surveys suggest is the consensus view.
He disagrees about planning the path forward, mostly due to value differences. For example, he doesn’t buy the argument that most of the Effective Altruism / Lesswrong community has suggested that existential risk is a higher priority than almost anything near-term. He also clearly worries much more about over-regulation cutting off AI benefits.
ASI is probably coming sooner or later. Someone has to prepare at some point, the question is when.
I consider AI development to be a field that I have little definite info about. It’s hard to assign less than 1% probability to statements about ASI (excepting the highly conjunctive ones). I don’t consider things like dinosaur-killing asteroids, with 1-in-100-million probabilities, to be Pascal’s muggings.
We have a tricky task, and we don’t know how long it will take. Having hit one of these warning signs doesn’t help us do the task much. A student is given an assignment in August; the due date is March next year. They decide to put it off until it snows. Snowfall is an indicator that the due date is coming soon, but not a good one. Either way, it doesn’t help you do the assignment.
What is a “fully self-driving” car? We’ve had algorithms that kind of usually work for years, and a substantial part of modern progress in the field looks like gathering more data and developing driving-specific tricks. Suppose you needed 100 million hours of driving data to train current AI systems. A company pays drivers to put a little recording box in their cars. It will take 5 years to gather enough data, and after that we will have self-driving cars. What are you going to do in 5 years’ time that you can’t do now? In reality, we aren’t sure whether you need 50, 100, or 500 million hours of driving data with current algorithms, and we aren’t sure how many people will want the boxes installed. (These boxes are usually built into satnavs or lane-control systems in modern cars.)
What percentage do you want, and what will you do when GPT-5 hits it?
A “result” gotten by focusing on the things that 5-year-olds are good at.
Sometimes you have a problem, like looking at an image of some everyday scene and saying what’s happening in it, that 5-year-olds are (or at least were, a few years ago) much better at. But take looking at a load of stock data and using linear regression to find correlations between prices: nothing like that existed in the environment of evolutionary adaptedness, and human brains aren’t built to do it.
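For the stock-data side of the contrast, a minimal sketch of the kind of task machines are built for and 5-year-olds are not. The data here is synthetic (two made-up price series, one constructed to track the other), just to show the regression mechanics:

```python
import numpy as np

# Hypothetical daily closing prices for two correlated assets.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0, 1, 500)) + 100   # asset A: a random walk
y = 0.8 * x + rng.normal(0, 2, 500) + 20     # asset B roughly tracks A

# Ordinary least squares: fit y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)
correlation = np.corrcoef(x, y)[0, 1]

# slope should come out near the 0.8 used to generate the data,
# and the correlation should be high.
print(f"slope ~= {slope:.2f}, correlation ~= {correlation:.2f}")
```

A machine does this instantly and a human can’t do it at all by eye, which is the point: “versatile abilities of a 5-year-old” and “economically useful abilities” are nearly disjoint benchmarks.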
Even if that was true, how would you know that? Technological progress is hard to predict. Designing off switches is utterly trivial if the system isn’t trying to avoid the off switch being pressed, and actually quite hard if the AI is smart enough to know about the off switch and remove it.
We passed ‘limited variations of the turing test’ some time ago: https://www.penny-arcade.com/comic/2002/10/04
‘Convince a human that he is interacting with a human’ is a low bar. Furthermore, the fully self driving cars are available, just not at an acceptable level of reliability. If we set the bar for reliability as ‘no worse than a texting teenager with a basic license’, it’s probably easily attainable today.
How about we apply performance metrics to robot drivers and doctors that would be impossible for a human to achieve, then move the goalposts every time it looks like they might be hit? This way, we can protect the status quo from disruption while pretending we’re “just being cautious about existential risk.”