The point that an AI can be better than a human on average while its worst case is much worse than a human's seems like a critical insight! I hadn't thought about it this way, but it matches what Eliezer keeps saying about AI dangers: AI is much more dangerous “out of distribution”. A human can fairly reliably recognize a special case where “222+222=555” even though 1+1=2, but an AI trained on a distribution will likely insist on a bad out-of-distribution action, and occasionally cause a spectacular but preventable fatality when used as an FSD autopilot, without the human-style justification of “the driver was drunk/distracted/enraged”.
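To make that concrete, here is a toy sketch (my own illustration, not from the comment above): a flexible model that fits its training range almost perfectly can be confidently and wildly wrong just outside it, with nothing in the model itself flagging the failure.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)            # training inputs, all in [0, 1]
y = np.sin(2 * np.pi * x)              # target function

coeffs = np.polyfit(x, y, deg=9)       # flexible model, fits [0, 1] well
f = np.poly1d(coeffs)

print(f(0.5), np.sin(2 * np.pi * 0.5))     # in distribution: close match
print(f(10.0), np.sin(2 * np.pi * 10.0))   # out of distribution: wildly wrong, no warning
```

The model never signals that x = 10 is outside anything it has seen; it just extrapolates with full confidence, which is the “insist on a bad action” failure mode.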
One nitpick on the estimates: fatalities may not be the best metric for reliability. Humans would probably make far more errors and near-misses than an AI (“oh sh*t, I drifted into the wrong lane!”), but have fewer spectacular crashes (or brains sliced through, or civilians accidentally shot by fully autonomous drones).
I read somewhere that pilots make something like one error every six minutes, but they have the time and the ability to detect those errors and correct them.
A quick OODA loop is very effective at detecting and eliminating errors, and it is core to human safety. But it requires an environment that provides quick feedback on errors before a fatal crash, e.g. “this car is too close, I will slow down”.
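As a toy illustration of that last point (the dynamics and all the numbers here are invented by me, not taken from the comment): the exact same corrective policy saves the day when feedback arrives every step, and fails when it arrives too rarely.

```python
# Toy car-following sim: a stationary obstacle ahead, and a driver who
# brakes whenever feedback says the gap is too small. Numbers are made up.
def simulate(feedback_every: int, steps: int = 100, dt: float = 0.5) -> str:
    gap = 30.0    # metres to the obstacle
    speed = 20.0  # m/s
    for t in range(steps):
        # The OODA loop: observe the gap, decide, act -- but only on the
        # steps when feedback is actually available.
        if t % feedback_every == 0 and gap < 25.0:
            speed = max(speed - 5.0, 0.0)  # "too close, I will slow down"
        gap -= speed * dt
        if gap <= 0:
            return f"crash at step {t}"
    return f"safe, final gap {gap:.1f} m"

print(simulate(feedback_every=1))   # quick loop: error corrected in time -> safe
print(simulate(feedback_every=40))  # slow loop: same policy, too late -> crash
```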