An article about an airplane crash described what looks like an example of overfitting caused by pilot training in the airline industry.
Pilots were trained in roughly the following order:
1. How to recover from a stall in a small airplane by pushing down on the yoke.
2. Later, they were trained in simulators for bigger planes and “the practical test standards … called for the loss of altitude in a stall recovery to be less than 100 feet. … More than a hundred feet loss of altitude and you fail.”
And then an airplane crashed when the pilot flying (the one at the controls) pushed the wrong way on the yoke during a stall, possibly because training #2 had conditioned the pilot’s instincts to prioritize limiting the loss of altitude.
If that was a contributing factor, then the crash is an example of a slightly misaligned intelligence.
It’s not hard to imagine that when training #2 began, everyone assumed the new training wouldn’t cause pilots to forget the basics from #1. Or, to put it in AI terms: even if an AGI appears to do the right thing every time, a bit of additional training can be enough to make it start doing the wrong thing.
I assume that the pilot’s self-perceived terminal values were the same before and after training #2. He probably didn’t go from “I should avoid killing myself” to “Dying in a plane crash is good.” So even a perfect understanding of what an AGI thinks it values might not suffice.
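The analogous failure in machine learning is often called catastrophic forgetting: fine-tune a model on a new objective without revisiting the old data, and the old behavior can quietly degrade. Below is a minimal toy sketch of that effect (my own illustration, not anything from the article; the tasks, model, and numbers are all made up): a tiny linear classifier is trained on “task 1,” then trained further only on a conflicting “task 2,” and its task 1 accuracy collapses even though its training objective never changed.

```python
# Toy sketch of "training #2 overwriting training #1" in a tiny model.
# Everything here is invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Task 1 ("recover the small plane"): label is sign(x0 - x1).
# Task 2 ("hold altitude in the big-plane sim"): label is sign(x0 + x1),
# which disagrees with task 1 on half of the input space.
def make_task(w_true, n=200):
    X = rng.normal(size=(n, 2))
    return X, np.sign(X @ w_true)

X1, y1 = make_task(np.array([1.0, -1.0]))
X2, y2 = make_task(np.array([1.0, 1.0]))

def accuracy(w, X, y):
    return np.mean(np.sign(X @ w) == y)

def train(w, X, y, steps=500, lr=0.05):
    # Plain gradient descent on squared error, using data from one task only.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w = np.zeros(2)
w = train(w, X1, y1)                  # training #1
print("after #1:", accuracy(w, X1, y1), accuracy(w, X2, y2))

w = train(w, X2, y2)                  # training #2 never revisits task 1 data
print("after #2:", accuracy(w, X1, y1), accuracy(w, X2, y2))

# Typical result: near-perfect on task 1 after #1, then roughly chance-level
# on task 1 after #2 -- even though the model's objective (its "values")
# was the same squared-error loss throughout.
```

Nothing about the model’s loss function changed between the two phases, which is the point: the “values” stayed fixed while the learned behavior drifted.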