I somehow agree with both you and OP, and also I don’t buy part of the lever analogy yet. It seems important not only that the levers look similar, but that they sit close to each other, for us to expect users to reliably mess up. Similarly, strong tool AI will offer many, many affordances, and it isn’t clear how “close” I should expect them to be in use-space. From the security mindset, that’s sufficient cause for serious concern, but I’m still trying to shake out the expected value estimate for powerful tool AIs—will they be thermonuclear-weapon-like (as in your post), or will mistakes generally look different?
One way in which the analogy breaks down: in the lever case, we have two levers right next to each other, and each does something we want—it’s just easy to confuse the levers. A better analogy for AI might be: many levers and switches and dials have to be set to get the behavior we want, and mistakes in some of them matter while others don’t, and we don’t know which ones matter when. And sometimes people will figure out that a particular combination extends the flaps, so they’ll say “do this to extend the flaps”, except that when some other switch has the wrong setting and it’s between 4 and 5 pm on Thursday, that combination will still extend the flaps, but it will also retract the landing gear, and nobody noticed that before they wrote down the instructions for how to extend the flaps.
Some features which this analogy better highlights:
Most of the interface-space does things we either don’t care about or actively do not want
Even among things which usually look like they do what we want, most do something we don’t want at least some of the time
The system has many dimensions, so we can’t brute-force check all combinations, and problems may lurk in seemingly unrelated dimensions (a toy sketch of this follows below)
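To make the brute-force point concrete, here’s a toy sketch. Everything in it is hypothetical—the switch counts and the made-up observed_behavior system are illustrations, not any real interface. It shows two things: the configuration space of n binary switches grows as 2^n, and a rule learned from a limited set of tested configurations (“switches 0 and 1 extend the flaps”) can silently depend on a switch nobody varied during testing—the analogue of the Thursday-afternoon landing gear.

```python
from itertools import product

# With n binary switches, the configuration space is 2**n, so exhaustive
# checking becomes infeasible long before n gets large.
for n in (10, 30, 50):
    print(f"{n} switches -> {2**n:,} configurations")

# Hypothetical system: flaps extend when switches 0 and 1 are on, but a
# rarely-touched switch 7 also retracts the landing gear in that case.
def observed_behavior(config):
    flaps = bool(config[0] and config[1])
    gear_retracted = bool(flaps and config[7])
    return flaps, gear_retracted

# "Testing" only configurations where switch 7 happens to be off makes the
# rule "switches 0 and 1 extend the flaps" look perfectly safe...
tested = [cfg for cfg in product([0, 1], repeat=8) if cfg[7] == 0]
assert all(not observed_behavior(cfg)[1] for cfg in tested)

# ...until someone follows the rule with switch 7 on.
print(observed_behavior((1, 1, 0, 0, 0, 0, 0, 1)))  # (True, True)
```

The point isn’t the specific numbers; it’s that the “safe” rule was only ever validated on a slice of the configuration space, and nothing about the rule tells you which untested dimensions it secretly depends on.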