At some point, children believe that their parents make the sun rise in the morning. That worldview is more morally involved than learning some technical facts about physical occupancy.
The concept of sovereign airspace is a real thing. Nations do expect to be asked for permission, and there are regulations about following air traffic control and so on. This works on a different level than private property, but it is an example where the prior concept of "owning the land" gets extended to "owning the air", even though in the private property case it doesn't.
We do not tell an AI that "classically, you ought to stay in the box". We do some concrete talking or coding to it, which we then abbreviate. When the AI extends the concept to new conceptual areas, it falls back on those details. When a scientist's Newtonian sense of location is destroyed, he needs to reformulate what he means by space. I could imagine that the "extension safety" of various ways of telling an AI to stay in a box could vary a lot. If I say, "We need to know what space you can affect, so we ask you to stay within the marked area over there", and there is an ostensive act in there, we are not actually referring to any Newtonian sense of space. Additionally, we give the AI tools to let us know if it is elsewhere. So when it expands the concepts behind the order, it has a basis for understanding the psychological dimension of the marking: "If I understand 'markings' this way, does following that sense of marking still let the programmer know where I am?". Now, there is a certain style of directing that adds a whole lot of these safety valves. But it would still be hard and cumbersome to be sure that every safety valve has been included in the direction.
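As a loose illustration of what such a "safety valve" directive might look like in code (all names here are hypothetical, and this is only a sketch of the idea under my own assumptions, not anyone's actual boxing setup): the directive bundles the ostensive marking with a reporting channel, and a reinterpretation of "marked area" is only accepted if it preserves the property that the programmer still knows where the agent is.

```python
# Hypothetical sketch: a "stay in the marked area" directive with safety valves.
# The order is not phrased in Newtonian coordinates; it is phrased as an
# ostensive marking plus an obligation to keep the programmer informed.

from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentState:
    location_description: str  # whatever the agent's current self-model says


@dataclass
class BoxDirective:
    # Ostensive definition: a predicate the programmer pointed at, not a coordinate box.
    inside_marked_area: Callable[[AgentState], bool]
    # Safety valve: a channel for telling the programmer where the agent actually is.
    report_location: Callable[[AgentState], None]


def reinterpretation_is_safe(
    new_inside: Callable[[AgentState], bool],
    programmer_would_still_know_where_i_am: Callable[[Callable], bool],
) -> bool:
    """The agent's own check before adopting a new reading of 'marked area':
    does following that new sense still keep the programmer informed?"""
    return programmer_would_still_know_where_i_am(new_inside)


def follow_directive(directive: BoxDirective, state: AgentState) -> None:
    # The valve fires regardless of whether the agent thinks it is "inside":
    # the programmer is told where the agent is, under the agent's current concepts.
    directive.report_location(state)
    if not directive.inside_marked_area(state):
        # In this sketch, leaving the marked area is surfaced openly rather than
        # hidden behind a reinterpreted concept of "marking".
        print("WARNING: outside marked area under current interpretation")
```

The point of the sketch is only that the "psychological dimension" (does the programmer still know where I am?) is an explicit check, separate from whatever spatial concept the agent happens to use.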
In human-to-human interaction we do not expect to take each other literally all the time. In the military, the breakdown of the required "roundings" is called "white mutiny": if you start to do things strictly by the book, nothing really gets done. In the AI case it would be really handy if the AI could detect that asking it to sit in the corner is not supposed to hang on the details of Newtonian space. But if we allow it to interpret its commands, it is harder to rule out it interpreting them malevolently. However, this could be used as a safety feature too. If a human says "I am going to kill you", we are inclined to interpret it as a joke or rhetorical emphasis rather than an actual plan of murder, if no additional evidence points in that direction. In the same way, an AI that would "refuse to understand" a command to blow up the world would in most cases be showing the desired behaviour.
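A toy sketch of that "refuse to understand" idea, in the same hypothetical style (none of these names or numbers come from any real system): a command interpreter that demands strong independent evidence before accepting a literal reading whose consequences are catastrophic, much as a human treats "I am going to kill you" as rhetoric by default.

```python
# Hypothetical sketch: interpretation with a presumption against catastrophic literal readings.

from dataclasses import dataclass


@dataclass
class Interpretation:
    action: str
    expected_harm: float  # the agent's own rough estimate, in whatever units it uses
    is_literal: bool


CATASTROPHE_THRESHOLD = 1e6  # illustrative number only


def choose_interpretation(literal: Interpretation,
                          charitable: Interpretation,
                          evidence_literal_was_meant: float) -> Interpretation:
    """Prefer the charitable reading of a command unless there is strong,
    independent evidence that the catastrophic literal reading was intended."""
    if literal.expected_harm >= CATASTROPHE_THRESHOLD and evidence_literal_was_meant < 0.99:
        # "Refuse to understand": treat the command as rhetoric or a mistake,
        # fall back to the harmless reading (or to asking for clarification).
        return charitable
    return literal


# Example: "blow up the world" parsed literally vs. as rhetorical emphasis.
literal = Interpretation(action="detonate everything", expected_harm=1e12, is_literal=True)
charitable = Interpretation(action="ask what was actually meant", expected_harm=0.0, is_literal=False)
print(choose_interpretation(literal, charitable, evidence_literal_was_meant=0.1).action)
```

Of course this just pushes the problem into the harm estimate and the evidence threshold, which is exactly the worry about letting the AI interpret its commands in the first place.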