Further to RobbBB’s highly useful reply (you should read it), I would like to point out that the phenomenon of optimization processes having unexpected or unusual behaviour when placed in new situations or given new capabilities is quite general. It applies to pretty much any strong optimization process working toward any goal.
For example, you program an AI/robot with no resources to fetch a banana as quickly as possible in some sandbox environment, where the banana is guarded by a few people. The AI will describe its strategy as talking to the guards and convincing them to give it the banana. Or perhaps buying the banana off them if you give it some money. But there are many ways to change the situation:
1. Make the AI much smarter—it will realise that by taking a certain sequence of steps it can get from here to the banana without ever moving within someone’s field of view. No need to pay for it. (1.5. Make the AI even smarter, and it will realise that by painting a Basilisk on the robot’s front it can disable all the guards effortlessly and take a straight line to the banana.)
2. Give the AI a teleportation device—it teleports straight to the banana.
3. Give the AI a plasma cannon—it shoots all the guards dead, runs up, and takes the banana.
The point is: it’s not at all weird for AI behaviour to be “inconsistent” in this way. It isn’t a sign of anything being broken; in fact, the goal is still being achieved. The AI is just able to think of more effective ways to do it than you can. That is, after all, the point of superintelligence. An AI that does this is not broken or stupid, and it is certainly capable of being dangerous.
By the way, you can try to do something like this:
[ And by the way: one important feature that is OBVIOUSLY going to be in the goalX code is this: that the outcome of any actions that the goalX code prescribes, should always be checked to see if they are as consistent as possible with the verbal description of the class of results X, and if any inconsistency occurs the goalX code should be deemed defective, and be shut down for adjustment.]
To start with, I have no idea how you would program this or what it would mean formally. But even if you could, it takes human judgement to identify the “inconsistencies” that would matter to humans. Without embedding human values in that check, either the AI shuts down every time it tries to do anything new, or you use a stricter criterion for what counts as an “inconsistency” and miss cases where the AI does something you actually don’t want.
Or, you know, the AI will deduce that the full “verbal description of the class of results X” (which is an infinite list) is of course defined by its goal (i.e. by the goalX code), and will therefore reason that nothing the goalX code does can be inconsistent with it.
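To make the formalisation problem concrete, here is a minimal Python sketch of that safeguard. Every name in it (`run_goal_x`, `consistent_with_description`, the predicate list) is hypothetical, and the “verbal description of the class of results X” is crudely approximated as a finite list of predicates—which is exactly where the proposal breaks down:

```python
# Hypothetical sketch of the proposed "check outcomes against the verbal
# description of X" safeguard. All names are illustrative, not a real API.

def consistent_with_description(outcome, description):
    # "description" stands in for the verbal description of the class of
    # results X, approximated here as a finite list of predicates. The real
    # set of results humans would accept is open-ended, which is the problem.
    return all(predicate(outcome) for predicate in description)

def run_goal_x(plan_outcome, description):
    # The proposal: if any inconsistency occurs, deem the goalX code
    # defective and shut it down for adjustment.
    if not consistent_with_description(plan_outcome, description):
        raise RuntimeError("goalX code deemed defective: shut down for adjustment")
    return plan_outcome

# A human-written approximation of the verbal description of X.
description_of_x = [
    lambda o: o["banana_obtained"],
    lambda o: not o["guards_harmed"],  # caught only because a human listed it
]

polite_plan = {"banana_obtained": True, "guards_harmed": False}
plasma_plan = {"banana_obtained": True, "guards_harmed": True}

run_goal_x(polite_plan, description_of_x)  # passes the check
try:
    run_goal_x(plasma_plan, description_of_x)
except RuntimeError:
    pass  # flagged, but only for the one failure mode someone anticipated
```

Note that the check only rejects the plasma-cannon plan because a human thought to write down “don’t harm the guards”; any failure mode nobody anticipated passes silently, and tightening the predicates instead just shuts the system down on every novel but harmless plan.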