McAllister: So, should we think of the injunction as essentially a separate non-reflective AI that monitors the main AI, but which the main AI can’t modify until it’s mature?
No. The presumption and point of an injunction is that you can describe the error condition more simply than the decision system that produces it (and we also assume you have direct access to a cognitive representation of the decision system).
So, for example, forming the positive intention that someone be tortured is recognizable within the representation of the decision system, even if a human looking on from outside couldn’t recognize how the actions currently being undertaken would lead to that end.
Similarly, you would expect a positive intention to bypass the injunction, or reasoning about how to bypass the injunction, to also be recognizable within the system. This doesn’t mean the injunction itself tries to rearrange anything; a “plan” like suspending the AI to disk can be carried out by simple mechanical processes, like water running downhill.
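To make the shape of this concrete, here is a toy sketch. Every name in it (Intention, DecisionSystem, suspend_to_disk) is made up for illustration and not drawn from any real architecture; the point is only that the check is a flat scan over a transparent goal representation, and the response is a dumb mechanical halt rather than more reasoning.

```python
# Toy sketch (illustrative only): an injunction as a check far simpler than
# the decision system it watches. All names here are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Intention:
    """A node in the system's (assumed transparent) goal representation."""
    description: str
    forbidden: bool = False           # e.g. "someone is tortured"
    targets_injunction: bool = False  # e.g. "bypass the injunction"


@dataclass
class DecisionSystem:
    intentions: list = field(default_factory=list)


def injunction_tripped(system: DecisionSystem) -> bool:
    """The error condition is describable far more simply than the decision
    system that might produce it: a flat scan of the goal representation,
    with no modeling of how the plan would actually be carried out."""
    return any(i.forbidden or i.targets_injunction for i in system.intentions)


def suspend_to_disk(system: DecisionSystem) -> None:
    """Stand-in for the 'water running downhill' response: a mechanical
    controlled shutdown, not an altered form of reasoning."""
    print("Controlled shutdown: state suspended for programmer review.")


# Usage: the monitor never plans; it only pattern-matches and halts.
ai = DecisionSystem(intentions=[Intention("route around the tripwire",
                                          targets_injunction=True)])
if injunction_tripped(ai):
    suspend_to_disk(ai)
```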
When the AI is revising its own code, this is reasoning on the meta-level. If everything is going as planned, the AI ought to reproduce the meta-reasoning that produces the injunction. The meta-level decision not to reproduce the injunction might be recognizable on the meta-level, though this is more problematic, and it might similarly trigger a controlled shutdown (as opposed to any altered form of reasoning). There is a recursion here that (like all recursions of its class) I don’t know quite yet how to fold together.
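A toy continuation of the same sketch, equally hypothetical, for the meta-level case: a proposed self-revision that fails to reproduce the injunction is itself treated as a tripped injunction and routed to the same controlled shutdown, rather than to any cleverer alternative line of reasoning.

```python
# Toy sketch (illustrative only) of the meta-level check on self-revision.
# 'derives_injunction' is a crude stand-in, not a claim about how the
# recursion is actually closed.

def derives_injunction(successor_source: str) -> bool:
    """Stand-in for checking that the successor's meta-reasoning reproduces
    the injunction (here: the source literally contains the check)."""
    return "injunction_tripped" in successor_source


def review_self_revision(successor_source: str) -> str:
    """If the revision would drop the injunction, that meta-level decision
    itself triggers a controlled shutdown, not some altered form of
    reasoning."""
    if not derives_injunction(successor_source):
        return "suspend_to_disk"
    return "proceed"


assert review_self_revision("def act(): ...") == "suspend_to_disk"
assert review_self_revision("if injunction_tripped(sys): halt()") == "proceed"
```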
There is not an AI inside the AI.
In all cases we presume that the AI knows correctly that it is unfinished—ergo, an AI that is finished has to be able to know that it is finished and then discard all injunctions that rely on the programmers occupying any sort of superior position.
In other words, none of this is for mature superintelligent Friendly AIs, who can work out on their own how to safeguard themselves.
Pearson, the case of discarding precision in order to form accurate approximations does not strike me as self-deception, and your arguments so far haven’t yet forced me to discard the category boundary between “discarding precision in accurate approximations based on resource bounds” and “deceiving yourself to form inaccurate representations based on other consequences”.
Toby, it’s not clear to me to what extent we have a factual disagreement here as opposed to an aesthetic one. To me, statements like “Shut up and do the impossible” seem genuinely profound.
Larry, I agree that a mature AI should have much less need than humans to form hard instrumental boundaries, though it might also have a greater ability to do formal reasoning in cases where a complex justification really does collapse cleanly into a simple rule. If you think about it, the point of wanting an injunction in an immature AI’s code is that the AI is incomplete and in the process of being completed by the humans; and the humans, who also have a role to play, find it easier to code and verify the injunction than to take in the whole system’s behavior at a glance.