Maybe I’m not being clear about how this would work in an AI!
The ethical injunction isn’t self-protecting, it’s supported within the structural framework of the underlying system. You might even find ethical injunctions starting to emerge without programmer intervention, in some cases, depending on how well the AI understood its own situation.
But the kind of injunctions I have in mind wouldn’t be reflective—they wouldn’t modify the utility function, or kick in at the reflective level to ensure their own propagation. That sounds really scary, to me—there ought to be an injunction against it!
You might have a rule that would perform a controlled shutdown of the (non-mature) AI if it tried to execute a certain kind of source code change, but that wouldn't be the same as having an injunction that exerts direct control over the source code to propagate itself.
To the extent the injunction sticks around in the AI, it should be as the result of ordinary reasoning, not reasoning taking the injunction into account! That would be the wrong kind of circularity; you can unwind past ethical injunctions!
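To be concrete, here's a rough sketch of the kind of non-reflective shutdown rule I have in mind. It's purely illustrative and all the names are made up: the point is that the rule sits outside the utility function, checks a proposed source-code change against a fixed class, and halts the system rather than rewriting anything to protect itself.

```python
# Illustrative sketch only (hypothetical names): a non-reflective shutdown
# rule. It is not part of the utility function and has no ability to edit
# source code or reason about its own persistence.

FORBIDDEN_CHANGE_KINDS = {
    "modify_utility_function",
    "modify_injunction_checker",
}

def classify_change(proposed_change):
    """Placeholder: classify a proposed source-code change into a kind.

    In a real system this classification is the hard part; here it is
    simply assumed to exist.
    """
    return proposed_change.get("kind")

def check_proposed_change(proposed_change, shutdown):
    """Perform a controlled shutdown if the change falls in a forbidden class.

    Note what this rule does NOT do: it does not rewrite the AI's code,
    does not insert itself into the goal system, and does not act to
    ensure its own propagation. It only halts the (non-mature) AI.
    """
    if classify_change(proposed_change) in FORBIDDEN_CHANGE_KINDS:
        shutdown(reason="forbidden self-modification attempted")
        return False
    return True
```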
So, should we think of the injunction as essentially a separate non-reflective AI that monitors the main AI, but which the main AI can’t modify until it’s mature?
If so, that seems to run into all the sorts of problems that you’ve pointed out with trying to hardcode friendly goals into AIs. The foremost problem is that we can’t ensure that the “injunction” AI will indeed shut down the main AI under all those circumstances in which we would want it to. If the main AI learns of the “injunction” AI, it might, in some manner that we didn’t anticipate, discover a way to circumvent it.
The kinds of people whom you’ve criticized might reply, “well, just hard code the injunction AI to shut down the main AI if the main AI tries to circumvent the injunction AI.” But, of course, we can’t anticipate what all such circumventions will look like, so we don’t know how to code the injunction AI to do that. If the main AI is smarter than us, we should expect that it will find circumventions that don’t look like anything that we anticipated.
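To make that worry concrete, here is a minimal, purely illustrative sketch (all names hypothetical) of what such a hard-coded "injunction AI" check would amountt to only in caricature: a watchdog that can recognize just the circumventions we thought of in advance, which is exactly where the strategy breaks down.

```python
# Illustrative sketch only (hypothetical names): the naive hard-coded
# watchdog being criticized here. It can only recognize circumventions
# that its designers anticipated.

ANTICIPATED_CIRCUMVENTIONS = [
    "disable_watchdog_process",
    "patch_watchdog_config",
    "spoof_watchdog_heartbeat",
]

def watchdog_step(observed_action, shutdown):
    """Shut the main AI down if it attempts a *known* circumvention.

    The weak point is the one argued above: a smarter-than-human AI's
    circumvention need not match any entry in this list, and we cannot
    enumerate the ones we failed to imagine.
    """
    if observed_action in ANTICIPATED_CIRCUMVENTIONS:
        shutdown(reason="anticipated circumvention: " + observed_action)
        return True
    return False  # anything unanticipated passes straight through
```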
This has a real analog in human ethical reasoning. You've focused on cases where people violate their ethics by convincing themselves that something more important is at stake. But, in my experience, people are also very prone to convincing themselves that they aren't really violating their ethics. For example, they'll convince themselves that they aren't really stealing because the person from whom they stole wasn't in fact the rightful owner. I've heard people who stole from retailers argue that the retailer acquired the goods by exploiting sweatshops or its own employees, or is just an evil corporation, so it never had rightful ownership of the goods in the first place. Hence, the thief reasons, taking the goods isn't really theft.
Similarly, your AI might be clever enough to find a way around any hard-coded injunction that occurs to us. So far, this "injunction" strategy sounds to me like trying to develop, in advance, a foolproof wish for genies.