An AI should treat a Pascal’s Mugger as an agent trying to arbitrarily gain access to its root systems without proper authority, or phrased more simply, an attack.
To explain why, consider this statement in the original article:
But a silicon chip does not look over the code fed to it, assess it for reasonableness, and correct it if not. An AI is not given its code like a human servant given instructions. An AI is its code. What if a philosopher tries Pascal’s Mugging on the AI for a joke, and the tiny probabilities of 3^^^^3 lives being at stake, override everything else in the AI’s calculations? What is the mere Earth at stake, compared to a tiny probability of 3^^^^3 lives?
If something is allowed to override EVERYTHING on a computer, it seems functionally identical to saying that it has root access.
Since Pascal’s Mugging is commonly known and discussed on the internet, having it be equivalent to a root password would be a substantial security hole, like setting your root password to “password”.
An AI would presumably have to have some procedure in case someone was attempting unauthorized access. That procedure would need to trigger FIRST, before considering the argument on the merits. Once that procedure is triggered, the argument is no longer being considered on the merits; it is being considered as an attack. Saying “Well, but what if there REALLY ARE 3^^^^3 lives at stake?” seems to be equivalent to saying “Well, but what if the prince of Nigeria REALLY IS trying to give me 1 million dollars according to the email in my spam box?”
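The ordering matters: the screening step runs before any expected-value arithmetic, so the astronomical stakes never enter the calculation at all. A minimal sketch of that ordering in Python (the threshold, the claim fields, and the mugging heuristic are all hypothetical illustrations for this post, not a real defense, and 3**64 stands in for 3^^^^3, which is far too large to compute):

```python
# Hypothetical sketch: screen a claim for attack signatures BEFORE
# letting its claimed stakes enter the expected-utility calculation.

MAX_PLAUSIBLE_STAKES = 1e15  # illustrative cap, not a principled constant


def looks_like_mugging(claim):
    """Flag claims whose stated stakes are astronomically out of
    proportion to the evidence offered (none, in the mugger's case)."""
    return claim["stated_stakes"] > MAX_PLAUSIBLE_STAKES and not claim["evidence"]


def evaluate(claim):
    # The security procedure triggers FIRST, before the merits.
    if looks_like_mugging(claim):
        return "rejected: treated as an attack, not an argument"
    # Only now do the merits enter the expected-value calculation.
    expected_value = claim["probability"] * claim["stated_stakes"]
    return f"expected value: {expected_value}"


# The mugger's claim never reaches the expected-value step.
print(evaluate({"stated_stakes": 3**64, "probability": 1e-10, "evidence": None}))
# An ordinary, evidenced claim is evaluated on the merits as usual.
print(evaluate({"stated_stakes": 100, "probability": 0.5, "evidence": "receipt"}))
```

The point the sketch makes is structural, not quantitative: once the check fires, the “what if it’s REALLY true?” branch is simply never executed, the same way a mail filter never weighs the merits of the Nigerian prince’s offer.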