I agree that the focus should be on preventing the existence of a Sovereign AI that seeks to harm people (as opposed to trying to deal with such an AI after it has already been built). The main reason for trying to find necessary features is actually that doing so might stop a dangerous AI project from being pursued in the first place. In particular, it might convince the design team to abandon an AI project that clearly lacks a feature that has been found to be necessary: an AI project that would (if successfully implemented) result in an AI Sovereign that seeks to harm people. For example, a Sovereign AI that wants to respect a Membrane, but where the Membrane formalism does not actually prevent the AI from wanting to hurt individuals, because the formalism lacks a necessary feature.
One reason we might end up with a Sovereign AI that seeks to harm people is that someone makes two separate errors. Let's say that Bob gains control over a tool-AI and uses it to shut down unauthorised AI projects (Bob might for example be a single individual, a design team, a government, a coalition of governments, the UN, a democratic world government, or something else along those lines). Bob gains the ability to launch a Sovereign AI. And Bob settles on a specific Sovereign AI design: Bob's Sovereign AI (BSAI).
Bob knows that BSAI might contain a hidden flaw. And Bob is not being completely reckless about launching BSAI. So Bob designs a Membrane whose function is to protect individuals (in case BSAI does have a hidden flaw). And Bob figures out how to make sure that BSAI will want to avoid piercing this Membrane (in other words: Bob makes sure that the Membrane will be internal to BSAI).
Consider the case where BSAI and the Membrane formalism in question each have a hidden flaw. If both BSAI and the Membrane are successfully implemented, then the result would be a Sovereign AI that seeks to harm people (the resulting AI would want both (i) to harm people, and (ii) to respect the Membrane of every individual). One way to reduce the probability that such a project would go ahead is to describe necessary features.
For example: if it is clear that the Membrane that Bob is planning to use does not have the necessary Extended Membrane feature described in the post, then Bob should be able to see that this Membrane will not offer reliable protection from BSAI (protection which Bob knows might be needed, because Bob knows that BSAI might be flawed).
For a given AI project, it is not certain that there exists a realistically findable necessary feature that can be used to illustrate the dangers of the project in question. And even if such a feature is found, it is not certain that Bob will listen. But looking for necessary features is still a tractable way of reducing the probability of a Sovereign AI that seeks to harm people.
A project to find necessary features is not really a quest for a solution to AI. It is more informative to see such a project as analogous to a quest to design a bulletproof vest for Bob, who will be going into a gunfight (and who might decide to put on the vest). Even if very successful, the bulletproof vest project will not offer full protection (Bob might get shot in the head). A vest is also not a solution. Whether Bob is a medic trying to evacuate wounded people from the gunfight, or Bob is a soldier trying to win the gunfight, the vest cannot be used to achieve Bob’s objective. Vests are not solutions. Vests are still very popular amongst people who know that they will be going into a gunfight.
So if you will share the fate of Bob, and if you might fail to persuade Bob to avoid a gunfight, then it makes sense to try to design a bulletproof vest for Bob (because if you succeed, then he might decide to wear it, and that would be very good if he ends up getting shot in the stomach). (The vest in this analogy is analogous to descriptions of necessary features that might be used to convince designers to abandon a dangerous AI project. The vest in this analogy is not analogous to a Membrane.)