The text is very general in its analysis, so some examples would be helpful. Why not start talking about some sets of goals that people who build an optimizing AI system might actually install in it, and see how the AI uses them?
To avoid going too broad, here’s one: “AI genie, abolish hunger in India!”
So the first thing people will complain about is that the easiest way for the system to abolish hunger is simply to kill all the Indians, one way or another. Let’s add:
“Without violating Indian national laws.”
Now lesswrongers will complain that the AI will:
- Set about changing Indian national law in unpredictable ways.
- Violate other countries’ laws.
- Brilliantly seek loopholes in any system of laws.
- Try to become rich too fast.
- Escape being useful by finding issues with the definition of “hunger,” such as whether it is OK for someone to briefly become hungry 15 minutes before dinner.
- Build an army and a police force to stop people from getting in the way of this goal.
So, we continue to debug:
“AI, ensure that everyone in India is provisioned with at least 1600 calories of healthy (insert elaborate definition of healthy HERE) food each day. Allow them to decline the food in exchange for a small sum of money. Do not break either Indian law or a complex set of other norms and standards for interacting with people and the environment (insert complex set of norms and standards HERE).”
So, we continue to simulate what the AI might do wrong, and patch the problems with more and more sets of specific rules and clauses.
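To make the shape of this patching concrete, here is a minimal sketch of what the evolving specification might look like as code. Everything in it (the Plan fields, the GoalSpec structure) is invented for illustration; it is not a real alignment API, just an objective plus an ever-growing list of constraint clauses:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A candidate plan the optimizer proposes (hypothetical structure).
@dataclass
class Plan:
    description: str
    calories_delivered: int
    breaks_indian_law: bool
    breaks_other_laws: bool
    uses_coercion: bool

# The goal specification: an objective plus a growing list of constraints.
@dataclass
class GoalSpec:
    objective: Callable[[Plan], float]
    constraints: List[Callable[[Plan], bool]] = field(default_factory=list)

    def permits(self, plan: Plan) -> bool:
        # A plan is admissible only if every patched-in constraint holds.
        return all(check(plan) for check in self.constraints)

# Round 0: the bare wish ("abolish hunger" ~ maximize calories delivered).
spec = GoalSpec(objective=lambda p: p.calories_delivered)

# Each round of "debugging" appends another clause.
spec.constraints.append(lambda p: not p.breaks_indian_law)   # round 1
spec.constraints.append(lambda p: not p.breaks_other_laws)   # round 2
spec.constraints.append(lambda p: not p.uses_coercion)       # round 3
# ...and so on, indefinitely, as new failure modes are imagined.
```

The point of the sketch is only that each round of debugging adds another predicate, and the optimizer remains free to maximize the objective in whatever ways the predicates fail to rule out.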
It feels like people will still want to add “stop using resource X/performing behavior Y to pursue this goal if we tell you to,” because people may have other plans for resource X or see problems with behavior Y which are not specified in laws and rules.
People may also want to add something like, “ask us (or this designated committee) first before you take steps that are too dramatic (insert definition of dramatic HERE).”
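Mechanically, those last two requests amount to a stop signal plus an approval gate on “dramatic” steps. A hedged sketch, with the threshold and the committee interface invented purely for illustration:

```python
import threading

# A stop signal the operators can raise at any time (hypothetical wiring).
stop_requested = threading.Event()

def is_dramatic(step) -> bool:
    # Stand-in for the "(insert definition of dramatic HERE)" clause.
    return step.get("people_affected", 0) > 100_000

def committee_approves(step) -> bool:
    # Stand-in for asking the designated committee; here just a human prompt.
    answer = input(f"Approve step '{step['name']}'? [y/N] ")
    return answer.strip().lower() == "y"

def execute(plan):
    for step in plan:
        if stop_requested.is_set():
            break                          # "stop ... if we tell you to"
        if is_dramatic(step) and not committee_approves(step):
            continue                       # skip steps the committee rejects
        step["run"]()                      # carry out the approved step
```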
Then, I suppose, the AI may brilliantly anticipate, outsmart, offer legal quid pro quos to, or trick the committee. Some of these activities we would endorse (because the committee is sometimes doing the wrong thing), others we would not.
Thus, we continue to evolve a set of checks and balances on what the AI can and cannot do. Respect for the diverse goals and opinions of people seems to be at the core of this debugging process. However, this respect is not limitless, since people’s goals are often contradictory and sometimes mistaken.
The AI is constantly probing these checks and balances for loopholes and alternative means, just as a set of well-meaning food security NGO workers would do. Unlike them, however, if it finds a loophole it can go through it very quickly and with tremendous power.
Note that if we substitute the word “NGO,” “government” or “corporation” for “AI,” we end up with the same set of issues the AI system has. We deal with this by limiting those organizations’ resource levels.
We could designate precisely which resources the AI may use to meet its goal. That tool might work to a great extent, but the optimizing AI will still continue to try to find loopholes.
We could limit the amount of time the AI has to achieve its goal, or limit the processing power and other hardware it can use.
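The time limit is the easiest of these to sketch. Assuming a planner that proposes and scores candidate plans (both functions are placeholders), the search simply stops when its wall-clock budget runs out:

```python
import time

def plan_under_budget(generate_candidate, evaluate, seconds: float = 60.0):
    """Search for the best plan, but only for a fixed wall-clock budget."""
    deadline = time.monotonic() + seconds
    best, best_score = None, float("-inf")
    while time.monotonic() < deadline:
        candidate = generate_candidate()   # placeholder: propose a plan
        score = evaluate(candidate)        # placeholder: score it against the goal
        if score > best_score:
            best, best_score = candidate, score
    return best
```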
We could develop computer simulations for what would happen if an AI was given a particular set of rules and goals and disallow many options based on these simulations. This is a kind of advanced consequentialism.
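One way to read that “advanced consequentialism” is as a filter in front of the planner: every candidate plan is run through a forecasting model, and any plan with even one simulated rollout that violates the norms is thrown out. A sketch, with the simulator and the norm check both assumed rather than specified:

```python
def allowed(plan, simulate, violates_norms, rollouts: int = 1000) -> bool:
    """Reject a plan if any simulated outcome breaks the rules."""
    for _ in range(rollouts):
        outcome = simulate(plan)         # forecast one possible world
        if violates_norms(outcome):      # laws, norms, unwanted side effects
            return False
    return True

# Only plans that survive every simulated rollout remain on the table:
# candidate_plans = [p for p in candidate_plans if allowed(p, simulate, violates_norms)]
```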
Even if the rules and the simulations work very well, each time we give the AI a set of rules for its behavior, there is a non-zero probability of unintended consequences.
Looking on the bright side, however, as we continue this hypothetical debugging process, the probability (perhaps it’s better to call it a hazard rate) seems to be falling.
Note that we get the same problems with governments, NGOs and corporations as well. Perhaps what we are seeking is not perfection but some advantage over existing approaches to organizing groups of people to solve their problems.
Existing structures are themselves hazardous. The threshold for supplementing them with AI is not zero hazard. It is hazard reduction.