This forces the AI to avoid certain actions that may be highly efficient but potentially ethically disastrous.
If the most efficient actions are ethically disastrous, then we have a fundamental problem which seems to me to be unrelated to AI safety, and which no AI control proposal will address. For example, if the most efficient strategy is to build a virus that kills everyone but you, and an AI is considering this strategy but has to reject it because it is unethical, then we are just out of luck. We could call this a problem with “AI,” but it’s really a problem with biotechnology.
If a certain kind of reinforcement learning is especially efficient but morally unacceptable, then that seems to be the same situation. What are we supposed to do, other than either accept the moral cost or adopt a good enough social solution to overcome the efficiency gap? What kind of solution might you hope to find that would make this kind of problem go away?
If the efficient actions merely might be ethically disastrous, then I guess the cost is supposed to be the time required to clarify the overseer’s values. Which brings us to:
Another example might be that the AI faces a lot of everyday ethical questions in the course of acquiring resources, and has to take the latency hit of asking the overseer about them every time.
The question is just how many distinct questions of this form there are, and how important they are to the AI’s plans. If there were merely a billion such questions, it wouldn’t seem like a big deal at all (though answering moral questions would then be a significant occupation of humans).
Even that strikes me as completely implausible given our experience so far (combined with my inability to imagine many future examples). If I were the user, and people were trying to optimize my values using the range of policies available today, then it seems like they would have to ask me no more than a dozen or so questions to get things basically right (i.e. realizing much more than 99% of the potential value from my perspective). So this scenario seems to require moral problems to proliferate at a much faster rate than technological problems.
Do you disagree about the importance of hard ethical questions in the situation today (e.g. I am implicitly overlooking many important issues because I’m not used to dealing with an AI), or do you just expect more proliferation in the future?
Also, the problem of predicting human moral judgments doesn’t seem to be radically harder than the problem of e.g. negotiating with humans. I guess this is just another angle on “how many distinct moral questions do you have to answer?” since the real question is how much you can generalize from each answer. I don’t feel like there are that many hard-to-predict parameters before everything reduces to easy-to-predict consequences.