Suppose in 2024-2029, someone constructs an intelligent robot that is able to clean a room to a high level of satisfaction, consistent with the user’s intentions, without any major negative side effects or general issues of misspecification. It doesn’t break any vases while cleaning.
I remember explicit discussion about how solving this problem shouldn’t even count as part of solving long-term / existential safety, for example:
“What I understand this as saying is that the approach is helpful for aligning housecleaning robots (using near extrapolations of current RL), but not obviously helpful for aligning superintelligence, and likely stops being helpful somewhere between the two. [...] There is a risk that a large body of safety literature which works for preventing today’s systems from breaking vases but which fails badly for very intelligent systems actually worsens the AI safety problem” https://www.lesswrong.com/posts/H7KB44oKoSjSCkpzL/worrying-about-the-vase-whitelisting?commentId=rK9K3JebKDofvJA3x
Why is it so hard to find people explicitly saying that this specific problem, and the examples illustrating it, were not meant to be seriously representative of the hard parts of alignment at the time?
See also The Main Sources of AI Risk? where this problem was only part of one bullet point, out of 30 (as of 2019, 35 now).
I remember explicit discussion about how solving this problem shouldn’t even count as part of solving long-term / existential safety, for example:
Two points:
I have a slightly different interpretation of the comment you linked to, which makes me think it provides only weak evidence for your claim. (Though it’s definitely still some evidence.)
I agree some people deserve credit for noticing that human-level value specification might be kind of easy before LLMs. I don’t mean to accuse everyone in the field of making the same mistake.
Anyway, let me explain the first point.
I interpret Abram to be saying that we should focus on solutions that scale to superintelligence, rather than solutions that only work on sub-superintelligent systems but break down at superintelligence. This was in response to Alex’s claim that “whitelisting contributes meaningfully to short- to mid-term AI safety, although I remain skeptical of its robustness to scale.”
In other words, Alex said (roughly): “This solution seems to work for sub-superintelligent AI, but might not work for superintelligent AI.” Abram said in response that we should push against such solutions, since we want solutions that scale all the way to superintelligence. This is not the same thing as saying that any solution to the house-cleaning robot provides negligible evidence of progress, because some solutions might scale.
It’s definitely arguable, but I think it’s likely that any realistic solution to the human-level house cleaning robot problem—in the strong sense of getting a robot to genuinely follow all relevant moral constraints, allow you to shut it down, and perform its job reliably in a wide variety of environments—will be a solution that scales reasonably well above human intelligence (maybe not all the way to radical superintelligence, but at the very least I don’t think it’s negligible evidence of progress).
If you merely disagree that any such solutions will scale, and you’ve been consistent on this point for the last five years, then I guess I’m not really addressing you in my original comment, but I still think what I wrote applies to many other researchers.
See also The Main Sources of AI Risk? where this problem was only part of one bullet point, out of 30 (as of 2019, 35 now).
I think it’s actually two bullet points:
“Misspecified or incorrectly learned goals/values”
“Inability to specify any ‘real-world’ goal for an artificial agent (suggested by Michael Cohen)”
I’m not sure how compelling this piece of evidence is. I’ll point out that naming the same problem multiple times might have gotten repetitive, and there was also no explicit ranking of the problems from most important to least important (or from hardest to easiest). If the order you wrote them in can be (perhaps uncharitably) interpreted as the order of importance, then I’ll note that it was listed as problem #3, which I think supports my original thesis adequately.