You wrote: "I remember explicit discussion about how solving this problem shouldn't even count as part of solving long-term / existential safety, for example:"
Two points:

1. I have a slightly different interpretation of the comment you linked to, which makes me think it provides only weak evidence for your claim. (Though it's definitely still some evidence.)
2. I agree that some people deserve credit for noticing, before LLMs, that human-level value specification might be kind of easy. I don't mean to accuse everyone in the field of making the same mistake.
Anyway, let me explain the first point.
I interpret Abram to be saying that we should focus on solutions that scale to superintelligence, rather than solutions that only work on sub-superintelligent systems but break down at superintelligence. This was in response to Alex’s claim that “whitelisting contributes meaningfully to short- to mid-term AI safety, although I remain skeptical of its robustness to scale.”
In other words, Alex said (roughly): "This solution seems to work for sub-superintelligent AI, but might not work for superintelligent AI." Abram said in response that we should push against such solutions, since we want solutions that scale all the way to superintelligence. That is not the same thing as saying that any solution to the house-cleaning robot problem provides negligible evidence of progress, because some solutions might scale.
It's definitely arguable, but I think it's likely that any realistic solution to the human-level house-cleaning robot problem (in the strong sense of getting a robot to genuinely follow all relevant moral constraints, allow you to shut it down, and perform its job reliably in a wide variety of environments) will scale reasonably well above human intelligence. Maybe it won't scale all the way to radical superintelligence, but at the very least I don't think it's negligible evidence of progress.
If you merely disagree that any such solutions will scale, and you’ve been consistent on this point for the last five years, then I guess I’m not really addressing you in my original comment, but I still think what I wrote applies to many other researchers.
You also wrote: "See also The Main Sources of AI Risk? where this problem was only part of one bullet point, out of 30 (as of 2019, 35 now)."

I think it's actually 2 points:

"Misspecified or incorrectly learned goals/values"

"Inability to specify any 'real-world' goal for an artificial agent (suggested by Michael Cohen)"
I'm not sure how much weight to give this piece of evidence. I'll point out that naming the same problem multiple times might have gotten repetitive, and there was also no explicit ranking of the problems from most important to least important (or from hardest to easiest). If the order you wrote them in can be (perhaps uncharitably) interpreted as the order of importance, then I'll note that it was listed as problem #3, which I think supports my original thesis adequately.