I briefly looked for and did not find a good citation for this.
Who are you arguing against, or whose argument are you trying to clarify?
I’m not sure. However, I have a lot of conversations in which it seems to me that the other person believes the Misspecified Goal argument. Currently, if I were to meet a MIRI employee I hadn’t met before, I would be unsure whether the Misspecified Goal argument is their primary reason for worrying about AI risk. If I meet a rationalist who takes the MIRI perspective on AI risk but isn’t at MIRI themselves, by default I assume that their primary reason for caring about AI risk is the Misspecified Goal argument.
I do want to note that I am primarily trying to clarify here; I didn’t write this as an argument against the Misspecified Goal argument. In fact, conditional on the AI having goals, I do agree with the Misspecified Goal argument.
I tend to have a different version of the Misspecified Goal argument in mind, which I think doesn’t have this problem.
Yeah, I think this is a good argument, and I want to defer to my future post on the topic, which should come out on Wednesday. The TL;DR is that I agree with the argument, but it implies a broader space of potential solutions than “figure out how to align a goal-directed AI”.
(Sorry that I didn’t adequately point to the different arguments and what I think about them. I didn’t do this because it would have made for a very long post; the content is instead being split into several posts, and this particular argument happens to be in Wednesday’s post.)