Matthew is not disputing this point, as far as I can tell.
Instead, he is trying to critique some version of[1] the “larger argument” (mentioned in the May 2024 update to this post) in which this point plays a role.
I’ll confirm that I’m not saying this post’s exact thesis is false. This post seems to be largely a parable about a fictional device, rather than an explicit argument with premises and clear conclusions. I’m not saying the parable is wrong. Parables are rarely “wrong” in a strict sense, and I am not disputing this parable’s conclusion.
However, I am saying: this parable presumably played some role in the “larger” argument that MIRI has made in the past. What role did it play? Well, I think a good guess is that it portrayed the difficulty of precisely specifying what you want or intend, for example when explicitly designing a utility function. This problem was often alleged to be difficult because, when you want something complex, it’s difficult to perfectly delineate potential “good” scenarios and distinguish them from all potential “bad” scenarios. This is the problem I was analyzing in my original comment.
While the term “outer alignment” was not invented to describe this exact problem until much later, I was using that term purely as descriptive terminology for the problem this post clearly describes, rather than claiming that Eliezer in 2007 was deliberately describing something that he called “outer alignment” at the time. Because my usage of “outer alignment” was merely descriptive in this sense, I reject the idea that my comment was anachronistic.
And again: I am not claiming that this post is inaccurate in isolation. In both my above comment, and in my 2023 post, I merely cited this post as portraying an aspect of the problem that I was talking about, rather than saying something like “this particular post’s conclusion is wrong”. I think the fact that the post doesn’t really have a clear thesis in the first place means that it can’t be wrong in a strong sense at all. However, the post was definitely interpreted as explaining some part of why alignment is hard — for a long time by many people — and I was critiquing the particular application of the post to this argument, rather than the post itself in isolation.
I’ll confirm that I’m not saying this post’s exact thesis is false. This post seems to be largely a parable about a fictional device, rather than an explicit argument with premises and clear conclusions. I’m not saying the parable is wrong. Parables are rarely “wrong” in a strict sense, and I am not disputing this parable’s conclusion.
However, I am saying: this parable presumably played some role in the “larger” argument that MIRI has made in the past. What role did it play? Well, I think a good guess is that it portrayed the difficulty of precisely specifying what you want or intend, for example when explicitly designing a utility function. This problem was often alleged to be difficult because, when you want something complex, it’s difficult to perfectly delineate potential “good” scenarios and distinguish them from all potential “bad” scenarios. This is the problem I was analyzing in my original comment.
While the term “outer alignment” was not invented to describe this exact problem until much later, I was using that term purely as descriptive terminology for the problem this post clearly describes, rather than claiming that Eliezer in 2007 was deliberately describing something that he called “outer alignment” at the time. Because my usage of “outer alignment” was merely descriptive in this sense, I reject the idea that my comment was anachronistic.
And again: I am not claiming that this post is inaccurate in isolation. In both my above comment, and in my 2023 post, I merely cited this post as portraying an aspect of the problem that I was talking about, rather than saying something like “this particular post’s conclusion is wrong”. I think the fact that the post doesn’t really have a clear thesis in the first place means that it can’t be wrong in a strong sense at all. However, the post was definitely interpreted as explaining some part of why alignment is hard — for a long time by many people — and I was critiquing the particular application of the post to this argument, rather than the post itself in isolation.