FWIW I strongly agree with what I thought was the original form of the orthogonality thesis—i.e. that a highly intelligent agent could have arbitrary goals—but was confused by Eliezer’s tweet linked in Nora Belrose’s post.
Eliezer:
if a superintelligence can calculate which actions would produce a galaxy full of paperclips, you can just as easily have a superintelligence which does that thing and makes a galaxy full of paperclips
This seems to me to be trivially correct if “calculate which actions” means calculating down to every last detail.
On the other hand, if the claim were something like “we can’t know anything about what an AI would do without actually running it” (which I do not think is the claim; I’m just stating an extreme interpretation), then it seems obviously false.
And if it’s something in between, or something else entirely, then what specifically is the claim?
It’s also not clear to me why that trivial-seeming claim is supposed to be related to (or even a stronger form than) other varieties of the “weak” form of the orthogonality thesis.

Actually that’s obvious, I should have bothered thinking about it, like, at all. So, yeah, I can see why this formulation is useful—it’s both obviously correct and does imply that you can have something make a galaxy full of paperclips.
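To spell out the reading I have in mind: if the calculation really does output a complete action sequence, then turning the calculator into a doer is just a matter of piping that output to an actuator. Here is a rough, purely illustrative sketch, where compute_paperclip_actions and execute are made-up stand-ins for the hard parts, not anything real:

```python
# Purely illustrative sketch: both helper functions are hypothetical
# stand-ins, not real systems or APIs.

def compute_paperclip_actions(world_model):
    """Hypothetical oracle: returns the complete action sequence that
    would fill the galaxy with paperclips, down to every last detail."""
    raise NotImplementedError  # this is where all the intelligence would live


def execute(action):
    """Hypothetical actuator: carries out one fully specified action."""
    raise NotImplementedError


def paperclip_maximizer(world_model):
    """The "just as easily" step: do whatever the oracle calculated."""
    for action in compute_paperclip_actions(world_model):
        execute(action)
```

All of the intelligence lives in the hypothetical oracle; the wrapper that turns “can calculate” into “does” is the trivial part, which is why the claim reads as obviously correct to me under this interpretation.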
“I think the orthogonality thesis holds strongly enough to shoot down E. Yudkowsky’s argument?”

Explain what you mean? What are you calling the orthogonality thesis (note that part of my point is that different alleged versions of it don’t seem to be related), what are you calling “E. Yudkowsky’s argument”, and how does the orthogonality thesis shoot it down?
Ah I see, you are talking about the “meaning of life” document. Yes, the version of the orthogonality thesis that says that goals are not specified by intelligence (and also the version presented in Eliezer’s tweet) does shoot down that document. (My comment was about that tweet, so was largely unrelated to that document.)