Hmmm... the orthogonality thesis is pretty simple to state, so I don't necessarily think it has been grossly misunderstood. The bad reasoning in Fallacy 4 seems to come from a more general phenomenon with classic AI Safety arguments: they do hold up, but only with some caveats and/or more precise phrasing. So I guess "bad coverage" could apply to the extent that popular sources don't go in depth enough.
I do think the author presented good summaries of Bostrom’s and Russell’s viewpoints. But then they immediately jump to a “special sauce” type argument. (Quoting the full thing just in case)
The thought experiments proposed by Bostrom and Russell seem to assume that an AI system could be "superintelligent" without any basic humanlike common sense, yet while seamlessly preserving the speed, precision and programmability of a computer. But these speculations about superhuman AI are plagued by flawed intuitions about the nature of intelligence. Nothing in our knowledge of psychology or neuroscience supports the possibility that "pure rationality" is separable from the emotions and cultural biases that shape our cognition and our objectives. Instead, what we've learned from research in embodied cognition is that human intelligence seems to be a strongly integrated system with closely interconnected attributes, including emotions, desires, a strong sense of selfhood and autonomy, and a commonsense understanding of the world. It's not at all clear that these attributes can be separated.
I really don't understand where the author is coming from with this. I will admit that the classic paperclip maximizer example is pretty far-fetched, and maybe not the best way to explain the orthogonality thesis to a skeptic. I prefer more down-to-earth examples: say, a chess bot with plenty of compute to look ahead, but whose goal is to protect its pawns at all costs instead of its king. It will pursue that goal intelligently, but the goal is silly to us if what we want is a good chess player (see the sketch below).
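To make that chess example concrete, here's a minimal toy sketch of my own (using the python-chess library; the function names, the crude evaluation functions, and the depth-2 search are just illustrative assumptions, not anything from the article). The point is that the search machinery is identical in both cases; only the evaluation function, i.e. the goal, gets swapped out.

```python
# Toy illustration of the orthogonality point: the same search "intelligence"
# can serve a sensible goal or a silly one; only the evaluation function changes.
# Requires the python-chess package (pip install chess).
import chess

# "Silly" goal: keep as many of our own pawns on the board as possible.
def pawn_hoarder_eval(board: chess.Board, color: chess.Color) -> float:
    return float(len(board.pieces(chess.PAWN, color)))

# More sensible goal: a crude material count from our side's perspective.
PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def material_eval(board: chess.Board, color: chess.Color) -> float:
    return float(sum(v * (len(board.pieces(p, color)) - len(board.pieces(p, not color)))
                     for p, v in PIECE_VALUES.items()))

def best_move(board: chess.Board, evaluate, depth: int = 2) -> chess.Move:
    """Plain depth-limited minimax; the 'smarts' (lookahead) is goal-agnostic."""
    root = board.turn  # the player whose goal we are optimising

    def minimax(b: chess.Board, d: int) -> float:
        if d == 0 or b.is_game_over():
            return evaluate(b, root)
        scores = []
        for move in list(b.legal_moves):
            b.push(move)
            scores.append(minimax(b, d - 1))
            b.pop()
        return max(scores) if b.turn == root else min(scores)

    def score_after(move: chess.Move) -> float:
        board.push(move)
        score = minimax(board, depth - 1)
        board.pop()
        return score

    return max(list(board.legal_moves), key=score_after)

board = chess.Board()
print(best_move(board, material_eval))      # same search, goal we'd endorse
print(best_move(board, pawn_hoarder_eval))  # same search, goal we'd call silly
```

Nothing in `best_move` knows or cares which goal it's serving; cranking up `depth` makes the pawn-hoarder more capable at its silly objective, not more likely to abandon it.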
I feel like the author's counterargument would make more sense if they framed it as an outer alignment objection, like: "it's exceedingly difficult to make an AI whose goal is to maximize paperclips unboundedly, with no other human values baked in, because the training data is made by humans". And maybe this is also what their intuition was, and they just picked on the orthogonality thesis since it's connected to the paperclip maximizer example and easy to state. Hard to tell.
It would be nice if AI Safety were less disorganized, and had a textbook or something. Then, a researcher would have a hard time learning about the orthogonality thesis without also hearing a refutation of this common objection. But a textbook seems a long way away...
Good points! Yes, this snippet is particularly nonsensical to me:
an AI system could be "superintelligent" without any basic humanlike common sense, yet while seamlessly preserving the speed, precision and programmability of a computer
It sounds like their experience with computers has involved them having a lot of "basic humanlike common sense", which would be a pretty crazy experience. When I explain what programming is like to kids, I usually say something like: "The computer will do exactly, exactly, exactly what you tell it to, extremely fast. You can't rely on any basic sense-checking, common sense, or understanding from it. If you can't define what you want specifically enough, the computer will fail in a (to you) very stupid way, very quickly."
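For instance (a trivial toy example of my own, nothing deep): hand Python some numbers as strings and ask for the "biggest" one, and it does exactly what you said rather than what you meant.

```python
scores = ["9", "10", "2"]
print(max(scores))           # -> '9': strings compared character by character, exactly as told
print(max(scores, key=int))  # -> '10': only once we say precisely what "biggest" means
```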