As far as the orthogonality thesis, relevant context is:
The Arbital page, which defines it more precisely: “The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.”
Yudkowsky’s tweet explaining what he meant. (And a variety of responses by many parties.)
A post by @tailcalled.

My overall take is that in Nora’s/Bush’s decomposition, the orthogonality thesis corresponds to “trivial”.
However, I would prefer to instead call it “extremely-obvious-from-my-perspective”, as indeed some people seem to disagree with this. (Yes, it’s very obvious that an ASI pursuing arbitrary goals is logically possible! The thesis is intended to be obvious! The strong version as defined by Yudkowsky (there need be nothing especially complicated or twisted about an agent pursuing an arbitrary goal, so long as that goal isn’t massively complex) is also pretty obvious IMO.)
I agree that people seem to quote the orthogonality thesis as making stronger claims than it actually directly makes (e.g., that misalignment is likely, which is not at all implied by the thesis). And, awkwardly, people seem to redefine the term in various ways (as noted in Yudkowsky’s tweet linked above). So this creates a motte and bailey in practice, but this doesn’t mean the thesis is wrong. (Edit: Also, I don’t recall cases where Yudkowsky or Bostrom did this motte and bailey without further argument, but I wouldn’t be very surprised to see it, particularly for Bostrom.)

Agree—I was also arguing for “trivial” in this EA Forum thread a couple years ago.
“The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.”
The Orthogonality Thesis is often used in a way that “smuggles in” the idea that an AI will necessarily have a stable goal, even though goals can be very varied. But similar reasoning shows that any combination of goal (in)stability and goallessness is possible as well: mindspace contains agents with fixed goals, agents with randomly drifting goals, agents with corrigible (externally controllable) goals, as well as non-agentive minds with no goals.
The strong version as defined by Yudkowsky… is pretty obvious IMO
I didn’t expect you’d say that. In my view it’s pretty obviously false. Knowledge and skills are not value-neutral, and some goals are a lot harder to instill into an AI than others because the relevant training data will be harder to come by. Eliezer is just not taking into account data availability whatsoever, because he’s still fundamentally thinking about things in terms of GOFAI and brains in boxes in basements rather than deep learning. As Robin Hanson pointed out in the foom debate years ago, the key component of intelligence is “content.” And content is far from value-neutral.
Hmm, maybe I’m interpreting the statement to mean something weaker and more handwavy than you are. I agree with claims like “with current technology, it can be hard to make an AI pursue some goals as competently as other goals” and “if a goal is hard to specify given available training data, then it’s harder to make an AI pursue it”.
However, I think how competently an AI pursues a goal is somewhat different from whether an AI tries to pursue a goal at all. (Which is what I think the strong version of the thesis is still getting at.) I was trying to get at the “hard to specify” thing with the simplicity caveat. There are also many other caveats because goals and other concepts are quite handwavy.
Doesn’t seem important to discuss further.
I think I agree with everything you said. (Except for the psychologising about Eliezer, on which I have no particular opinion.)
Could you give an example of knowledge and skills not being value neutral?
(No need to do so if you’re just talking about the value of information depending on the values one has, which is unsurprising. But it sounds like you might be making a more substantial point?)
As I argue in the video, I actually think the definitions of “intelligence” and “goal” that you need to make the Orthogonality Thesis trivially true are bad, unhelpful definitions. So I think both that it’s false and that, even if it were true, it’d be trivial.
I’ll also note that Nick Bostrom himself seems to be making the motte and bailey argument here, which seems pretty damning considering his book was very influential and changed a lot of people’s career paths, including my own.
Edit replying to an edit you made: I mean, the most straightforward reading of Chapters 7 and 8 of Superintelligence is just a possibility-therefore-probability fallacy in my opinion. Without this fallacy, there would be little need to even bring up the orthogonality thesis at all, because it’s such a weak claim.
I mean, the most straightforward reading of Chapters 7 and 8 of Superintelligence is just a possibility-therefore-probability fallacy in my opinion.
The most relevant quote from Superintelligence (that I could find) is:
Second, the orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans—scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth. We will consider later whether it might be possible through deliberate effort to construct a superintelligence that values such things, or to build one that values human welfare, moral goodness, or any other complex purpose its designers might want it to serve. But it is no less possible— and in fact technically a lot easier—to build a superintelligence that places final value on nothing but calculating the decimal expansion of pi. This suggests that—absent a special effort—the first superintelligence may have some such random or reductionistic final goal.
My interpretation is that Bostrom is trying to be reasonably precise here and is doing something like:
1. You might have “blithely assumed” that things would necessarily be fine, but orthogonality. (Again, extremely obvious.)
2. Also, it (separately) seems to me (Bostrom) to be technically easier to get your AI to have a simple goal, which implies that random goals might be more likely.
I think you disagree with point (2) here (and I disagree with point 2 as well), but this seems different from the claim you made. (I didn’t bother looking for Bostrom’s arguments for (2), but I expect them to be weak and easily defeated, at least ex-post.)
TBC, I can see where you’re coming from, but I think Bostrom tries to avoid this fallacy. It would have been considerably better if he had explicitly called out this fallacy and disclaimed it, so I think he should be partially blamed for likely misinterpretations.