Without security mindset, one tends to think an unstoppable AI is a priori likely to do what humans want, since humans built it. With security mindset, one sees that most AIs are nukes that wreak havoc on human values, and getting them to do what humans want is analogous to building crash-proof software for a space probe, except the whole human race only gets to launch one probe and it goes to whoever launches it first.
I think this is a really shallow argument that enormously undersells the actual reasons for caring about alignment. We have actual arguments for why an unstoppable AI is unlikely to do what humans want, and they don't need the security mindset at all. The basic outline is something like:
- Since we have historically had a lot of trouble writing down programs that solve complex and general problems like language or image recognition (with successes coming through ML instead), future AI and AGI will probably be the sort of system that "fills in the gaps" in our requests/specifications.
- For almost anything we could ask an AI to accomplish, there are actions that would help it succeed but would be bad for us and counterintuitive from the standpoint of previous technology (the famous convergent subgoals).
- Precisely specifying what we want without relying on common sense is incredibly hard, and an imprecise specification doesn't survive strong optimization (Goodhart's law; see the toy sketch after this list).
- And competence by itself doesn't solve the problem, because understanding what humans want doesn't mean caring about it (the Orthogonality thesis).
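To make the Goodhart point concrete, here is a minimal toy sketch of my own (none of the names or numbers come from the original discussion; it's just the standard "proxy = true value + heavy-tailed noise" illustration): the harder you select on the proxy, the less the thing you actually get resembles the thing you wanted.

```python
# Toy Goodhart demo: each candidate action has a true value we care about and a
# measured proxy score (true value + heavy-tailed noise). Selecting the single
# best-looking candidate out of a larger and larger pool (more optimization
# pressure) makes the proxy score soar while the true value barely improves.
import numpy as np

rng = np.random.default_rng(0)

def select_best_by_proxy(n_candidates):
    true_value = rng.normal(0, 1, size=n_candidates)   # what we actually want
    noise = rng.standard_cauchy(size=n_candidates)     # heavy-tailed measurement error
    proxy = true_value + noise                          # what we wrote down and optimize
    i = np.argmax(proxy)                                # pick the best-looking candidate
    return proxy[i], true_value[i]

for n in (10, 1_000, 100_000):
    proxy_picks, true_picks = zip(*(select_best_by_proxy(n) for _ in range(200)))
    print(f"candidates={n:>7}  mean proxy of pick={np.mean(proxy_picks):10.1f}  "
          f"mean true value of pick={np.mean(true_picks):5.2f}")

# As n grows, the proxy score of the selected candidate blows up (it is mostly
# noise), while its true value stays near the population average: the proxy
# stopped tracking what we cared about once we optimized it hard enough.
```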
This line of reasoning (which is not new by any means; it's basically straight out of Bostrom and early Yudkowsky's writing) justifies the security mindset for AGI and alignment, not the other way around.
(And historically, Yudkowsky wanted to build AGI before he worked out these points, which turned him into the biggest user, though by no means the only one, of the security mindset in alignment.)
Ok, I agree there are a bunch of important concepts to be aware of, such as the complexity of value, and there are many ways for the security mindset by itself to fail to flag the full extent of AI risk if one is ignorant of some of these other concepts.
I just think the outside view and trend extrapolation are very far from how one should reason about mere nukes, and superhuman intelligence is very nuke-like, or at least has a very high chance of being nuke-like: that is, able to unlock unprecedentedly large, rapid, irreversible effects. Extrapolating from current trends would have been quite unhelpful to nuclear safety. I know Eliezer is just trying to meet other people in the discussion where they are, but it would be nice to have another discussion that seems more on-topic from Eliezer's own perspective.