Considering that one of the primary barriers to civilian nuclear power plants was, and remains, nuclear bomb proliferation risk, I’m not sure how telling this analogy is. There’s a big reason that nuclear power plants right now are associated with either avowed nuclear powers or powers that want to be nuclear powers at some point (e.g. France, South Korea, North Korea, Japan, Iran, Pakistan...), or with countries closely aligned with said nuclear powers.

Or rather, it seems to me that the analogy cuts the opposite way from the one you intended: if someone had ‘solved’ nuclear reactor design by coming up with a type of nuclear reactor which was provably impossible to abuse for nukes, that would have been far more useful for nuclear reactor design than fiddling with the details of how exactly to boil water to drive a turbine. If you solve the latter, you have not solved the former at all; if you solve the former, someone will solve the latter. And if you don’t solve the former, ‘nuclear power plant’ suddenly becomes a design problem which includes requirements like ‘resistant to Israeli jet strikes’, ‘enables manipulation of international inspectors’, or ‘relies on a trustworthy, closely-allied superpower rather than an untrustworthy one for support such as spent fuel reprocessing and refueling’.
I don’t think Joe is proposing we find an AI design that is impossible to abuse even by malicious humans. The point so far seems to be making sure your own AI is not going to do some specific bad stuff.
If you solve the latter, you have not solved the former at all; if you solve the former, someone will solve the latter.
Insofar as this is true in your extended analogy, I think that’s a reflection of “completely proliferation-proof reactor” being a bad thing to just assume you can solve.