a very slight misalignment would be disastrous. That seems possible, per Eliezer’s Rocket Example, but is far from certain.
Just a minor nitpick: I don’t think the point of the Rocket Alignment Metaphor was that slight misalignment is catastrophic. I think the more apt interpretation is that apparent alignment does not equal actual alignment, and that you need to do a lot of work before you can even talk meaningfully about aligning an AI at all. Relevant quote from the essay:
It’s not that current rocket ideas are almost right, and we just need to solve one or two more problems to make them work. The conceptual distance that separates anyone from solving the rocket alignment problem is much greater than that.
Right now everyone is confused about rocket trajectories, and we’re trying to become less confused. That’s what we need to do next, not run out and advise rocket engineers to build their rockets the way that our current math papers are talking about. Not until we stop being confused about extremely basic questions like why the Earth doesn’t fall into the Sun.
Fully agree—I was using the example to make a far less fundamental point.