I’m not saying anything about the fragility-of-value argument, since that seems like a separate argument from the argument that value is complex. I think the fragility-of-value argument is plausibly a statement about how badly things can go if you get human values even slightly wrong, which still seems true depending on one’s point of view (e.g. if the AI exhibits all human values except that it thinks murder is OK, that could be catastrophic).
Overall, while I definitely could have been clearer when writing this post, the fact that you seemed to understand virtually all my points makes me feel better about this post than I originally felt.
Thanks! Though tbh I don’t think I fully got the core point via reading the post so I should only get partial credit; for me it took Alexander’s comment to make everything click together.
Yes, I think so, with one caveat: