Noosphere89 comments on My decomposition of the alignment problem

Noosphere89 10 Sep 2024 18:05 UTC
4 points
The Harms version of corrigibility is pretty similar in that it should take instructions first and foremost, even though it’s got a more elaborate model of the user’s preferences to help in interpreting instructions correctly, and it’s supposed to act on its own initiative in some cases. But the two approaches may converge almost completely after a user has given a wise set of standing instructions to their DWIMAC AGI.
Note that the link to the Harms version of corrigibility doesn’t work.
- Seth Herd 10 Sep 2024 18:10 UTC
  2 points
  Thank you! Fixed.