Stuart_Armstrong comments on Avoiding xrisk from AI doesn’t mean focusing on AI xrisk

Stuart_Armstrong 3 May 2023 7:31 UTC
8 points
Having done a lot of work on corrigibility, I believe that it can’t be implemented in a value-agnostic way; it needs a subset of human values to make sense. I also believe that it requires a lot of human values, which would make it almost equivalent to solving all of alignment; but this second belief is much less firm, and less widely shared.