I’d just like to add that even if you think this piece is completely mistaken, I think it certainly shows we are not knowledgeable enough about what values and motives are and how they work in us, much less in AI, to confidently predict that AIs will be usefully described by a single global utility function, or that they will work to subvert their reward system, or the like.
Maybe that will turn out to be true, but before we spend so many resources on trying to solve AI alignment, let’s first make the argument for the great danger much more rigorous... that’s usually the best way to start anyway.