Matthew Barnett comments on Evaluating the historical value misspecification argument

Matthew Barnett 5 Oct 2023 21:56 UTC
LW: 7 AF: 2
I don’t think the critical point of contention here is about whether par-human moral reasoning will help with alignment. It could, but I’m not making that argument. I’m primarily making the argument that specifying the human value function, or getting an AI to reflect back (and not merely passively understand) the human value function, seems easier than many past comments from MIRI people suggest. This problem is one aspect of the alignment problem, although by no means all of it, and I think it’s important to point out that we seem to be approaching an adequate solution.