That makes sense, but I say in the post that I think we will likely have a solution to the value identification problem that’s “about as good as human judgement” in the near future.
We already have humans who are smart enough to do par-human moral reasoning. For “AI can do par-human moral reasoning” to help solve the alignment problem, there needs to be some additional benefit to having AI systems that can match a human (e.g., some benefit to our being able to produce enormous numbers of novel moral judgments without relying on an existing text corpus or hiring thousands of humans to produce them). Do you have some benefit in mind?
I don’t think the critical point of contention here is whether par-human moral reasoning will help with alignment. It could, but I’m not making that argument. I’m primarily making the argument that specifying the human value function, or getting an AI to reflect back (and not merely passively understand) the human value function, seems easier than many past comments from MIRI people have suggested. This problem is one aspect of the alignment problem, though by no means all of it, and I think it’s important to point out that we seem to be approaching an adequate solution.