Since I believe that the human CEV is very close to correct, I expect this would produce an AI that gives very good answers.
This presumes that the problem of specifying a CEV is well-posed. I haven't seen any arguments around SI or LW addressing this very fundamental point. I'm probably wrong and it has been addressed somewhere, and I'd be happy to read more, but it seems quite reasonable to assume that even a tiny error in specifying the CEV could lead to disastrously horrible results as judged by the CEV itself.