I might be committing the mind-projection fallacy here, but I don't think so. Data is data, evidence is evidence. Expected moral data is, in some sense, moral data: if the AI predicts with high confidence that I will say “bad”, that prediction should already count as evidence that it ought not to have done whatever I’m about to scold it for.
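Spelled out (a minimal sketch in standard probability notation; the labels H and E are mine, not from the original discussion): let H be “the action was wrong” and E be “I say ‘bad’”. Conservation of expected evidence is just the law of total probability applied to the posterior:

$$P(H) \;=\; P(E)\,P(H \mid E) \;+\; P(\neg E)\,P(H \mid \neg E)$$

The prior equals the expected posterior. So if the AI’s P(E) is already close to 1, then P(H) is already close to P(H | E): nearly all of the update happens at the moment of confident prediction, and the scolding itself carries almost no new information.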
This may clarify the points: http://lesswrong.com/r/discussion/lw/kdx/conservation_of_expected_moral_evidence_clarified/