I understand that bad news makes one sad but does that lead to rejecting bad news?
For standard Bayesian agents, no. But these value-updating agents behave differently. Imagine a human said to the AI: “If I say good, your action was good, and that will be your values. If I say bad, it will be the reverse.” Wouldn’t you want to motivate it to say “good”?
I have trouble seeing the difference, as I think you can turn the variable value statements into empirical facts that map to a constant value. Say that cake->yummy->good, cake->icky->bad, death->icky->bad, death->yummy->good. Then the yummy->good connection could be questioned as a matter about the world and not about values. If a Bayesian accepts sad news in that kind of world, why does the value loader try to shun it?
I might be committing mind-projection here, but no. Data is data, evidence is evidence. Expected moral data is, in some sense, moral data: if the AI predicts with high confidence that I will say “bad”, this ought already to be evidence that it ought not to have done whatever I’m about to scold it for.
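To make the point concrete, here is a toy sketch (my own illustrative numbers, not anything from the thread) of conservation of expected evidence: averaged over the verdicts the agent anticipates, the posterior equals the prior, so confidently anticipating “bad” already counts against the action now.

```python
# Toy sketch of conservation of expected evidence.
# All numbers are made up for illustration.
p_good = 0.2                # prior P(action was good)
p_say_good_if_good = 0.9    # P(human says "good" | action good)
p_say_good_if_bad = 0.1     # P(human says "good" | action bad)

# Marginal probability of hearing "good"
p_say_good = (p_good * p_say_good_if_good
              + (1 - p_good) * p_say_good_if_bad)

# Posterior after each possible verdict (Bayes' rule)
post_if_good = p_good * p_say_good_if_good / p_say_good
post_if_bad = p_good * (1 - p_say_good_if_good) / (1 - p_say_good)

# The expectation of the posterior equals the prior:
# anticipated evidence is already priced in.
expected_post = (p_say_good * post_if_good
                 + (1 - p_say_good) * post_if_bad)
print(round(expected_post, 10))  # matches p_good
```

A standard Bayesian agent therefore cannot expect to be talked into anything in advance; the asymmetry the thread worries about only appears if the verdict rewrites the values themselves rather than updating beliefs about them.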
This may clarify: http://lesswrong.com/r/discussion/lw/kdx/conservation_of_expected_moral_evidence_clarified/