Thus hearing “the true morality is the hard task” has significantly reduced the AI’s expected utility. It would really have preferred not to have heard this; it would much rather have manipulated or murdered the speaker, or simply not talked with them.
I understand that bad news makes one sad but does that lead to rejecting bad news? Similarly, pain is a good thing; without it you would end up in all sorts of trouble. I would think that having accurate knowledge of a thing’s utility is more important than knowing its expectation. If you have a solid utility of 0.5, versus a 50/50 chance of utility 1 or 0, then in the uncertain case you know that by behaving as if the utility were 0.5 you are off by 0.5 either way.
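A minimal sketch of that arithmetic in Python (the absolute-error measure and the exact numbers are just illustrative assumptions):

```python
# Illustrative arithmetic: same expected utility, different guaranteed error.

certain_utility = 0.5                 # the "solid 0.5 utility" case
uncertain_outcomes = [0.0, 1.0]       # 50/50 chance of utility 1 or 0
expectation = sum(uncertain_outcomes) / len(uncertain_outcomes)  # = 0.5

# Both cases have the same expected utility...
assert expectation == certain_utility

# ...but treating the expectation as the actual value goes wrong differently:
error_if_certain = abs(0.5 - certain_utility)                    # 0.0
errors_if_uncertain = [abs(u - expectation) for u in uncertain_outcomes]

print(error_if_certain)      # 0.0
print(errors_if_uncertain)   # [0.5, 0.5] -- "wrong by 0.5" either way
```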
I understand that bad news makes one sad but does that lead to rejecting bad news?
For standard Bayesian agents, no. But these value-updating agents behave differently. Imagine a human said to the AI: “If I say good, your action was good, and that will be your values. If I say bad, it will be the reverse.” Wouldn’t you, in the AI’s place, want to motivate the human to say “good”?
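A toy sketch of that incentive (my own construction; the probabilities and the “verdict installs the utility function” setup are assumptions, not anything from the post):

```python
# Toy model: the human's verdict installs the utility function that scores
# the AI's past action -- the "good" verdict scores it 1, the "bad" verdict 0.

def utility_after_verdict(verdict: str) -> float:
    return 1.0 if verdict == "good" else 0.0

# Honest case: the AI waits for whichever verdict it expects.
p_says_good = 0.3
honest_value = (p_says_good * utility_after_verdict("good")
                + (1 - p_says_good) * utility_after_verdict("bad"))

# Manipulative case: the AI pressures the human so "good" is almost certain.
# Under naive value updating the verdict *defines* the values, so the
# manipulation looks like a pure gain rather than a corruption of evidence.
p_says_good_after_pressure = 0.95
manipulated_value = (p_says_good_after_pressure * utility_after_verdict("good")
                     + (1 - p_says_good_after_pressure) * utility_after_verdict("bad"))

print(honest_value)       # 0.3
print(manipulated_value)  # 0.95 -- the naive agent prefers to manipulate
```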
I have trouble seeing the difference, as I think you can turn the variable value statements into empirical facts that map to a constant value. Say that cake->yummy->good, cake->icky->bad, death->icky->bad, death->yummy->good. Then the yummy->good connection could be questioned as a matter of fact about the world and not about values. If a Bayesian accepts sad news in that kind of world, how come the value loader tries to shun it?
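One way to render that factoring as code (the dictionaries and the +1/-1 scores are my own illustrative choices):

```python
# Toy version of the factoring above: uncertainty lives in an empirical map
# from things to experiences; the map from experiences to value stays fixed.

experience_of = {"cake": "yummy", "death": "icky"}   # empirical, revisable
value_of = {"yummy": 1.0, "icky": -1.0}              # held constant

def utility(thing: str) -> float:
    return value_of[experience_of[thing]]

print(utility("cake"))    # 1.0
print(utility("death"))   # -1.0

# New evidence only ever revises experience_of, never value_of, so learning
# "cake is actually icky" is ordinary empirical bad news for a Bayesian.
experience_of["cake"] = "icky"
print(utility("cake"))    # -1.0
```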
I might be committing mind-projection here, but no. Data is data, evidence is evidence. Expected moral data is, in some sense, moral data: if the AI predicts with high confidence that I will say “bad”, this ought to already be evidence that it ought not to have done whatever I’m about to scold it for.
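A quick numerical check of that claim, in the spirit of conservation of expected evidence (the specific probabilities are invented for illustration):

```python
# If the AI already predicts "bad" with some confidence, hearing the verdict
# cannot be expected to move its estimate of whether the action was good:
# the expected posterior equals the prior.

p_good = 0.2                     # current credence the action was good
p_bad_verdict_given_good = 0.1   # human wrongly says "bad" about a good act
p_bad_verdict_given_bad = 0.9

p_bad_verdict = (p_bad_verdict_given_good * p_good
                 + p_bad_verdict_given_bad * (1 - p_good))

# Posterior credence in "good" after each possible verdict (Bayes' rule):
post_if_bad = p_bad_verdict_given_good * p_good / p_bad_verdict
post_if_good = (1 - p_bad_verdict_given_good) * p_good / (1 - p_bad_verdict)

expected_posterior = (p_bad_verdict * post_if_bad
                      + (1 - p_bad_verdict) * post_if_good)

print(abs(expected_posterior - p_good) < 1e-9)   # True: no free moral evidence
```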
I have real trouble with this step.
This may clarify the points: http://lesswrong.com/r/discussion/lw/kdx/conservation_of_expected_moral_evidence_clarified/