So if it’s possible to do everything exactly perfectly, to the level of a superintelligence calculating how it could most increase world utility and then performing only those actions—and still end up with guilt in a sufficiently hard-to-fix situation—why are you calling this quantity “guilt” at all? It certainly doesn’t fit my concept of what guilt is supposed to mean, and judging by the end of your post it doesn’t fit yours.
Why not call it “variable X”, and note that variable X has no particular correlation to any currently used English term or human emotion?
Also, the Shapley Value looks really interesting, but the Wikipedia article you linked to sends me into Excessive Math Panic Mode. If you wanted to explain it in a more understandable/intuitive way, that would make a great topic for an LW post.
The Shapley value has been used on LW several times already: 1, 2. I understand it as follows: imagine a game with many players that can make “coalitions” with each other to win money from the universe, and two “coalitions” joined together can always win no less than they’d have won separately. Then the Shapley value is a way of distributing the maximum total winnings (where everyone cooperates) such that every player and every group of players get no less than they could’ve won for themselves by defecting (individually or as a group).
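If a concrete toy helps, here is a minimal Python sketch of that description. The game and its payoffs are made up purely for illustration; the computation uses the standard definition of the Shapley value as each player's marginal contribution averaged over every order in which the grand coalition could be assembled.

```python
from itertools import permutations

def shapley_values(players, v):
    """Average each player's marginal contribution over all orders
    in which the grand coalition could be assembled."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: totals[p] / len(orderings) for p in players}

# Toy "win money from the universe" game, made up for illustration:
# players 1 and 2 win 1 unit each on their own but 4 together,
# and player 3 adds 1 unit to whatever coalition it is part of.
def v(coalition):
    base = 4 if {1, 2} <= coalition else len(coalition & {1, 2})
    return base + (1 if 3 in coalition else 0)

print(shapley_values([1, 2, 3], v))  # {1: 2.0, 2: 2.0, 3: 1.0}
# The values sum to v({1, 2, 3}) = 5, the maximum total winnings.
```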
(I edited this away, but now Yvain replied to it, so I’m restoring it:) Should we reward a completely ineffectual action? Are you a deontologist?
No, but guilt is an inherently deontological concept.
Let me give an example. Actually, your example: your Hitler voter model. Yeah, it successfully makes the person who voted for Hitler feel guilty. But it also makes the person who didn't vote for Hitler, and maybe did everything e could to stop Hitler before being locked up in a German prison, equally guilty. So it actually makes the exact mistake you're warning against: unless your single vote decides whether or not Hitler gets into power, people who vote for and against Hitler end up equally guilty. (If your single vote does decide it, then your present welfare is greater and the difference between present and perfect welfare is smaller.)
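To put hypothetical numbers on that (the welfare figures are invented purely for illustration, and "guilt" is read as the present-vs-perfect welfare gap described above):

```python
# All welfare numbers are hypothetical, just to make the argument concrete:
# the perfect world has welfare 100, the world with Hitler in power has 10.
PERFECT_WELFARE = 100

def present_welfare(my_vote, my_vote_is_decisive=False):
    if my_vote_is_decisive and my_vote == "against":
        return 100  # only a decisive vote against changes the outcome
    return 10       # otherwise Hitler wins no matter what I did

def guilt(my_vote, my_vote_is_decisive=False):
    # the post's measure as read in this comment: the gap between
    # perfect welfare and the welfare actually achieved
    return PERFECT_WELFARE - present_welfare(my_vote, my_vote_is_decisive)

print(guilt("for"))      # 90
print(guilt("against"))  # 90 -- exactly as guilty as the Hitler voter
print(guilt("against", my_vote_is_decisive=True))  # 0
```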
Guilt is there to provide negative reinforcement for acting in an immoral way. So it’s only useful if there’s some more moral way you could act that it needs to reinforce you towards. Loading someone who’s literally done everything e could with a huge burden of guilt is like chronic pain disorder: if the pain’s not there to tell you to stop doing something painful, it’s just getting in the way.
And if your brain gives you equal squirts of guilt for voting for Hitler vs. fighting Hitler, guilt fails in its purpose as a motivation not to vote for Hitler, and any AI with a morality engine built around this theory of guilt will vote Hitler if there’s any reason to do so at all.
(as for Shapley, I see references to it but not a good explanation of how to derive it and why it works. Maybe that’s one of those things that actually can’t be explained simply and I ought to bite the bullet and try to parse the wiki article.)
I thought about it a while and your objections are correct. This thing seems to be measuring how much I could regret the current state of the world, not how much I should’ve done to change it. Added a “WRONG!” disclaimer to the post; hopefully people will still find it entertaining.
It might be helpful to also add your conclusion (i.e., exactly how you think it’s wrong) to the disclaimer. It seems an interesting fact, but I imagine many will miss it by not bothering to read a post marked as “wrong”.
The Shapley value averages your marginal contribution to the utilities of sub-coalitions. The guy who votes against Hitler would be involved in some sub-coalitions in which he is the marginal vote that defeats Hitler, and thus would have a positive Shapley value, whereas the guy who voted for Hitler would be involved in some sub-coalitions where he is the marginal vote that elects Hitler, and thus would have a negative Shapley value.
I think Yvain is right and you’re wrong. The Shapley value takes as input the whole game, not a certain play of the game, so it doesn’t know that you actually voted for Hitler and the other guy didn’t.
The formula for the Shapley value (from the wiki article):

\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl( v(S \cup \{i\}) - v(S) \bigr)
What this means is that you take every sub-coalition S of the total coalition N, excluding sub-coalitions that include yourself. Then you average the difference between the value of the sub-coalition S plus yourself and the value of just the sub-coalition S. (The first term in the sum makes it a weighted average depending on the sizes of S and N.) These sub-coalitions S, and S plus yourself, did not actually happen; you are considering the counterfactual value of those being the actual coalitions.
The point is that the formula knows how your inclusion in a coalition changes its value.
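Here is a minimal Python sketch of that formula, reusing the same made-up toy game as the earlier sketch, so the weights |S|!(n-|S|-1)!/n! and the counterfactual marginal contributions are explicit:

```python
from itertools import combinations
from math import factorial

def shapley_value(i, players, v):
    """Weighted sum over every sub-coalition S that excludes player i:
    weight |S|!(n-|S|-1)!/n! times the marginal contribution
    v(S plus i) - v(S), as in the formula above."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v(S | {i}) - v(S))
    return total

# The same made-up toy game as in the earlier sketch.
def v(coalition):
    base = 4 if {1, 2} <= coalition else len(coalition & {1, 2})
    return base + (1 if 3 in coalition else 0)

print([shapley_value(p, [1, 2, 3], v) for p in [1, 2, 3]])
# approximately [2.0, 2.0, 1.0], matching the permutation-based sketch
```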