Like, if you see a friend taking an action that you know and they know has a 50% chance of making $10 for you and your friend (let’s say for a communal club) and a 50% chance of losing $5, and then turns out they lose $5, then it seems better to still reward your friend for taking that action, instead of punishing them, given that you know the action had positive expected value.
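For concreteness, the expected value of that bet is straightforwardly positive; a quick sketch of the arithmetic in Python:

```python
# The friend's gamble from the example: 50% chance of +$10, 50% chance of -$5.
outcomes = [(0.5, 10.0), (0.5, -5.0)]  # (probability, payoff in dollars)
expected_value = sum(p * payoff for p, payoff in outcomes)
print(expected_value)  # 2.5 -- positive EV, so the bet was worth taking ex ante
```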
(assuming you have mostly linear value of money at these stakes)
If they think the odds are 90% +$10 and 10% −$5, and you think the odds are 10% +$10 and 90% −$5, should you reward them for trying to benefit the club, or punish them for having wrong beliefs that materially matter?
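Under the two belief sets the expected values come out with opposite signs, which is what makes this a real dilemma; a quick sketch:

```python
def expected_value(p_win, win=10.0, loss=-5.0):
    """EV of the bet, given a probability for the +$10 outcome."""
    return p_win * win + (1 - p_win) * loss

friend_ev = expected_value(0.9)  # friend's beliefs: 0.9*10 + 0.1*(-5) = +8.5
your_ev = expected_value(0.1)    # your beliefs:     0.1*10 + 0.9*(-5) = -3.5
print(friend_ev, your_ev)
```

From the friend's perspective the bet is clearly worth taking; from yours it is clearly not.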
You should punish your friend for the loss, and reward them (twice as much) for a win. This creates the correct incentives.
No, because humans are risk-averse, at least in money terms, but also in most other currencies. If you do this, you increase the total risk for your friend, for no particular gain.
Punishment is also usually net-negative, whereas rewards tend to be zero-sum, so by adding a bunch of worlds where you added punishments, you destroyed a bunch of value, with no gain (in the world where you both have certainty about the payoff matrix).
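To make the asymmetry concrete, here is a toy two-person model, under the simplifying assumption that a reward is a pure transfer while a punishment purely destroys value:

```python
# Two-person toy model: a reward moves value from me to you (total preserved),
# while a punishment removes value from you without crediting it anywhere.
def reward(giver, receiver, amount):
    return giver - amount, receiver + amount   # transfer: total unchanged

def punish(punisher, punished, amount):
    return punisher, punished - amount         # imposition: total shrinks

me, you = 100.0, 100.0
me, you = reward(me, you, 10.0)
print(me + you)   # 200.0 -- the transfer preserved total wealth
me, you = punish(me, you, 10.0)
print(me + you)   # 190.0 -- the punishment destroyed value outright
```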
One model here is that humans have diminishing returns on money, so in order to reward someone 2x with dollars, you have to pay more than 2x the dollar amount, so your total cost is higher.

A scenario with zero-sum actions and net-negative actions can only go downhill. This would seem to imply that if you have an opportunity to give feedback or not give feedback, you should opt for the guaranteed zero rather than risk destroying value.
Could you elaborate on this? I’m not at all sure what this is referring to.
Rewards are usually a transfer of resources (e.g. me giving you money), which tend to preserve total wealth (or status, or whatever other resource you are thinking about).
Unilateral punishments are usually not transfers of resources; they are usually one party imposing a cost on another party (like hitting them with a stick and injuring them), in a way that does not preserve total wealth (or health, or whatever other resource applies to the situation).
You certainly shouldn’t hit your friend with a stick if he loses $5 of your club’s money. I think this is fairly obvious, and it seems quite improbable that you were assuming that I was suggesting any such thing. So, given that we can’t possibly be talking about injuring anyone, or doing any such thing, how can your point about net-negative punishment apply? The more sensible assumption is that the punishment is of the same kind as the reward.
I think social punishments usually have the same form, where rewards tend to be more of a transfer of status, and punishments more of a destruction of status (two people can destroy each other's reputations with repeated social punishments).
There is also the bandwidth cost of punishment, as well as the simple fact that giving people praise usually comes with a positive emotional component for the receiver (in addition to the status and reputation), whereas punishments usually come with added stress and discomfort that reduce total output for a while.
In either case, I think the simpler case is made by looking at the assumption of diminishing returns in resources and realizing that the cost of giving someone a reward they value 2x as much is usually larger than the cost of giving the base reward twice, meaning that there is an inherent cost to high-variance reward landscapes.
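To illustrate, here is a minimal sketch assuming a square-root utility of money (an illustrative choice of concave function, not anything specified above): doubling the felt size of a reward quadruples its dollar cost to the giver.

```python
import math

# Diminishing returns: with concave (square-root) utility, a reward the
# receiver values 2x as much costs the giver far more than 2x the dollars.
def utility(dollars):
    return math.sqrt(dollars)

base = 100.0                      # a $100 reward...
base_u = utility(base)            # ...is worth 10 "utils" to the receiver
# To deliver 2x the utility we need utility(x) == 20, i.e. x == 20**2:
doubled_cost = (2 * base_u) ** 2
print(doubled_cost)               # 400.0 -- 4x the dollars for 2x the felt reward
```

So a high-variance schedule of large rewards and large punishments is more expensive for the giver than a flat schedule delivering the same total subjective value.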