You do gesture at it with “maximum amount of harm”, but the specific framing I don’t quite see expressed here is this:
While a blackmailer may be revealing something “true”, the net effect (even if not “maximized” by the blackmailer) is often disproportionate to what one might desire. To give an example, a blackmailer may threaten to reveal that their target has a non-standard sexual orientation. In many parts of the world, the harm caused by this is considerably greater than the (utilitarian) “optimal” amount—in this case, zero. This is a function of not only the blackmailer’s attempt at optimizing their long-term strategy, but also of how people/society react to certain kinds of information. Unfortunately this is mostly an object-level argument (that society reacts inappropriately in predictable ways to some things), but it seems relevant.
This reminds me of the winner’s curse. When the blackmailer is optimizing for outrageousness, the outrage caused by their blackmail is predictably too much.
Winner’s Curse doesn’t seem like the right effect to me—it seems more like an orthogonality/Goodhart effect, where optimizing for outrageousness decreases the fitness w/r/t social welfare (on the margin). It’s always in the blackmailer’s interest to make the outrageousness greater, so they’re not (selfishly) sad when they overshoot.
My model: for each issue (example: rape, homosexuality, etc.) there is an ideal amount of outrage, from “none” to “burn them at the stake”. (“Ideal” meaning the amount that best achieves human goals, or something similar.) A given culture might approximate these amounts, but with error. Sometimes it will have more outrage than ideal, and sometimes it will have less.
A blackmailer is trying to maximize potential outrage. (They have limited resources and can only blackmail so many people. If Alice’s secret would make people avoid her if it got out, and Bob’s secret would make people murder him, then the blackmailer will blackmail Bob.) The blackmailer can form a relatively accurate model of what issues their culture is most outraged about, so they will maximize outrage rather well.
By analogy with the winner’s curse, if x is argmax(outrage(issue)) over issues, then the culture’s outrage about x has probably overshot the ideal amount.
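To make the selection effect concrete, here is a toy simulation (my own construction, not something from the parent comments): each issue gets an ideal outrage level, the culture’s actual outrage is that level plus zero-mean noise, and the blackmailer targets whichever issue currently draws the most outrage. If the model above is right, the blackmailer’s chosen issue should show a positive average gap between cultural and ideal outrage, while a randomly chosen issue should not.

```python
import random

# Toy model of the optimizer's-curse argument (illustrative assumptions only).
# Each issue has an "ideal" outrage level; the culture's actual outrage is that
# level plus zero-mean noise. A blackmailer who picks the issue with the highest
# cultural outrage tends, on average, to pick an issue whose outrage overshoots.

random.seed(0)

NUM_ISSUES = 20
NUM_TRIALS = 10_000

overshoot_of_picked = []  # cultural minus ideal outrage for the blackmailer's pick
overshoot_of_random = []  # same gap for a randomly chosen issue (baseline)

for _ in range(NUM_TRIALS):
    ideal = [random.uniform(0, 10) for _ in range(NUM_ISSUES)]
    cultural = [level + random.gauss(0, 3) for level in ideal]  # noisy cultural outrage

    picked = max(range(NUM_ISSUES), key=lambda k: cultural[k])  # blackmailer's argmax choice
    baseline = random.randrange(NUM_ISSUES)

    overshoot_of_picked.append(cultural[picked] - ideal[picked])
    overshoot_of_random.append(cultural[baseline] - ideal[baseline])

print("mean overshoot, argmax issue:  %.2f" % (sum(overshoot_of_picked) / NUM_TRIALS))
print("mean overshoot, random issue: %.2f" % (sum(overshoot_of_random) / NUM_TRIALS))
# Expected: the argmax issue shows a clearly positive mean overshoot,
# while the random issue's mean overshoot sits near zero.
```

The effect comes purely from selecting on a noisy quantity, so it holds even if the culture’s outrage is unbiased on average across issues.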
Yes. Long post is long, and I didn’t want to bring in arguments about particular reveals to show this. In particular, we all think the cost in that case should be zero, and we all know it often very much isn’t. And I didn’t want anyone to think I was relying on that.