One big aspect of Yudkowskian decision theory is how to respond to threats. Following causal decision theory means you can neither make credible threats nor commit to deterrence to counter threats. Yudkowsky endorses not responding to threats to avoid incentivising them, while also having deterrence commitments to maintain good equilibria. He also implies this is a consequence of using a sensible functional decision theory. But there’s a tension here: your deterrence commitment could be interpreted as a threat by someone else, or vice versa.
I have also noted this tension. Intuitively, one might think that it depends on the morality of the action—the robber who threatens to blow up a bank unless he gets his money might be seen as a threat, while a policy of blowing up your own banks in case of robberies might be seen as a deterrence commitment.
However, this cannot be it, because decision theory works with arbitrary utility functions.
The other idea is that ‘to make threats’ is one of those irregular verbs: I make credible deterrence commitments, you show a willingness to escalate, they try to blackmail me with irrational threats. This is of course just as silly in the context of game theory.
One axis of difference might be whether you physically restrict your own options so that you cannot back out of following through on your threat (like a Chicken player removing their steering wheel, or the doomsday machine from Dr. Strangelove). But this only matters for agents known to follow causal decision theory, who try to maximize utility in whatever branch of reality they find themselves in. As I understand it, adherents of functional decision theory do not need to physically constrain their options; they would happily burn the world in one branch of reality if that was the dominant strategy before their opponent made their choice.
Consider the ultimatum game (which gets covered in Planecrash, naturally), where one party proposes how to split $10 and the other party can either accept (gaining their share) or reject it (in which case neither party gains anything). In Planecrash, the dominant strategy is presented as rejecting unfair allocations with some probability, so that the proposer’s expected value is lower than it would have been from a fair split. However, this hinges on the concept of fairness. If each dollar has the same utility to every participant, then a 50-50 split seems fair. But in a more general case, the utilities of both parties might be utterly incomparable, or the effort of both players might be very different. An isomorphic situation is a silk merchant encountering the first of possibly several highwaymen and having to agree on a split, with both parties having the option to burn all the silk if they don’t agree. Agreeing to a 50-50 split each time could easily make the silk merchant’s business model impossible.
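To make the rejection rule concrete, here is a minimal sketch in Python of the probabilistic-rejection strategy described above, assuming a $10 pot and treating the 50-50 split as the fair point. The function names and the small epsilon margin are my own choices for illustration, not anything from the glowfic.

```python
# Minimal sketch of probabilistic rejection in the ultimatum game,
# assuming a $10 pot and a 50-50 split as the "fair" point.
# The responder accepts an unfair offer just often enough that the
# proposer's expected take stays (slightly) below the fair share.

import random

POT = 10
FAIR_SHARE = POT / 2  # assumed fair point: 50-50

def acceptance_probability(offer_to_responder: float, epsilon: float = 0.01) -> float:
    """Probability of accepting, chosen so the proposer's expected
    payoff never exceeds what a fair proposal would have earned."""
    proposer_take = POT - offer_to_responder
    if offer_to_responder >= FAIR_SHARE:
        return 1.0  # fair or generous offers are always accepted
    # Accept with probability p such that p * proposer_take < FAIR_SHARE.
    return max(0.0, FAIR_SHARE / proposer_take - epsilon)

def respond(offer_to_responder: float) -> bool:
    """Roll the dice: True = accept, False = reject (nobody gets anything)."""
    return random.random() < acceptance_probability(offer_to_responder)

# Example: a 9/1 proposal is accepted ~54.5% of the time, so the greedy
# proposer's expected take is about 4.9, worse than just offering 5.
print(acceptance_probability(1))  # ~0.545
```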
“This is my strategy, and I will not change it no matter what, so you better adapt your strategy if you want to avoid fatal outcomes” is an attitude likely to lead to a lot of fatal outcomes.
… the effort of both players might be very different …
Covered in the glowfic. Here is how it goes down in Dath Ilan:
The next stage involves a complicated dynamic-puzzle with two stations, that requires two players working simultaneously to solve. After it’s been solved, one player locks in a number on a 0-12 dial, the other player may press a button, and the puzzle station spits out jellychips thus divided.
The gotcha is, the 2-player puzzle-game isn’t always of equal difficulty for both players. Sometimes, one of them needs to work a lot harder than the other.
Now things start to heat up. There’s an obvious notion that if one player worked harder than the other, they should get more jellychips. But how much more? Can you quantify how hard the players are working, and split the jellychips in proportion to that? The game obviously seems to be pointing in the direction of quantifying how hard the players are working, relative to each other, but there’s no obvious way to do that.
Somebody proposes that each player say, on a scale of 0 to 12, how hard they felt like they worked, and then the jellychips should be divided in whatever ratio is nearest to that ratio.
The solution relies on people being honest. This is, perhaps, less of a looming unsolvable problem for dath ilani children than for adults in Golarion.
And in Golarion:

“...I don’t see how that game is any different than this one? Unless you mean there’s not the reputational element.”
They just ignore the effort difference and go for 50:50 splits. Fair over the long term, robust to deception and self-deception, low cognitive effort.
The Dath Ilani kids are wrong according to Shapley values (confirmed as the Dath Ilan philosophy here). Let’s suppose that Aylick and Brogue are paired up on a box where Aylick had to put in three jellychips’ worth of effort and Brogue one jellychip’s worth. Their total gains from trade are then 12 − 4 = 8. The Shapley division is 4 each, which can be achieved as follows:
Aylick gets seven jellychips. Less her three units of effort, her total reward is four.
Brogue gets five jellychips. Less his one unit of effort, his total reward is four.
The Dath Ilan child division is nine to three (splitting the twelve chips in proportion to effort), which I think is only justified by the politician’s fallacy. But they are children.
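For concreteness, here is a small sketch of the Shapley computation behind those numbers, assuming (as above) that neither child can earn anything alone, so the whole 12-chip payout net of effort is joint surplus. The brute-force Shapley function is generic; the names and numbers follow the Aylick/Brogue example.

```python
# Sketch of the Shapley split argued for above, assuming neither child
# can earn jellychips alone, so the payout net of effort is joint surplus.

from itertools import permutations

def shapley_values(players, value_of_coalition):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = []
        for p in order:
            before = value_of_coalition(frozenset(coalition))
            coalition.append(p)
            after = value_of_coalition(frozenset(coalition))
            totals[p] += after - before
    return {p: totals[p] / len(orderings) for p in players}

PAYOUT = 12
EFFORT = {"Aylick": 3, "Brogue": 1}

def v(coalition):
    # Net value: the 12-chip payout minus total effort, only if both cooperate.
    if coalition == frozenset(EFFORT):
        return PAYOUT - sum(EFFORT.values())
    return 0

surplus_share = shapley_values(list(EFFORT), v)            # {'Aylick': 4.0, 'Brogue': 4.0}
chips = {p: surplus_share[p] + EFFORT[p] for p in EFFORT}  # {'Aylick': 7.0, 'Brogue': 5.0}
print(surplus_share, chips)
```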
AFAICT, in the highwayman example, if the would-be robber presents his ultimatum as “give me half your silk or I burn it all,” the merchant should burn it all, same as if the robber says “give me 1% of your silk or I burn it all.” But a slightly more sophisticated highwayman might say “this is a dangerous stretch of desert, and there are many dangerous, desperate people in those dunes. I have some influence with most of the groups in the next 20 miles. For x% of your silk, I will make sure you are unmolested for that portion of your travel.” Then the merchant actually has to assign probabilities to a bunch of events, calculate Shapley values, and roll some dice for his mixed strategy.
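As a purely illustrative sketch of one piece of that calculation, here is a toy expected-value comparison for the merchant deciding whether to pay for safe passage. Every number and probability below is hypothetical, chosen only to show the shape of the decision; it leaves out the Shapley and mixed-strategy parts entirely.

```python
# Toy expected-value comparison for the merchant; all figures are
# hypothetical and only illustrate the structure of the decision.

SILK_VALUE = 100.0          # assumed value of the cargo
P_ROBBED_UNPROTECTED = 0.4  # assumed chance of losing everything without the deal
P_ROBBED_PROTECTED = 0.05   # assumed residual risk even after paying

def expected_value(pay_fraction: float, protected: bool) -> float:
    p_robbed = P_ROBBED_PROTECTED if protected else P_ROBBED_UNPROTECTED
    kept = SILK_VALUE * (1 - pay_fraction)
    return (1 - p_robbed) * kept  # robbery is assumed to be a total loss

# Pay 20% for safe passage vs. take his chances:
print(expected_value(0.20, protected=True))   # 76.0
print(expected_value(0.00, protected=False))  # 60.0
```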
I think it might be as simple as not making threats against agents with compatible values.
In all of Yudkowsky’s fiction the distinction between threats (and unilateral actions removing consent from another party) and deterrence comes down to incompatible values.
The baby-eating aliens are denied access to a significant portion of the universe (a unilateral harm to them) over irreconcilable value differences. Harry Potter non-consensually transfigures Voldemort away semi-permanently because of irreconcilable value differences. Carissa and friends deny many of the gods their desired utility over value conflict.
Planecrash fleshes out the metamorality with the presumed external simulators, who only enumerate the worlds satisfying enough of their values, with the negative utilitarians probably wielding the strongest acausal “threat” by being the most selective.
Cooperation happens where there is at least some overlap in values, and so some gains from trade to be made. If there are no possible mutual gains from trade, then the rational action is to defect, at a per-agent cost of up to the absolute value of the negative utility of letting the opposing agent achieve their own utility. Not quite a threat, but a reality about irreconcilable values.
IIRC, it was covered in Planecrash also!