How to Give in to Threats (without incentivizing them)

TL;DR: using a simple mixed strategy, LDT can give in to threats, ultimatums, and commitments—while incentivizing cooperation and fair[1] splits instead.

This strategy made it much more intuitive to many people I’ve talked to that smart agents probably won’t do weird everyone’s-utility-eating things like threatening each other or participating in commitment races.

1. The Ultimatum game

This part is taken from planecrash[2][3].

You’re in the Ultimatum game. You’re offered between $0 and $10. You can accept or reject the offer. If you accept, you get what’s offered, and the offerer gets the remaining $(10 − offer). If you reject, both you and the offerer get nothing.

The simplest strategy that incentivizes fair splits is to accept everything ≥ $5 and reject everything < $5. The offerer can’t do better than offering you $5: if you accepted offers of $1, an offerer who knows this would always offer you $1 and keep $9, instead of being incentivized to give you $5. Being unexploitable, in the sense of incentivizing fair splits, is a very important property for your strategy to have.

With the simplest strategy, if you’re offered $5..$10, you get $5..$10; if you’re offered $0..$4, you get $0.
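This threshold strategy can be sketched in a few lines (a sketch; the function names are mine, not the post’s):

```python
# Simple unexploitable strategy in the $0..$10 Ultimatum game:
# accept any offer of at least $5, reject anything lower.

def accept_threshold(offer: int) -> bool:
    """Accept iff the offer is at least the fair share of $5."""
    return offer >= 5

def responder_expected_value(offer: int) -> float:
    """Responder's payoff under the threshold strategy."""
    return offer if accept_threshold(offer) else 0.0

def offerer_expected_value(offer: int) -> float:
    """Offerer keeps $(10 - offer) only if the offer is accepted."""
    return (10 - offer) if accept_threshold(offer) else 0.0

# The offerer can't beat offering exactly $5:
best_offer = max(range(11), key=offerer_expected_value)
```

Against this strategy, every offer below $5 earns the offerer nothing, so offering the fair $5 is their best move.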

Can you do better than that? What strategy would get you more than $0 in expectation when you’re offered $1..$4, while still being unexploitable (i.e., still incentivizing offers of at least $5)?

I encourage you to stop here and try to come up with a strategy before continuing.

The solution, explained by Yudkowsky in planecrash (children split 12 jellychips, so the offers are 0..12):

When the children return the next day, the older children tell them the correct solution to the original Ultimatum Game.

It goes like this:

When somebody offers you a 7:5 split, instead of the 6:6 split that would be fair, you should accept their offer with slightly less than 6/7 probability. Their expected value from offering you 7:5, in this case, is 7 × (slightly less than 6/7), or slightly less than 6. This ensures they can’t do any better by offering you an unfair split; but neither do you try to destroy all their expected value in retaliation. It could be an honest mistake, especially if the real situation is any more complicated than the original Ultimatum Game.

If they offer you 8:4, accept with probability slightly-more-less than 6/8, so they do even worse in their own expectation by offering you 8:4 than 7:5.

It’s not about retaliating harder, the harder they hit you with an unfair price—that point gets hammered in pretty hard to the kids, a Watcher steps in to repeat it. This setup isn’t about retaliation, it’s about what both sides have to do, to turn the problem of dividing the gains, into a matter of fairness; to create the incentive setup whereby both sides don’t expect to do any better by distorting their own estimate of what is ‘fair’.

[The next stage involves a complicated dynamic-puzzle with two stations, that requires two players working simultaneously to solve. After it’s been solved, one player locks in a number on a 0-12 dial, the other player may press a button, and the puzzle station spits out jellychips thus divided.

The gotcha is, the 2-player puzzle-game isn’t always of equal difficulty for both players. Sometimes, one of them needs to work a lot harder than the other.]

They play the 2-station video games again. There’s less anger and shouting this time. Sometimes, somebody rolls a continuous-die and then rejects somebody’s offer, but whoever gets rejected knows that they’re not being punished. Everybody is just following the Algorithm. Your notion of fairness didn’t match their notion of fairness, and they did what the Algorithm says to do in that case, but they know you didn’t mean anything by it, because they know you know they’re following the Algorithm, so they know you know you don’t have any incentive to distort your own estimate of what’s fair, so they know you weren’t trying to get away with anything, and you know they know that, and you know they’re not trying to punish you. You can already foresee the part where you’re going to be asked to play this game for longer, until fewer offers get rejected, as people learn to converge on a shared idea of what is fair.

Sometimes you offer the other kid an extra jellychip, when you’re not sure yourself, to make sure they don’t reject you. Sometimes they accept your offer and then toss a jellychip back to you, because they think you offered more than was fair. It’s not how the game would be played between dath ilan and true aliens, but it’s often how the game is played in real life. In dath ilan, that is.

This allows even very different agents with very different notions of fairness to cooperate most of the time.

So, if in the game with $0..$10, you’re offered $4 instead of the fair $5, you understand that if you accept, the other player will get $6, and so you accept with probability slightly less than 5/6, making the other player receive, in expectation, slightly less than the fair $5. You still get $4 most of the time when you’re offered this unfair split, but you’re incentivizing fair splits. Even if you’re offered $1, you accept in slightly less than 5/9 of cases, which is more than half of the time, but still incentivizes offering you the fair 5:5 split instead.
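The acceptance rule above can be sketched as a small function (a sketch; the name `accept_probability` and the specific ε are illustrative, not from the post):

```python
EPSILON = 1e-9  # tiny margin; any small positive number works

def accept_probability(offer: float, total: float = 10, fair: float = 5) -> float:
    """Probability of accepting `offer` out of `total` dollars.

    Fair-or-better offers are always accepted. An unfair offer is
    accepted with probability just under fair / (total - offer), so the
    offerer's expected payoff, (total - offer) * p, stays just below
    the fair payoff they'd have gotten from offering the fair split.
    """
    if offer >= fair:
        return 1.0
    return fair / (total - offer) - EPSILON

# Offered $4: accept with p just under 5/6; the offerer's expected
# payoff is 6 * p, just under the fair $5.
p4 = accept_probability(4)
offerer_ev = (10 - 4) * p4
```

Note that for every offer below the fair split, the offerer’s expected payoff lands just under $5, so no unfair offer beats simply offering $5.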

If the other player makes a commitment to offer you $4 regardless of what you do, it simply doesn’t change what you do when you’re offered $4. You want to accept $4 with \(p=5/​6-\epsilon\) regardless of what led to this offer. Otherwise, you’ll incentivize offers of $4 instead of $5. This means other players don’t make bad commitments (and if they do, you usually give in).

(This is symmetric. If you’re the offerer, and the other player commits to accepting only offers of at least $6 and rejecting $5 or lower, you offer $6 with \(p=5/​6-\epsilon\), and otherwise offer less and get rejected.)

2. Threats, commitments, and ultimatums

You can follow the same procedure in all games. Figure out the fair split of gains, then try to coordinate on it; if the other agent is not willing to agree to the fair split and demands something else, agree to their ultimatum probabilistically, in a way that incentivizes the fair split instead.
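This general rule can be sketched as follows (a sketch; the function name and the ε margin are mine). The give-in probability is chosen so that the other agent’s expected payoff from their ultimatum lands just below their fair payoff, taking into account what they get when you refuse:

```python
def give_in_probability(their_fair: float,
                        their_payoff_if_you_give_in: float,
                        their_payoff_if_you_refuse: float,
                        epsilon: float = 1e-9) -> float:
    """Probability of complying with an ultimatum.

    Solves p * give_in + (1 - p) * refuse = fair - epsilon for p, so
    demanding more than the fair split is slightly worse for them than
    agreeing to the fair split.
    """
    g, r = their_payoff_if_you_give_in, their_payoff_if_you_refuse
    if g <= their_fair:
        return 1.0  # not actually an unfair demand; just agree
    p = (their_fair - r) / (g - r) - epsilon
    return max(0.0, min(1.0, p))

# Ultimatum game, offered $4 (they keep $6; they get $0 if you reject):
p_ultimatum = give_in_probability(5, 6, 0)       # just under 5/6
# Chicken (payoffs from the next section): giving in hands them 5,
# mutual daring gives them -100, and the fair payoff is 2:
p_chicken = give_in_probability(2, 5, -100)      # just under 102/105
```

When refusal is costless for the other agent (as in the Ultimatum game), this reduces to the fair/give-in ratio; when refusal is very costly for them too (as in Chicken), you can give in with much higher probability and still leave threatening unprofitable.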

2.1 Game of Chicken

Let’s say the payoff matrix is (row player’s payoff listed first):

|            | Dare       | Swerve |
|------------|------------|--------|
| **Dare**   | −100, −100 | 5, −1  |
| **Swerve** | −1, 5      | 0, 0   |

Let’s assume we consider the fair payoff in this game to be 2 for each player; you can achieve it by coordinating on a fair coin flip to determine who dares and who swerves.

If the other player instead commits to never swerving, you calculate that if you give in, they get 5; the fair payoff is 2; so you give in and swerve with p = 97%, making the other player get less than 2 in expectation (0.97 × 5 + 0.03 × (−100) = 1.85); they would’ve done better by cooperating. Note that this decision procedure is much better than never giving in to threats (which would mean getting −100 every time instead of just 3% of the time), while still having the property that it’s better for everyone not to threaten you at all.
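A quick numeric check of the 97% figure (with the ε margin omitted for simplicity):

```python
p_swerve = 0.97  # probability of giving in to the "never swerve" commitment

# Their payoff: 5 if you swerve, -100 if you both dare.
their_ev = p_swerve * 5 + (1 - p_swerve) * (-100)    # 1.85, below the fair 2

# Your payoff: -1 if you swerve, -100 if you both dare.
your_ev = p_swerve * (-1) + (1 - p_swerve) * (-100)  # about -3.97, far above -100
```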

2.2 Stones

If the other player is a stone with “Threat” written on it, you should do the same thing, even if it looks like the stone’s behavior doesn’t depend on what you’ll do in response. Responding to actions and ignoring the internals when threatened means you’ll get a lot fewer stones thrown at you.

2.3 What if I don’t know the other player’s payoffs?

You want to make decisions that don’t incentivize threatening you. If you receive a threat and know nothing about the other agent’s payoffs, simply don’t give in to the threat! (If you have some information, you can transparently give in with a probability low enough that you’re certain transparently making decisions this way isn’t incentivizing this threat.)

2.4 What if the other player makes a commitment before I make any decisions?

Even without the above strategy, why would this matter? You can simply make the decisions you want to make: use information when it makes sense to use it, and ignore it when it doesn’t. The time at which you receive a piece of information doesn’t have to be an input into your decision if it doesn’t matter when you received it.

With the above algorithm, if you receive a threat, you simply look at it and give in to it most of the time in many games, all while incentivizing not threatening you, because the other player can get more utility if they don’t threaten you.

(In reality, making decisions this way means you’ll rarely receive threats. In most games, you’ll coordinate with the other player on extracting the most utility. Agents will look at you, understand that threatening you means less utility, and you won’t have to spend time googling random number generators and probabilistically giving in. It doesn’t make sense for the other agent to make threatening commitments; and if they do, it’s slightly bad for them.

It’s never a good idea to threaten an LDT agent.)

  1. ^

    Humans might use the Shapley value, the ROSE value, or their intuitive feeling of fairness. Other agents might use very different notions of fairness.

  2. ^

    See ProjectLawful.com: Eliezer’s latest story, past 1M words.

  3. ^

    The idea of unexploitable cooperation with agents with different notions of fairness seems to have first been introduced by @Eliezer Yudkowsky in this 2013 post, with agents accepting unfair (according to them) bargains in which the other agent does worse than in the fair point on the Pareto frontier; but it didn’t suggest accepting unfair bargains probabilistically, to create new points where the other agent does just slightly worse in expectation than it would’ve in the fair point. One of the comments almost got there, but didn’t suggest adding \(-\epsilon\) to the giving-in probability, so the result was considered exploitable (as the other agent was indifferent between making a threat and accepting the fair bargain).