The Geometric Importance of Side Payments

I’m generally a fan of “maximize economic surplus and then split the benefits fairly”. And I think this approach makes the most sense in contexts where agents are bargaining over a joint action space $D \times S$, where $D$ represents some object-level decision being made and $S$ the side-payments agents can use to transfer value between them.[1]

An example would be a negotiation between Alice and Bob over how to split a pile of 100 tokens, which Alice can exchange for $0.01 each, and Bob can exchange for $10,000,000 each. The sort of situation where there’s a real and interpersonally comparable difference in the value they each derive from their least and most favorite outcome.[2]

In this example $F$ is the convex set containing joint utilities for all splits of the 100 tokens, and $d$ the ($0, $0) disagreement point. If we take $(F, d)$ as our bargaining problem, the Nash and KS bargaining solutions are for Alice and Bob to each receive 50 tokens. But this is clearly not actually Pareto optimal. Pareto optimality looks like enacting a binding agreement between Alice and Bob that “Bob can have all the tokens, and Alice receives a fair split of the money”. And I claim the mistake was in modelling $F$ as the full set of feasible options, when in fact the world around us abounds with opportunities to do better.
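
As a minimal sketch of that first calculation (the valuations come from the example above; the code and names are mine):

```python
# Nash bargaining over token splits alone, with no side payments.
ALICE_VALUE = 0.01        # dollars per token, to Alice
BOB_VALUE = 10_000_000.0  # dollars per token, to Bob
TOKENS = 100

def joint_utility(tokens_to_alice: int) -> tuple[float, float]:
    """Dollar-denominated utilities (Alice, Bob) for a given split."""
    return (tokens_to_alice * ALICE_VALUE,
            (TOKENS - tokens_to_alice) * BOB_VALUE)

# The Nash solution maximizes the product of gains over the ($0, $0)
# disagreement point. Rescaling either utility doesn't move the argmax.
best = max(range(TOKENS + 1),
           key=lambda a: joint_utility(a)[0] * joint_utility(a)[1])
print(best, joint_utility(best))  # 50 (0.5, 500000000.0)
```

Maximizing $a \cdot (100 - a)$ lands on 50 tokens each no matter how lopsided the dollar values are, which is exactly the problem.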

Side payments introduce important geometric information that $F$ alone doesn’t convey: the real-world tradeoff between making Alice happier and making Bob happier. Bargaining solutions are rightly designed to ignore how utility functions are shifted and scaled, and when $F$ is compact we can standardize each agent’s utility into [0, 1]. With $F$ alone, we can’t distinguish between “Bob is just using bigger numbers to measure his utility” (measuring this standardized shape in nano-utilons) and “Bob is actually 1 billion times more sensitive to the difference between his least and most favorite outcome than Alice is.”
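
To see how standardization erases this distinction, here’s a quick sketch (again my code, not from any formal setup in the text):

```python
def standardize(u: float, worst: float, best: float) -> float:
    """Affinely rescale a utility so the least favorite outcome maps
    to 0 and the most favorite maps to 1."""
    return (u - worst) / (best - worst)

# Bob's standardized utility for keeping 60 of the 100 tokens is the
# same whether a token is worth $10,000,000 or $0.01 to him.
for bob_value in (10_000_000.0, 0.01):
    print(standardize(60 * bob_value, worst=0.0, best=100 * bob_value))
# 0.6 both times: the scale information is gone.
```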

In this example, when we project the outcome space into standardized joint utility space, the results for $D$ and $D \times S$ look pretty similar: a curve sloping down from Bob’s favorite outcome to Alice’s, and all the space between that curve and (0, 0). And the Nash and KS bargaining solutions will be the same: 0.5 standardized utilons for each. But when we reverse the projection to find outcomes with this joint utility, for $D$ we find (50 tokens for Alice, 50 tokens for Bob), and for $D \times S$ we find ((0 tokens for Alice, 100 tokens for Bob), Bob gives Alice $500,000,000).
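
Spelling out that reverse projection for $D \times S$: giving Bob all 100 tokens creates $100 \times \$10{,}000{,}000 = \$1{,}000{,}000{,}000$ of surplus, and the transfer $t$ from Bob to Alice that hits 0.5 standardized utilons for each is the one maximizing the Nash product:

$$\max_{t \in [0, 10^9]} \; t \cdot (10^9 - t) \implies t = \$500{,}000{,}000.$$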

Economists call “the resource used to measure value” the numéraire, and usually this unit of caring is money. If we can find or invent a resource that Alice and Bob both value linearly, economists say that they have quasilinear utility functions, which is amazing news. They can use this resource for side payments, it simplifies a lot of calculations, and it causes many different ways we might try to measure surplus to agree with each other.
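
In symbols (my notation, writing $v_i$ for agent $i$’s dollar value over decisions $x \in D$ and $t_i$ for their net transfer): quasilinearity means each utility function splits as

$$u_i(x, t_i) = v_i(x) + t_i,$$

so when transfers sum to zero, total utility reduces to $\sum_i v_i(x)$, and the surplus-maximizing decision doesn’t depend on how the money ends up divided.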

When Alice and Bob each have enough of this resource to pay for any movement across $D$, the Pareto frontier of $D \times S$ becomes completely flat. And whenever this happens to a Pareto frontier, the Nash and KS bargaining solutions coincide exactly with “maximize economic surplus and split it equally.”
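
A quick check of that claim: with disagreement point $(0, 0)$ and a flat frontier $u_A + u_B = s$, where $s$ is the maximum surplus, the Nash product is maximized at the midpoint,

$$\max_{u_A + u_B = s} u_A \cdot u_B \implies u_A = u_B = \frac{s}{2},$$

and KS equalizes each agent’s fraction of their best feasible gain ($s$ for both), which again gives $s/2$ each.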

“Maximize total utility” and “maximize average utility” are type errors if we interpret them literally: different agents’ utility functions have no shared scale, so there’s nothing canonical to sum. But “maximize economic surplus (and split it fairly)” is something we can actually do, using tools like trade and side payments to establish a common currency for measuring surplus.

Using Weights as Side Payments

Money is a pretty reasonable type of side-payment, but we could also let agents transfer weights in the joint utility function among themselves. This is the approach Andrew Critch explores in his excellent paper on Negotiable Reinforcement Learning, in which a single RL agent is asked to balance the interests of multiple principals with different beliefs. The overall agent is a $\sum_i w_i u_i$ maximizer, where the Harsanyi weights $w_i$ shift according to Bayes’ rule, giving better predictors more weight in future decisions. The principals essentially bet about the next observation the RL agent will make, where the stakes are denominated in $w$.
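
Here’s a minimal sketch of that weight dynamic in the spirit of the paper (function and variable names are mine):

```python
def update_weights(weights: list[float], likelihoods: list[float]) -> list[float]:
    """Scale each principal's Harsanyi weight by the probability they
    assigned to the actual observation, then renormalize. Only relative
    weights matter to the argmax of sum_i w_i * u_i."""
    scaled = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(scaled)
    return [w / total for w in scaled]

# Two principals start with equal weight. Principal 0 assigned 90% to
# the observation that occurred; principal 1 assigned only 20%. The
# better predictor gains influence over the agent's future decisions.
print(update_weights([0.5, 0.5], [0.9, 0.2]))  # [0.818..., 0.181...]
```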

One direction Andrew points towards for future work is using some kind of bargaining among sub-agents to determine what the overall agent does. One way to model this is by swapping out $\sum_i w_i u_i$ maximization for Nash-product maximization $\prod_i (u_i - d_i)$, defining each agent’s baseline $d_i$ as their utility if no trade takes place, and enriching $D$ to include side payments.
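
Concretely (keeping the notation from earlier sections, with $d_i$ as agent $i$’s no-trade baseline), the swap looks like:

$$\max_{x \in D} \sum_i w_i\, u_i(x) \quad \longrightarrow \quad \max_{x \in D \times S} \prod_i \bigl(u_i(x) - d_i\bigr).$$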

  1. ^

    This can also be framed as picking a point on the Pareto frontier, and then letting agents pay each other for small shifts from there. Bargaining over $D \times S$ combines these into a single step.

  2. ^

    How do I know utilities can be compared? Exactly because when Bob offers Alice $5,000,000 for one of her tokens, she says “yep that sounds good to me!” Money is the unit of caring.