However, this strongly limits the space of possible aggregated agents. Imagine two EUMs, Alice and Bob, whose utilities are each linear in how much cake they have. Suppose they’re trying to form a new EUM whose utility function is a weighted average of their utility functions. Then they’d only have three options:
Form an EUM which would give Alice all the cakes (because it weights Alice’s utility higher than Bob’s)
Form an EUM which would give Bob all the cakes (because it weights Bob’s utility higher than Alice’s)
Form an EUM which is totally indifferent about the cake allocation between them (which would allocate cakes arbitrarily, and could be swayed by the tiniest incentive to give all Alice’s cakes to Bob, or vice versa)
None of these is very satisfactory!
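To make the corner-solution point concrete, here's a minimal sketch (with made-up numbers: a fixed stock of 10 cakes and aggregation weight w for Alice) showing that any weighted average of the two linear utilities is maximized by giving everything to whichever agent is weighted higher:

```python
# Hypothetical setup: 10 cakes to split; Alice's utility is the number of
# cakes she gets, Bob's is the number he gets (both linear). The aggregate
# EUM maximizes w * u_Alice + (1 - w) * u_Bob over integer allocations.

def aggregate_utility(w, cakes_to_alice, total=10):
    return w * cakes_to_alice + (1 - w) * (total - cakes_to_alice)

def best_allocation(w, total=10):
    return max(range(total + 1), key=lambda a: aggregate_utility(w, a, total))

print(best_allocation(0.51))  # any w > 0.5: Alice gets all 10 cakes
print(best_allocation(0.49))  # any w < 0.5: Alice gets 0, Bob gets all
# At w == 0.5 every allocation scores exactly the same: total indifference
# between allocations, i.e. option 3 above.
```

Because the objective is linear in the allocation, the optimum always sits at a corner (or is flat everywhere), which is exactly the three-option trichotomy described above.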
I think this exact example is failing to really inhabit the mindset of a true linear(!) returns EUM agent. If Alice has literally linear returns, she is totally happy to accept a deal which gets her 2x as many cakes + epsilon in 50% of worlds and nothing otherwise.
Correspondingly, if Alice and Bob have ex-ante exactly identical expected power, and it is ex-ante equally easy to make cake for each of them, then I think the agent they would build together would be something like:
Form an EUM which is totally indifferent about the cake allocation between them and thus gives 100% of the cake to whichever agent is cheaper/easier to provide cake for.
From Alice’s perspective, this gets her twice as many cakes + epsilon (due to being more efficient) in 50% of worlds, and is thus a nice trade.
(If the marginal cost of giving a cake to Alice vs Bob increases with number of cakes, then you’d give some to both.)
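The ex-ante expectation from Alice's perspective can be checked with illustrative numbers (the baseline and epsilon below are made up for the sake of the arithmetic):

```python
# Suppose Alice's outside option is 1 cake for sure. Under the pooled EUM,
# she is the cheaper-to-feed agent in 50% of worlds and then gets 2 cakes
# plus a small efficiency bonus epsilon; otherwise she gets nothing.
baseline = 1.0
epsilon = 0.1  # illustrative efficiency gain from always feeding the cheaper agent

expected_cakes = 0.5 * (2 * baseline + epsilon) + 0.5 * 0.0
print(expected_cakes)  # 1.05, which beats the sure baseline of 1.0
# With linear returns, Alice strictly prefers this gamble ex-ante.
```

The gamble beats the guaranteed baseline whenever epsilon > 0, which is what makes the all-or-nothing allocation acceptable to a genuinely linear-returns agent.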
If Alice/Bob had diminishing returns, then adding the utility functions with some bargained weighting is also totally fine and will get you some nice split of cake between them.
If we keep their preferences, but give them different cake production abilities or different marginal costs of providing cakes to them, then you just change the weights (based on some negotiation), not the linearity of the addition. And yes, this means that in many worlds (those where one agent’s relative marginal cost of cake consumption is always lower than the ex-ante expectation), one of the agents gets all the cake. But ex-ante they each got a bit more in expectation!
I’m much more sympathetic to other objections to aggregations of EUM agents being EUM, like ontology issues, imperfect information (and adverse selection), etc.
I was a bit lazy in how I phrased this. I agree with all your points; the thing I’m trying to get at is that this approach falls apart quickly if we make the bargaining even slightly less idealized. E.g. your suggestion “Form an EUM which is totally indifferent about the cake allocation between them and thus gives 100% of the cake to whichever agent is cheaper/easier to provide cake for”:
Strongly incentivizes deception (including self-deception) during bargaining (e.g. each agent wants to overstate the difficulty of providing cake for it).
Strongly incentivizes defection from the deal once one of the agents realizes that they’ll get no cake going forward.
Is non-robust to multi-agent dynamics (e.g. what if one of Alice’s allies later decides “actually I’m going to sell pies to the Alice+Bob coalition more cheaply if Alice gets to eat them”? Does that then divert Bob’s resources towards buying cakes for Alice?)
EUM treats these as messy details. Coalitional agency treats them as hints that EUM is missing something.
EDIT: another thing I glossed over is that IIUC Harsanyi’s theorem says the aggregation of EUMs should have a weighted average of utilities, NOT a probability distribution over weighted averages of utilities. So even flipping a coin isn’t technically kosher. This may seem nitpicky but I think it’s yet another illustration of the underlying non-robustness of EUM.
I’ve now edited that section. Old version and new version here for posterity.
Old version:
None of these is very satisfactory! Intuitively speaking, Alice and Bob want to come to an agreement where respect for both of their interests is built in. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to weighted averages. The best they can do is to agree on a probabilistic mixture of EUMs—e.g. tossing a coin to decide between option 1 and option 2—which is still very inflexible, since it locks in one of them having priority indefinitely.
Based on similar reasoning, Scott Garrabrant rejects the independence axiom. He argues that the axiom is unjustified because rational agents should be able to follow through on commitments they made about which decision procedure to follow (or even hypothetical commitments).
New version:
These are all very unsatisfactory. Bob wouldn’t want #1, Alice wouldn’t want #2, and #3 is extremely non-robust. Alice and Bob could toss a coin to decide between options #1 and #2, but then they wouldn’t be acting as an EUM (since EUMs can’t prefer a probabilistic mixture of two options to either option individually). And even if they do, whoever loses the coin toss will have a strong incentive to renege on the deal.
We could see these issues merely as the type of frictions that plague any idealized theory. But we could also see them as hints about what EUM is getting wrong at a more fundamental level. Intuitively speaking, the problem here is that there’s no mechanism for separately respecting the interests of Alice and Bob after they’ve aggregated into a single agent. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to (a probability distribution over) weighted averages of their utilities. This makes aggregation very risky when Alice and Bob can’t consider all possibilities in advance (i.e. in all realistic settings).
Based on similar reasoning, Scott Garrabrant rejects the independence axiom. He argues that the axiom is unjustified because rational agents should be able to lock in values like fairness based on prior agreements (or even hypothetical agreements).