This reminds me of an example I described in this SL4 post:
After suggesting in a previous post [1] that AIs who want to cooperate with
each other may find it more efficient to merge than to trade, I realized
that voluntary mergers do not necessarily preserve Bayesian rationality,
that is, rationality as defined by standard decision theory. In other words,
two “rational” AIs may find themselves in a situation where they won’t
voluntarily merge into a “rational” AI, but can agree to merge into an
“irrational” one. This seems to suggest that we shouldn’t expect AIs to be
constrained by Bayesian rationality, and that we need an expanded definition
of what rationality is.
Let me give a couple of examples to illustrate my point. First consider an
AI with the only goal of turning the universe into paperclips, and another
one with the goal of turning the universe into staples. Each AI is
programmed to get 1 util if at least 60% of the accessible universe is
converted into its target item, and 0 utils otherwise. Clearly they can’t
both reach their goals (assuming their definitions of “accessible universe”
overlap sufficiently), but they are not playing a zero-sum game, since it is
possible for them to both lose, if for example they start a destructive war
that devastates both of them, or if they just each convert 50% of the
universe.
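A minimal sketch of this payoff structure (the 60% threshold and the 0/1 utils are from the example above; the list of outcomes is just for illustration):

```python
# Binary utilities from the paperclip/staple example: each AI gets 1 util
# iff at least 60% of the accessible universe becomes its target item.

def paperclip_utility(paperclip_fraction: float) -> int:
    return 1 if paperclip_fraction >= 0.60 else 0

def staple_utility(staple_fraction: float) -> int:
    return 1 if staple_fraction >= 0.60 else 0

# A few possible divisions of the universe (paperclip share, staple share).
# Both AIs winning is impossible, but both losing is not, so the game is
# not zero-sum.
for clips, staples in [(0.6, 0.4), (0.4, 0.6), (0.5, 0.5), (0.1, 0.1)]:
    print((clips, staples), paperclip_utility(clips), staple_utility(staples))
```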
So what should they do? In [1] I suggested that two AIs can create a third
AI whose utility function is a linear combination of the utilities of the
original AIs, and then hand off their assets to the new AI. But that doesn’t
work in this case. If they tried this, the new AI would get 1 util if at
least 60% of the universe is converted to paperclips, and 1 util if at least
60% of the universe is converted to staples. In order to maximize its
expected utility, it will pursue the one goal with the highest chance of
success (even if it’s just slightly higher than the other goal). But if
these success probabilities were known before the merger, the AI whose goal
has a smaller chance of success would have refused to agree to the merger.
That AI should only agree if the merger allows it to have a close to 50%
probability of success according to its original utility function.
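A sketch of why this fails, under the (illustrative) assumption that the merged AI simply compares the two success probabilities and commits entirely to the higher one:

```python
# The merged AI's utility is a sum of the two original 0/1 utilities, so
# its expected utility from pursuing one goal is simply that goal's
# success probability. It therefore commits fully to whichever goal is
# (even slightly) more likely to succeed.

def merged_ai_goal(p_paperclips: float, p_staples: float) -> str:
    return "paperclips" if p_paperclips > p_staples else "staples"

# With 49% vs 51%, the merged AI pursues staples exclusively, and the
# paperclip AI's expected utility from agreeing is roughly zero.
print(merged_ai_goal(p_paperclips=0.49, p_staples=0.51))  # -> staples
```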
The problem here is that standard decision theory does not allow a
probabilistic mixture of outcomes to have a higher utility than the
mixture’s expected utility, so a 50⁄50 chance of reaching either of two
goals A and B cannot have a higher utility than 100% chance of reaching A
and a higher utility than 100% chance of reaching B, but that is what is
needed in this case in order for both AIs to agree to the merger.
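In symbols, this is just the linearity of expected utility over lotteries (writing U for the merged agent's utility function):

```latex
U\!\left(\tfrac{1}{2}A + \tfrac{1}{2}B\right)
  = \tfrac{1}{2}\,U(A) + \tfrac{1}{2}\,U(B)
  \le \max\{U(A),\, U(B)\},
```

so no 50/50 lottery over A and B can be strictly preferred to both pure outcomes, which is what a mutually acceptable merger would require here.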
I remember my reaction when first reading this was “both AIs delegate their power, then a jointly trusted coinflip is made, then a new AI is constructed which maximizes one of the utility functions”. That seems to solve the problem in general.
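A minimal sketch of that proposal (the function names and the randomness source are illustrative, not from the original comment):

```python
import random

# The proposed merger: both AIs hand over their resources, a jointly
# trusted fair coin is flipped, and the successor AI is built to maximize
# exactly one of the original utility functions, whichever the coin picks.
# Ex ante each original AI gets a ~50% chance that its goal is pursued,
# while the successor itself remains an ordinary expected-utility maximizer.

def coinflip_merge(utility_a, utility_b, rng=None):
    rng = rng or random.Random()  # stands in for a jointly trusted coin
    return utility_a if rng.random() < 0.5 else utility_b

def paperclip_utility(frac): return 1 if frac >= 0.6 else 0
def staple_utility(frac): return 1 if frac >= 0.6 else 0

successor_goal = coinflip_merge(paperclip_utility, staple_utility)
print("successor maximizes:", successor_goal.__name__)
```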
But if these success probabilities were known before the merger, the AI whose goal has a smaller chance of success would have refused to agree to the merger. That AI should only agree if the merger allows it to have a close to 50% probability of success according to its original utility function.
Why does the probability need to be close to 50% for the AI to agree to the merger? Shouldn’t its threshold for agreeing to the merger depend on how likely one or the other AI is to beat the other in a war for the accessible universe?
Is there an assumption that the two AIs are roughly equally powerful, and that a both-lose scenario is relatively unlikely?
It is first past the post: minorities get nothing. There might be an implicit assumption that the newly created agent agrees with the old agents about the probabilities. With 49% plausible paperclips and 51% plausible staples, the new agent will act 100% for staples and not serve paperclips at all.
Ah, maybe the way to think about it is that if I think I have a 30% chance of success before the merger, then I need to have a 30%+epsilon chance of my goal being chosen after the merger. And my goal will only be chosen if it is estimated to have the higher chance of success.
And so, if we assume that the chosen goal is definitely going to succeed post-merger (since there’s no destructive war), that means I need to have a 30%+epsilon chance that my goal has a >50% chance of success post-merger. Or in other words “a close to 50% probability of success”, just as Wei said.
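Spelling out that arithmetic, under the same assumption that the chosen goal succeeds with probability close to 1:

```latex
\text{EU}_{\text{pre-merger}} = 0.3,
\qquad
\text{EU}_{\text{post-merger}} \approx \Pr(\text{my goal is chosen}) \times 1,
```

so the merger is acceptable only if Pr(my goal is chosen) ≥ 0.3 + ε, and my goal is chosen exactly when it is estimated to be the more likely of the two to succeed.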