Many such intuitions seem to rely on “doors” between worlds. That makes sense—if we have two rooms of animals connected by a door, then killing all animals in one room will just lead to it getting repopulated from the other room, which is better than killing all animals in both rooms with probability 1⁄2. So in that case there’s indeed a difference between the two kinds of risk.
The question is, how likely is a door between two Everett branches, vs. a door connecting a possible world with an impossible world? With current tech, both are impossible. With sci-fi tech, both could be possible, and based on the same principle (simulating whatever is on the other side of the door). But maybe “quantum doors” are more likely than “logical doors” for some reason?
Another argument for why physical risk might be preferable to logical risk, one that definitely doesn’t rely on any sort of “doors”, is that you might have diminishing returns on the total number of happy humans. As long as your returns to happy humans are sublinear (logarithmic is a standard approximation, though anything sublinear works), you should prefer a guaranteed shot at 1/2 of the Everett branches having lots of happy humans over a 1/2 chance of all the Everett branches having happy humans. To see this, suppose $U : \mathbb{N} \to \mathbb{R}$ measures your returns to the total number of happy humans across all Everett branches. Let $N$ be the total number of happy humans in a good Everett branch and $M$ the total number of Everett branches. Then, in the physical risk situation, you get
$$U_{\text{physical risk}} = U\!\left(\sum_{i=1}^{M/2} N\right) = U\!\left(\frac{MN}{2}\right)$$
whereas, in the logical risk situation, you get
$$U_{\text{logical risk}} = \frac{1}{2}\,U(0) + \frac{1}{2}\,U\!\left(\sum_{i=1}^{M} N\right) = \frac{1}{2}\,U(MN)$$
which are equal only if $U$ is linear (and $U(0) = 0$); for any strictly concave $U$, the physical-risk value is higher. Personally, I think my returns are sublinear, since I pretty strongly want there to be at least some humans, more strongly than I want there to be more humans, though I want that as well. Furthermore, if you believe there’s a chance that the universe is infinite, then you should probably be using some sort of measure over happy humans rather than just counting the number, and my best guess for what such a measure might look like is at least somewhat locally sublinear.
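Here is a quick numerical sketch of the comparison above, assuming a logarithmic utility $U(n) = \log(1 + n)$ and arbitrary illustrative values of $N$ and $M$ (none of these specific numbers come from the argument itself):

```python
import math

# Illustrative assumptions only: these numbers are not from the argument above.
N = 10**10   # assumed happy humans per good Everett branch
M = 10**6    # assumed total number of Everett branches

def U(n):
    """An example sublinear (diminishing-returns) utility: log(1 + n)."""
    return math.log1p(n)

# Physical risk: half the branches are good with certainty.
u_physical = U(M * N / 2)

# Logical risk: 1/2 chance all branches are good, 1/2 chance none are.
u_logical = 0.5 * U(0) + 0.5 * U(M * N)

print(f"physical: {u_physical:.4f}, logical: {u_logical:.4f}")
# For any strictly concave U, physical > logical (Jensen's inequality);
# with a linear U satisfying U(0) = 0 the two would coincide.
```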
So you’re saying that (for example) there could be a very large universe that is running simulations of both possible worlds and impossible worlds, and therefore even if we go extinct in all possible worlds, versions of us that live in the impossible worlds could escape into the base universe, so the effect of a logical risk would be similar to that of a physical risk of equal magnitude (if we get most of our utility from controlling/influencing such base universes). Am I understanding you correctly?
If so, I have two objections to this. 1) Some impossible worlds seem impossible to simulate. For example, suppose that in the actual world AI safety requires solving metaphilosophy. How would you simulate an impossible world in which AI safety doesn’t require solving metaphilosophy? 2) Even for the impossible worlds that maybe can be simulated (e.g., ones where the trillionth digit of pi is different from what it actually is), it seems that only a subset of the reasons for running simulations of possible worlds would apply to impossible worlds, so I’m a lot less sure that “logical doors” exist than I am that “quantum doors” exist.
It seems to me that AI will need to think about impossible worlds anyway—for counterfactuals, logical uncertainty, and logical updatelessness/trade. That includes worlds that are hard to simulate, e.g. “what if I try researching theory X and it turns out to be useless for goal Y?” So “logical doors” aren’t that unlikely.