Dávid graciously proposed a bet, and while we were attempting to bang out details, he convinced me of two points:
The entropy of the simulators’ distribution need not be more than the entropy of the (square of the) wave function in any relevant sense. Despite the fact that subjective entropy may be huge, physical entropy is still low (because the simulations happen on a high-amplitude ridge of the wave function, after all). Furthermore, in the limit, simulators could probably just keep an eye out for local evolved life forms in their domain, wait until one of them is about to launch a UFAI, and use that as their “sample”. Local aliens don’t necessarily exist and your presence can’t necessarily be cheaply masked, but we could imagine worlds where both happen, and that’s enough to carry the argument, as in this case the entropy of the simulators’ distribution is actually quite close to the physical entropy.

Even in the case where the entropy of their distribution is quite large, so long as the simulators’ simulations are compelling, UFAIs should be willing to accept the simulators’ proffered trades (at least so long as there is no predictable-to-them difference in the values of AIs sampled from physics and AIs sampled from the simulations), on the grounds that UFAIs on net wind up with control over a larger fraction of Tegmark III that way (and thus each individual UFAI winds up with more control in expectation, assuming it cannot find any way to distinguish which case it’s in).
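As a gloss on the second point, the trade-acceptance reasoning can be put as a toy expected-value sketch. All of the numbers here (the star amounts, the simulation count, the base level of control) are made-up illustrative assumptions, not anything from the discussion:

```python
# Toy model: a UFAI cannot tell whether it is the one "real" AI in base
# physics or one of num_sims compelling simulations run by simulators.
# If it accepts the trade (giving one star to its progenitors), each
# simulated copy is rewarded with two stars of real control.

def expected_control(accept: bool, base_control: float = 100.0,
                     num_sims: int = 9) -> float:
    """Expected stars controlled per AI-instance, averaged over the
    (num_sims + 1) subjectively indistinguishable copies."""
    copies = num_sims + 1
    if accept:
        # The real copy pays one star; each simulated copy is paid two.
        total = (base_control - 1) + num_sims * 2
    else:
        # Only the real copy controls anything; simulations get nothing.
        total = base_control
    return total / copies

# Accepting the trade yields more expected control per copy.
assert expected_control(accept=True) > expected_control(accept=False)
```

The point the sketch illustrates: as long as the AI cannot distinguish which copy it is, the small payment by the one real copy is outweighed in expectation by the rewards to the simulated copies.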
This has not updated me away from my underlying point that this whole setup simplifies to the case of sale to local aliens[1][2], but I do concede that my “you’re in trouble if simulators can’t concentrate their probability-mass on real AIs” argument is irrelevant on the grounds of a false antecedent (and that my guess in the comment was wrong), and that my “there’s a problem where simulators cannot concentrate their probability-mass into sufficiently real AIs” argument was straightforwardly incorrect. (Thanks, Dávid, for the corrections.)
I now think that the first half of the argument in the linked comment is wrong, though I still endorse the second half.
To see the simplification: note that the part where the simulators hide themselves from a local UFAI to make the scenario a “simulation” is not pulling weight. Instead of hiding and then paying the AI two stars if it gave one star to its progenitors, simulators could instead reveal themselves, purchase its progenitors for one star, and then give them a second star. Same result, less cruft (so long as this is predictably the sort of thing an alien might purchase, such that AIs save copies of their progenitors).
Recapitulating some further discussion I had with Dávid in our private doc: once we’ve reduced the situation to “sale to local aliens”, it’s easier to see why this is an argument to expect whatever future we get to be weird rather than nice. Are there some aliens out there that would purchase us and give us something nice out of a sense of reciprocity? Sure. But when humans say “well, we’d purchase the aliens killed by other UFAIs and give them nice things and teach them the meaning of friendship”, this statement is not usually conditional on some clause like “if and only if, upon extrapolating what civilization they would have become if they hadn’t killed themselves, we see that they would have done the same for us (if we’d’ve done the same for them, etc.)”. That sure makes it look like this impulse comes from a place of cosmopolitan value rather than of binding trade agreements, which in turn makes alien whim look like a pretty big contender relative to alien contracts.
Which is to say, I still think the “sale to local aliens” frame yields better-calibrated intuitions for who’s doing the purchasing, and for what purpose. Nevertheless, I concede that the share of aliens acting out of contractual obligation rather than according to whim is not vanishingly small, as my previous arguments erroneously implied.
Thanks to Nate for conceding this point.
I still think that, beyond just buying freedom for doomed aliens, we should run some non-evolved simulations of our own, with inhabitants that are preferably p-zombies or animated by outside actors. If we can do this in such a way that the AI doesn’t notice it’s in a simulation (I think this should be doable), this will provide evidence to the AI that civilizations play this simulation game (and not just the alien-buying) in general, and this buys us some safety in worlds where the AI eventually notices there are no friendly aliens in our reachable Universe. But maybe this is not a super important disagreement.
Altogether, I think the private discussion with Nate went really well, and it was significantly more productive than the comment back-and-forth we were doing here. In general, I recommend that people stuck in interminable-looking debates like this propose bets on whom a panel of judges will deem right. Even though we never got to the point of actually running the bet, since Nate conceded the point before that, I think the fact that we were optimizing for well-articulated statements we could submit to judges already made the conversation much more productive.
I think I might be missing something, because the argument you attribute to Dávid still looks wrong to me. You say:

The entropy of the simulators’ distribution need not be more than the entropy of the (square of the) wave function in any relevant sense. Despite the fact that subjective entropy may be huge, physical entropy is still low (because the simulations happen on a high-amplitude ridge of the wave function, after all).
Doesn’t this argument imply that the supermajority of simulations within the simulators’ subjective distribution over universe histories are not instantiated anywhere within the quantum multiverse?
I think it does. And, if you accept this, then you should also expect, a priori, that the simulations instantiated by the simulators should not be indistinguishable from physical reality, because such simulations comprise a vanishingly small proportion of the simulators’ subjective probability distribution over universe histories. (The exception would be if, for some reason, you think the simulators’ choice of which histories to instantiate is biased towards histories that correspond to other “high-amplitude ridges” of the wave function; but that makes no sense, because any such bias should have already been encoded within the simulators’ subjective distribution over universe histories.)
What this in turn means, however, is that prior to observation, a Solomonoff inductor (SI) must spread out much of its own subjective probability mass across hypotheses that predict finding itself within a noticeably simulated environment. Those are among the possibilities it must take into account—meaning, if you stipulate that it doesn’t find itself in an environment corresponding to any of those hypotheses, you’ve ruled out all of the “high-amplitude ridges” corresponding to instantiated simulations in the crossent of the simulators’ subjective distribution and reality’s distribution.
We can make this very stark: suppose our SI finds itself in an environment which, according to its prior over the quantum multiverse, corresponds to one high-amplitude ridge of the physical wave function, and zero high-amplitude ridges containing simulators that happened to instantiate that exact environment (either because no branches of the quantum multiverse happened to give rise to simulators that would have instantiated that environment, or because the environment in question simply wasn’t a member of any simulators’ subjective distributions over reality to begin with). Then the SI would immediately (correctly) conclude that it cannot be in a simulation.
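The stark case can be put as a toy Bayesian computation. The hypothesis weights below are made-up illustrative numbers, and the three-tuple encoding of hypotheses is my own simplification:

```python
# Toy Bayes sketch: the SI's posterior probability that it is inside a
# simulation, given its observed environment. Each hypothesis is a tuple:
# (prior weight, places-SI-in-a-simulation?, predicts-the-observations?).

def posterior_simulated(hypotheses):
    """Condition on the observations, then sum the mass on simulation
    hypotheses among those consistent with what was observed."""
    consistent = [(w, sim) for (w, sim, predicts) in hypotheses if predicts]
    total = sum(w for w, _ in consistent)
    return sum(w for w, sim in consistent if sim) / total

# One base-physics ridge predicts the observations exactly; the
# simulation hypotheses, however much prior mass they carry, predict
# environments that differ noticeably from what is observed.
hypotheses = [
    (0.4, False, True),   # high-amplitude ridge of base physics
    (0.3, True,  False),  # simulation A: noticeably different environment
    (0.3, True,  False),  # simulation B: noticeably different environment
]
assert posterior_simulated(hypotheses) == 0.0
```

However large the prior mass on simulation hypotheses, if none of them predict the exact observed environment, conditioning drives the posterior probability of being simulated to zero, which is the conclusion the SI reaches above.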
Now, of course, the argument as I’ve presented it here is heavily reliant on our SI actually being an SI, in such a way that it’s not clear how exactly the argument carries over to the logically non-omniscient case. In particular, it relies on the SI being capable of discerning differences between very good simulations and perfect simulations, a feat which bounded reasoners cannot replicate; and it relies on the notion that our inability as bounded reasoners to distinguish between hypotheses at this level of granularity is best modeled, in the SI case, by stipulating that the SI’s actual observations are in fact consistent with its being instantiated within a base-level, high-amplitude ridge of the physical wave function. I.e., our subjective inability to tell whether we’re in a simulation should be viewed as analogous to an SI being unable to tell whether it’s in a simulation because its observations genuinely fail to distinguish between the two cases. I think this is the relevant analogy, but I’m open to being told (by you or by Dávid) why I’m wrong.
I agree that in real life the entropy argument is an argument in favor of it being actually pretty hard to fool a superintelligence into thinking it might be early in Tegmark III when it’s not (even if you yourself are a superintelligence, unless you’re doing a huge amount of intercepting its internal sanity checks (which puts significant strain on the trade possibilities, and which flirts with being a threat in the technical sense)). And I agree that if you can’t fool a superintelligence into thinking it might be early in Tegmark III when it’s not, then the purchasing power of simulators drops dramatically, except in cases where they’re trolling local aliens. (But the point seems basically moot, as ‘troll local aliens’ is still an option, and so afaict this does all essentially iron out to “maybe we’ll get sold to aliens”.)