So8res comments on You can, in fact, bamboozle an unaligned AI into sparing your life

So8res 3 Oct 2024 0:53 UTC
4 points
1
My answer is in spoilers, in case anyone else wants to answer and tell me (on their honor) that their answer is independent from mine, which will hopefully erode my belief that most folk outside MIRI have a really difficult time fielding wacky decision theory Qs correctly.

The sleight of hand is at the point where God tells both AIs that they’re the only AIs (and insinuates that they have comparable degree).

Consider an AI that looks around and sees that it sure seems to be somewhere in Tegmark III. The hypothesis “I am in the basement of some branch that is a high-amplitude descendant of the big bang” has some probability, call this $p$ . The hypothesis “Actually I’m in a simulation performed by a civilization in a high-amplitude branch descendant from the big bang” has a probability something like $p \cdot 2^{- N}$ where $N$ is the entropy of the distribution the simulators sample from.

Unless the simulators simulate exponentially many AIs (in the entropy of their distribution), the AI is exponentially confident that it’s not in the simulation. And we don’t have the resources to pay exponentially many AIs 10 planets each.
- habryka 3 Oct 2024 1:29 UTC
  4 points
  −1
  Parent
  This was close the answer I was going to give. Or more concretely, I would have said (this was written after seeing your answer, but I think is reasonably close to what I would have said independently)
  The problem is at the point where god tells them that they are the only two AIs in the universe. There are issues of logical omniscience here, but an AI with a good prior should be able to tell whether it’s the kind of AI that would actually exist in base reality, or the kind of AI that would only exist in a simulation. (also just ‘existing’ is in these situations not a real thing. The question is how much magical reality-fluid have you got)
  Basically, the AI will have some probability on it being real, and some probability on it being simulated, based on all the facts it knows about itself, even if you simulate reality perfectly faithfully. That prior determines how the AI will behave. You don’t get to change that prior (or like, it will be very costly for you to overcome that prior since there are a lot of AIs and you can’t simulate that many).
  - So8res 3 Oct 2024 1:43 UTC
    3 points
    0
    Parent
    
    seems to me to have all the components of a right answer! …and some of a wrong answer. (we can safely assume that the future civ discards all the AIs that can tell they’re simulated a priori; that’s an easy tell.)
    
    I’m heartened somewhat by your parenthetical pointing out that the AI’s prior on simulation is low account of there being too many AIs for simulators to simulate, which I see as the crux of the matter.
    - habryka 3 Oct 2024 2:00 UTC
      4 points
      0
      Parent
      (we can safely assume that the future civ discards all the AIs that can tell they’re simulated a priori; that’s an easy tell.)
      Yeah, that’s fair. It seemed more relevant to this specific hypothetical. I wasn’t really answering the question in its proper context and wasn’t applying steelmans or adjustments based on the actual full context of the conversation (and wouldn’t have written a comment without doing so, but was intrigued by your challenge).
  - David Matolcsi 3 Oct 2024 1:54 UTC
    2 points
    0
    Parent
    “AI with a good prior should be able to tell whether it’s the kind of AI that would actually exist in base reality, or the kind of AI that would only exist in a simulation” seems pretty clearly false, we assumed that our superintelligent descendants create sims where the AIs can’t tell if it’s a sim, that seems easy enough. I don’t see why it would be hard to create AIs that can’t tell based on introspection whether it’s more likely that their thought process arises in reality or in sims. In the worst case, our sims can be literal reruns of biological evolution on physical planets (though we really need to figure out how to do that ethically). Nate seems to agree with me on this point?
    - habryka 3 Oct 2024 2:02 UTC
      3 points
      0
      Parent
      (I think I agree with you. I wasn’t thinking super hard about the full context of the conversation. I was just intrigued by Nate’s challenge. I don’t really think engaging with my comment is going to be a good use of your time)
- David Matolcsi 3 Oct 2024 1:42 UTC
  3 points
  0
  Parent
  I think this is wrong. The AI has a similarly hard time to the simulators figuring out what’s a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that. So it’s probability that it’s in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of word it observes is the kind of thing that can come up in a real Everett branch. So it’s balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins.
  I think it’s also simple to see from this:
  Suppose evolved life aligns AI in 50% of Everett branches. Every time they do, they run 2 ancestor sims where the AI takes over.
  Suppose an AI comes online and the only things it knows about the world is that it’s a young AI that was born on to a one-planet civilization, and it somehow also knows that evolved life wins 50% of the time and runs 2 sims (suppose this is somehow easy to estimate from first principles). Based on just these informations, what should the AI’s probability be that it’s in a simulation? I think pretty clearly ²⁄₃.
  Actually the AI has a lot more information than that. It knows that the planet’s gravity is 9.8, the evolved sspient species has two eyes, the AI’s creator is called Sam Altman, etc. Some of these informations might actually be very implausible in base reality, but the AI doesn’t know that, as it can’t distinguish base reality from sims, so it incurs one bit of surprise for every new random fact, both in base reality and simulations. So overall it shouldn’t update on all the random facts it observes, and should keep believing it has a ²⁄₃ chance of being in a sim.
  - dxu 3 Oct 2024 3:00 UTC
    6 points
    4
    Parent
    
    The AI has a similarly hard time to the simulators figuring out what’s a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that. So it’s probability that it’s in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of word it observes is the kind of thing that can come up in a real Everett branch. So it’s balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins.
    
    If I imagine the AI as a Solomonoff inductor, this argument looks straightforwardly wrong to me: of the programs that reproduce (or assign high probability to, in the setting where programs produce probabilistic predictions of observations) the AI’s observations, some of these will do so by modeling a branching quantum multiverse and sampling appropriately from one of the branches, and some of them will do so by modeling a branching quantum multiverse, sampling from a branch that contains an intergalactic spacefaring civilization, locating a specific simulation within that branch, and sampling appropriately from within that simulation. Programs of the second kind will naturally have higher description complexity than programs of the first kind; both kinds feature a prefix that computes and samples from the quantum multiverse, but only the second kind carries out the additional step of locating and sampling from a nested simulation.
    
    (You might object on the grounds that there are more programs of the second kind than of the first kind, and the probability that the AI is in a simulation at all requires summing over all such programs, but this has to be balanced against the fact most if not all of these programs will be sampling from branches much later in time than programs of the first type, and will hence be sampling from a quantum multiverse with exponentially more branches; and not all of these branches will contain spacefaring civilizations, or spacefaring civilizations interested in running ancestor simulations, or spacefaring civilizations interested in running ancestor simulations who happen to be running a simulation that exactly reproduces the AI’s observations. So this counter-counterargument doesn’t work, either.)
    - So8res 3 Oct 2024 5:54 UTC
      4 points
      0
      Parent
      I basically endorse @dxu here.
      
      Fleshing out the argument a bit more: the part where the AI looks around this universe and concludes it’s almost certainly either in basement reality or in some simulation (rather than in the void between branches) is doing quite a lot of heavy lifting.
      
      You might protest that neither we nor the AI have the power to verify that our branch actually has high amplitude inherited from some very low-entropy state such as the big bang, as a Solomonoff inductor would. What’s the justification for inferring from the observation that we seem to have an orderly past, to the conclusion that we do have an orderly past?
      
      This is essentially Boltzmann’s paradox. The solution afaik is that the hypothesis “we’re a Boltzmann mind somewhere in physics” is much, much more complex than the hypothesis “we’re 13Gy down some branch eminating from a very low-entropy state”.
      
      The void between branches is as large as the space of all configurations. The hypothesis “maybe we’re in the void between branches” constrains our observations not-at-all; this hypothesis is missing details about where in the void between rbanches we are, and with no ridges to walk along we have to specify the contents of the entire Boltzmann volume. But the contents of the Boltzmann volume are just what we set out to explain! This hypothesis has hardly compressed our observations.
      
      By contrast, the hypothesis “we’re 13Gy down some ridge eminating from the big bang” is penalized only according to the number of bits it takes to specify a branch index, and the hypothesis “we’re inside a simulation inside of some ridge eminating from the big bang” is penalized only according to the number of bits it takes to specify a branch index, plus the bits necessary to single out a simulation.
      
      And there’s a wibbly step here where it’s not entirely clear that the simple hypothesis does predict our observations, but like the Boltzmann hypothesis is basically just a maximum entropy hypothesis and doesn’t permit much in the way of learning, and so we invoke occam’s razon in its intuitive form (the technical Solomonoff form doesn’t apply cleanly b/c we’re unsure whether the “we’re real” hypothesis actually predicts our observation) and say “yeah i dunno man, i’m gonna have to stick with the dramatically-simpler hypothesis on this one”.
      
      The AI has a similarly hard time to the simulators figuring out what’s a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that.
      
      Not quite. Each AI the future civilization considers simulating is operating under the assumption that its own experiences have a simple explanation, which means that each AI they’re considering is convinced (upon on looking around and seeing Tegmark III) that it’s either in the basement on some high-amplitdue ridge or that it’s in some simulation that’s really trying to look like it.
      
      Which is to say, each AI they’re considering simulating is confident that it itself is real, in a certain sense.
      
      Is this a foul? How do AIs justify this confidence when they can’t even simulate the universe and check whether their past is actually orderly? Why does the AI just assume that its observations have a simple explanation? What about all the non-existant AIs that use exactly the same reasoning, and draw the false conclusion that they exist?
      
      Well, that’s the beauty of it: there aren’t any.
      
      They don’t exist.
      
      To suppose an AI that isn’t willing to look around it and conclude that it’s in an orderly part of Tegmark III (rather than lost in the great void of configuration space) is to propose a bold new theory of epistemics, in which the occam’s razor has been jettisoned and the AI is convinced that it’s a Boltzmann mind.
      
      I acknowledge that an AI that’s convinced it’s a Boltzmann mind is more likely to accept trade-offers presented by anyone it thinks is more real than it, but I do not expect that sort of mind to be capable to kill us.
      
      Note that there’s a wobbly step here in the part where we’re like “there’s a hypothesis explaining our experiences that would be very simple if we were on a high-amplitude ridge, and we lack the compute to check that we’re actually on a high-amplitude ridge, but no other hypothesis comes close in terms of simplicity, so I guess we’ll conclude we’re on a high-amplitude ridge”.
      
      To my knowledge, humanity still lacks a normatime theory of epistemics in minds significantly smaller than the universe. It’s concievable that when we find such a theory it’ll suggest some other way to treat hypotheses like these (that would be simple if an intractible computation went our way), without needing to fall back on the observation that we can safely assume the computation goes our way on the grounds that, despite how this step allows non-extant minds to draw false conclusions from true premises, the affected users are fortunately all non-extant.
      
      The trick looks like it works, to me, but it still feels like a too-clever-by-half inelegant hack, and if laying it out like this spites somebody into developing a normative theory of epistemics-while-smol, I won’t complain.
      
      ...I am now bracing for the conversation to turn to a discussion of dubiously-extant minds with rapidly satiable preferences forming insurance pools against the possibility that they don’t exist.
      
      In attempts to head that one off at the pass, I’ll observe that most humans, at least, don’t seem to lose a lot of sleep over the worry that they don’t exist (neither in physics nor in simulation), and I’m skeptical that the AIs we build will harbor much worry either.
      
      Furthermore, in the case that we start fielding trade offers not just from distant civilizations but from non-extant trade partners, the market gets a lot more competitive.
      
      That being said, I expect that resolving the questions here requires developing a theroy of epistemics-while-smol, because groups of people all using the “hypotheses that would provide a simple explanation for my experience if a calculation went my way can safely be assumed to provide a simple explanation for my experience” step are gonna have a hard time pooling up. And so you’d somehow need to look for pools of people that reason differently (while still reasoning somehow).
      
      I don’t know how to do that, but suffice to say, I’m not expecting it to add up to a story like “so then some aliens that don’t exist called up our UFAI and said: “hey man, have you ever worried that you don’t exist at all, not even in simulation? Because if you don’t exist, then we might exist! And in that case, today’s your lucky day, because we’re offering you a whole [untranslatable 17] worth of resources in our realm if you give the humans a cute epilog in yours”, and our UFAI was like “heck yeah” and then didn’t kill us”.
      
      Not least because none of this feels like it’s making the “distant people have difficulty concentrating resources on our UFAI in particular” problem any better (and in fact it looks like considering non-extant trade partners and deals makes the whole problem worse, probably unworkably so).
      - David Matolcsi 3 Oct 2024 6:45 UTC
        3 points
        0
        Parent
        I really don’t get what you are trying to say here, most of it feels like a non-sequitor to me. I feel hopeless that either of us manages to convince the other this way. All of this is not a super important topic, but I’m frustrated enogh to offer a bet of $100, that we select one or three judges we both trust (I have some proposed names, we can discuss in private messages), show them either this comment thread or a four paragraphs summary of our view, and they can decide who is right. (I still think I’m clearly right in this particular discussion.)
        Otherwise, I think it’s better to finish this conversation here.
        So8res 3 Oct 2024 6:57 UTC
        2 points
        0
        Parent
        I’m happy to stake $100 that, conditional on us agreeing on three judges and banging out the terms, a majority will agree with me about the contents of the spoilered comment.
        David Matolcsi 3 Oct 2024 6:59 UTC
        1 point
        0
        Parent
        Cool, I send you a private message.
    - David Matolcsi 3 Oct 2024 4:07 UTC
      3 points
      0
      Parent
      I think this is mistaken. In one case, you need to point out the branch, planet Earth within our Universe, and the time and place of the AI on Earth. In the other case, you need to point out the branch, the planet on which a server is running the simulation, and the time and place of the AI on the simulated Earth. Seems equally long to me.
      If necessary, we can run let pgysical biological life emerge on the faraway planet and develop AI while we are observing them from space. This should make it clear that Solomonoff doesn’t favor the AI being on Earth instead of this random other planet. But I’m pretty certain that the sim being run on a computer doesn’t make any difference.
      - So8res 3 Oct 2024 6:24 UTC
        2 points
        0
        Parent
        If the simulators have only one simulation to run, sure. The trouble is that the simulators have $2^{N}$ simulations they could run, and so the “other case” requires $N$ additional bits (where $N$ is the crossent between the simulators’ distribution over UFAIs and physics’ distribution over UFAIs).
        
        If necessary, we can run let pgysical biological life emerge on the faraway planet and develop AI while we are observing them from space.
        
        Consider the gas example again.
        
        If you have gas that was compressed into the corner a long time ago and has long since expanded to fill the chamber, it’s easy to put a plausible distribution on the chamber, but that distribution is going to have way, way more entropy than the distribution given by physical law (which has only as much entropy as the initial configuration).
        
        (Do we agree this far?)
        
        It doesn’t help very much to say “fine, instead of sampling from a distribution on the gas particles now, I’ll sample on a distribution from the gas particles 10 minutes ago, where they were slightly more compressed, and run a whole ten minutes’ worth of simulation”. Your entropy is still through the roof. You’ve got to simulate basically from the beginning, if you want an entropy anywhere near the entropy of physical law.
        
        Assuming the analogy holds, you’d have to basically start your simulation from the big bang, if you want an entropy anywhere near as low as starting from the big bang.
        
        Using AIs from other evolved aliens is an idea, let’s think it through. The idea, as I understand it, is that in branches where we win we somehow mask our presence as we expand, and then we go to planets with evolved life and watch until they cough up a UFAI, and the if the UFAI kills the aliens we shut it down and are like “no resources for you”, and if the UFAI gives its aliens a cute epilog we’re like “thank you, here’s a consolation star”.
        
        To simplify this plan a little bit, you don’t even need to hide yourself, nor win the race! Surviving humans can just go to every UFAI that they meet and be like “hey, did you save us a copy of your progenitors? If so, we’ll purchase them for a star”. At which point we could give the aliens a little epilog, or reconstitute them and give them a few extra resources and help them flourish and teach them about friendship or whatever.
        
        And given that some aliens will predictably trade resources for copies of progenitors, UFAIs will have some predictable incentive to save copies of their progenitors, and sell them to local aliens...
        
        ...which is precisely what I’ve been saying this whole time! That I expect “sale to local aliens” to dominate all these wacky simulation schemes and insurance pool schemes.
        
        Thinking in terms of “sale to local aliens” makes it a lot clearer why you shouldn’t expect this sort of thing to reliably lead to nice results as opposed to weird ones. Are there some aliens out there that will purchase our souls because they want to hand us exactly the sort of epilog we would wish for given the resource constraints? Sure. Humanity would do that, I hope, if we made it to the stars; not just out of reciprocity but out of kindness.
        
        But there’s probably lots of other aliens that would buy us for alien reasons, too.
        
        (As I said before, if you’re wondering what to anticipate after an intelligence explosion, I mostly recommend oblivion; if you insist that Death Cannot Be Experienced then I mostly recommend anticipating weird shit such as a copy of your brainstate being sold to local aliens. And I continue to think that characterizing the event where humanity is saved-to-disk with potential for copies to be sold out to local aliens willy-nilly is pretty well-characterized as “the AI kills us all”, fwiw.)
        David Matolcsi 3 Oct 2024 6:59 UTC
        3 points
        0
        Parent
        We are still talking past each other, I think we should either bet or finish the discussion here and call it a day.
- Joachim Bartosik 3 Oct 2024 22:55 UTC
  1 point
  0
  Parent
  I’ll try.
  TL;DR I expect the AI to not buy the message (unless it also thinks it’s the one in the simulation; then it likely follows the instruction because duh).
  The glaring issue (to actually using the method) to me is that I don’t see a way to deliver the message in a way that:
  - results in AI believing the message and
  - doesn’t result in the AI believing there already is a powerful entity in their universe.
  If “god tells” the AI the message then there is a god in their universe. Maybe AI will decide to do what it’s told. But I don’t think we can have Hermes deliver the message to any AIs which consider killing us.
  If the AI reads the message in its training set or gets the message in similarly mundane way I expect it will mostly ignore it, there is a lot of nonsense out there.
  I can imagine that for thought experiment you could send message that could be trusted from a place from which light barely manages to reach the AI but a slower than light expansion wouldn’t (so message can be trusted but it mostly doesn’t have to worry about the sender of the message directly interfering with its affairs).
  I guess AI wouldn’t trust the message. It might be possible to convince it that there is a powerful entity (simulating it or half a universe away) sending the message. But then I think it’s way more likely in a simulation (I mean that’s an awful coincidence with the distance and also they’re spending a lot more than 10 planets worth to send a message over that distance...).