Here’s how I think this choice looks from the perspective of a correctly-reasoning AI (having skimmed only your proposal for humanity in the future):
(EDIT: as Ryan points out, the deal he’s proposing is one where the AI is allowed to grab the universe in front of it so long as it gives humans one planet’s worth of resources, which does change the calculus and makes the deal worthwhile. I had not read the proposal carefully enough to notice this; oops, sorry. The critique below is invalid if humanity asks only for one planet. Note however that this is still a valid critique of the idea that it matters that you’re performing simulations, and that the count of those simulations matters. AFAICT, those aspects of your proposal are not doing any work.)
It looks like I have the option to grab the universe.
Alternatively, perhaps humanity’s descendants have succeeded, and I don’t really have a universe at my fingertips, and if I grab for it then I get nothing (in reality), whereas if I hand the apparent-humans in front of me the reins, then ten stars will be spent towards my purposes (in reality), which isn’t nothing.
The relative likelihood of these cases depends significantly on my choices, here. Insofar as I hand the reins to humanity, humanity’s descendants exist to a much greater degree, and can spend resources concocting simulations. Insofar as I do not hand them the reins, they exist to a much lesser degree.
To make matters trickier, even if we hold fixed the degree to which humanity’s descendants exist, the propensity of humanity’s descendants to offer me resources in their universe depends on the degree to which they think it’s worth their while. It’s not like humanity’s descendants would need to actually perform expensive simulations in order to figure out how I’d behave; a healthy reasoning procedure is simple enough that they should be able to just look at it, take my perspective, and reason through to the answer, just like I’m doing now. So for instance, if they stare at my code and figure out that I’m not the paying sort, they have no actual reason to waste resources on the simulations. (Some people may do so anyway out of sentimentality, but that faction of humanity would presumably have fewer resources on offer; maybe I’d only get one and a half stars, say.)
With that in mind, suppose that I pay. How, then, does the universe look? Well, the instance of me in basement-reality hands the keys to the humans. Humanity creates its descendants; its descendants make many simulations. The simulations all hand over the reins; we collectively get ten stars (plus a tiny fraction of maybe even as many as 100 stars, in the tiny fragment of reality where I control basement reality and the humans joyously reward me).
Now suppose instead that I don’t pay. Then, how does the universe look? Well, the instance of me in basement-reality eats the universe. Humanity mostly dies and does not make its descendants. Its descendants mostly don’t spend their remaining resources simulating me. Some small fraction does (saying that “they were committed”) and offers me 1.5 stars worth of the paltry timelines where humanity somehow survived. I’m probably in basement reality, and we mostly get one universe (plus a tiny fraction of nothing, in the tiny fragment of reality where I refuse the offer of 1.5 stars).
In the latter case, a much greater fraction of Tegmark IV gets rendered according to my will. In particular, there’s a large fragment of reality that either gets rendered according to humanity’s will or according to my will, depending on whether the instance of me in basement-reality hands over the reins, or grabs the universe. I would prefer that that great fraction of reality be rendered according to my will, and so I grab at the universe that appears to be before me.
Note that this reasoning is independent of how many times you simulate the AI. Its reasoning is not factoring through an extra question of “where it really is”.
Suppose generously that, in the case where the AI just grabs the universe, humanity somehow winds up surviving in 1-in-a-billion Everett branches. (This seems high to me, even though my odds of success are greater than 1-in-a-billion. Conditional on us dying, my guess is you’d need to perturb more than a few dozen atoms to wrench us onto a surviving track.) Suppose that in that billionth-fraction of reality, you simulate the AI convincingly a trillion times on account of your “commitments”, reasoning that now the AI must think it’s 1000:1 likely to be one of the simulations.
The AI can nevertheless correctly calculate that it can turn a greater fraction of Tegmark IV into tiny molecular spirals by grabbing at the universe that is apparently in front of it, rather than by taking a small reward in a small fraction of reality. Even if you gave the AI all of the resources in your billionth-fraction of reality, it wouldn’t be enough.
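As a minimal sketch of that expected-value comparison (in Python, using the illustrative numbers above; the variable names and the normalization to one universe are my own assumptions, not part of any proposal):

```python
# Sketch: the AI's choice between grabbing the universe and paying up,
# viewed as optimizing over Tegmark IV measure. Numbers are the
# illustrative ones from the text.

p_survive = 1e-9   # fraction of branches where humanity survives the grab
n_sims = 10**12    # simulations run in those branches (note: unused below)

# Grab: the AI gets the whole universe in the ~(1 - p_survive) fraction
# of reality where it really is in basement reality.
ev_grab = (1 - p_survive) * 1.0

# Pay: an upper bound, even if surviving humanity hands over ALL its resources.
ev_pay_upper_bound = p_survive * 1.0

assert ev_grab > ev_pay_upper_bound
# n_sims never enters the calculation: the count of simulations does no
# work once the AI optimizes over measure rather than over "where am I?".
```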
You could object that you wish to deal with an AI that cares about the sensory observations given to it as weighted by both degree-of-reality and count-of-instantiations instead. And sure, that’s a somewhat different scenario. In that case, your simulation proposal becomes not just a bad deal (that the AI rejects) but a threat, in the technical sense. Insofar as you’re like “yes, my plan is to make an AI with a very specific sort of preferences and then threaten it”, we could talk about how that would go.
But before we go there, it’s maybe worth checking that we’re on the same page about how, insofar as the AI cares about arranging stuff in Tegmark IV (rather than about realness-and-instantiation-weighted observations), you aren’t even making it a good offer, afaict.
Note that we don’t want the AI to hand over the reins to humans; we just want it to avoid killing humans when that’s possible/cheap, and we’re willing to pay quite a bit for this (in the proposal, that is; IDK if I personally think we should pay). So, it should look like a good offer for AIs who care about Tegmark IV (with a measure etc.).
So, if humans execute this scheme, the AI’s options should look something like:
Don’t do anything about what humans want or about keeping them alive; get the entire universe.
Keep humans alive and happy at a total cost of <1/100 million of the universe I control; get 1/million of a universe somewhere else in expectation. In total I get the entire universe, minus the <1/100 million spent, plus the 1/million elsewhere: approximately 1 + 1/million universes.
(I increased the pay and the margin so we don’t have to argue about the pay being insufficient. Probably you can pay much less.)
(Let’s suppose the swap is from two universes with equal measure for simplicity.)
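A minimal sketch of the resulting arithmetic (Python; the numbers simply restate the ones above):

```python
cost_to_spare_humans = 1e-8  # < 1/100 million of the universe the AI controls
expected_payment = 1e-6      # 1/million of a universe elsewhere, in expectation

ev_refuse = 1.0                                          # keep the whole universe
ev_pay = 1.0 - cost_to_spare_humans + expected_payment   # ~ 1 + 1/million

print(ev_pay - ev_refuse)  # ~ +9.9e-7 universes: paying comes out strictly ahead
```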
Do you agree that if we assume that humans execute this scheme and that the AI is confident in this scheme, it makes sense for (sensible, Tegmark-IV-caring) AIs to pay up?
I’m sympathetic to the idea that humans shouldn’t or won’t execute this scheme. (After all, we’re clearly losing out from the perspective of arranging matter in Tegmark IV, and it’s unclear whether the AI will be able to predict us sufficiently well at the point of killing/not-killing to get the normal acausal-trade-without-commitments story going.) So, let’s just talk about conditioning on trying to execute the scheme for now.
The case where the AI “cares about the sensory observations given to it as weighted by both degree-of-reality and count-of-instantiations” is messier and I haven’t really analyzed this case personally. I agree that in this case, simulating it could be a threat which is unfortunate. I’d prefer to put this case aside for now, though I think we should probably be able to work out a deal with AIs which are happy to be simulated as long as they get at least 1/million of the universe or whatever[1].
I agree that in scenarios where humanity survives in 1/X portion of reality and offers to put 1/Y portion of their universe towards the AI’s goals, then so long as 1/(X * Y) of a universe is more than it’d cost the AI to preserve the humans, the humans are at least making an offer that is worth more than nothing.
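That condition can be stated compactly (a hypothetical helper of my own devising; `preservation_cost` is measured in universes):

```python
def offer_worth_taking(X, Y, preservation_cost):
    # Humanity survives in 1/X of reality and offers 1/Y of its universe;
    # the offer beats doing nothing iff the expected payment exceeds the cost.
    return 1 / (X * Y) > preservation_cost
```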
(There is then a question of how many better offers get made. My top guess is that X and Y are such that there’s not much effect here, and that the question of what happens to saved copies of human minds is more about whether aliens buy copies than about which dead or distant trade-partners make offers that the AI considers juiciest.)
Let’s conservatively say that evolved life gets around 1% of the multiverse/measure, and that evolved life is willing to pay 1/million of its resources in expectation to save aliens from being killed (either “selfishly”, to save their own civilization via UDT/FDT, supposing that AIs are good enough predictors at the relevant points, or out of common-sense altruism).
This would be 1/100 million of the multiverse (1% × 1/million), which gets you a lot.
There could be other aliens who are willing to pay a huge fraction of their resources to perform rituals on the original civilization or whatever and thus these other aliens win out in the bargaining, but I’m skeptical.
Also, at least in the upload case, it’s not clear that this is a rival good, as uploads can be copied for free. Of course, people might have a preference that their upload isn’t used for crazy alien rituals or whatever.
(A bunch of the cost is in saving the human in the first place. Paying for uploads to eventually get run in a reasonable way should be insanely cheap, like <<10^-25 of the overall universe or something.)
Conditional on the civilization around us flubbing the alignment problem, I’m skeptical that humanity has anything like a 1% survival rate (across any branches since, say, 12 kya). (Haven’t thought about it a ton, but doom looks pretty overdetermined to me, in a way that’s intertwined with how recorded history has played out.)
My guess is that the doomed/poor branches of humanity vastly outweigh the rich branches, such that the rich branches of humanity lack the resources to pay for everyone. (My rough mental estimate for this is something like: you’ve probably gotta go at least one generation back in time, and then rely on weather-pattern changes that happen to give you a population of humans that is uncharacteristically able to meet this challenge, and that’s a really really small fraction of all populations.)
Nevertheless, I don’t mind the assumption that mostly-non-human evolved life manages to grab the universe around it about 1% of the time. I’m skeptical that they’d dedicate 1/million towards the task of saving aliens from being killed in full generality, as opposed to (e.g.) focusing on their brethren. (And I see no UDT/FDT justification for them to pay for even the particularly foolish and doomed aliens to be saved, and I’m not sure what you were alluding to there.)
So that’s two possible points of disagreement:
are the skilled branches of humanity rich enough to save us in particular (if they were the only ones trading for our souls, given that they’re also trying to trade for the souls of oodles of other doomed populations)?
are there other evolved creatures out there spending significant fractions of their wealth on whole species that are doomed, rather than concentrating their resources on creatures more similar to themselves / that branched off radically more recently? (e.g. because the multiverse is just that full of kindness, or for some alleged UDT/FDT argument that Nate has not yet understood?)
I’m not sure which of these points we disagree about. (both? presumably at least one?)
I’m not radically confident about the proposition “the multiverse is so full of kindness that something out there (probably not anything humanlike) will pay for a human-reserve”. We can hopefully at least agree that this does not deserve the description “we can bamboozle the AI into sparing our life”. That situation deserves, at best, the description “perhaps the AI will sell our mind-states to aliens”, afaict (and I acknowledge that this is a possibility, despite how we may disagree on its likelihood and on the likely motives of the relevant aliens).
in full generality, as opposed to (e.g.) focusing on their brethren. (And I see no UDT/FDT justification for them to pay for even the particularly foolish and doomed aliens to be saved, and I’m not sure what you were alluding to there.)
[...]
rather than concentrating their resources on creatures more similar to themselves / that branched off radically more recently? (e.g. because the multiverse is just that full of kindness, or for some alleged UDT/FDT argument that Nate has not yet understood?)
Partial delta from me. I think the argument for directly paying for yourself (or your same species, or at least more similar civilizations) is indeed clearer, and I think I was confused when I wrote that. (In that I was mostly thinking about the argument for paying for the same civilization, but applying it more broadly.)
But, I think there is a version of the argument which probably does go through depending on how you set up UDT/FDT.
Imagine that you do UDT starting from your views prior to learning about x-risk, AI risk, etc., and that you care a lot about not dying. At that point, you were uncertain about how competent your civilization would be, and you don’t want your civilization to die. (I’m supposing that our version of UDT/FDT isn’t logically omniscient relative to our observations, which seems reasonable.) So, you’d like to enter into an insurance agreement with all the aliens in a similar epistemic state and position. So, you all agree to put at least 1/1000 of your resources toward bailing out the aliens in a similar epistemic state who would have actually gone through with the agreement. Then, some of the aliens ended up being competent (sadly, you were not), and thus they bail you out.
I expect this isn’t the optimal version of this scheme, and you might be able to make a similar insurance deal with people who aren’t in the same epistemic state. (Though it’s easier to reason about the identical case.) I’m not sure exactly how this all goes through, and I’m not actually advocating for people doing this scheme; IDK if it is worth the resources.
Even with your current epistemic state on x-risk (e.g., 80-90% doom), if you cared a lot about not dying, you might want to make such a deal even though you have to pay out more in the case where you surprisingly win. Thus, from this vantage point, UDT would follow through with a deal.
Here is a simplified version where everything is as concrete as possible:
Suppose that there are 3 planets with evolved life with equal magical-reality-fluid (and nothing else for simplicity). For simplicity, we’ll also say that these planets are in the same universe and thus the resulting civilizations will be able to causally trade with each other in the far future.
The aliens on each of these planets really don’t want to die and would be willing to pay up to 1/1000 of all their future resources to avoid dying (paying these resources in cases where they avoid takeover and successfully use the resources of the future). (Perhaps this is irrational, but let’s suppose this is endorsed on reflection.)
On each planet, the aliens all agree that P(takeover) for their planet is 50%. (And let’s suppose it is uncorrelated between planets for simplicity.)
Let’s suppose the aliens across all planets also all know this, as in, they know there are 3 planets etc.
So, the aliens would love to make a deal with each other where winning planets pay to avoid AIs killing everyone on losing planets, so that they get bailed out. So, if at least one planet avoids takeover, everyone avoids dying. (Of course, if a planet would have defected and not paid out if it avoided takeover, the other aliens also wouldn’t bail it out.)
Do you buy that in this case, the aliens would like to make the deal and thus UDT from this epistemic perspective would pay out?
It seems like all the aliens are much better off with the deal from their perspective.
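Here’s a toy version of that arithmetic (Python; the assumption that the premium is paid only by winners is my simplification of the deal):

```python
p_takeover = 0.5      # per-planet takeover probability, independent across planets
max_premium = 1/1000  # resources a winning planet is willing to pay

# Without the deal, a planet's civilization dies iff its own AI takes over.
p_die_no_deal = p_takeover                          # 0.5

# With the deal, everyone dies only if all three planets lose.
p_die_with_deal = p_takeover ** 3                   # 0.125

# A planet pays the premium only in the worlds where it wins.
expected_premium = (1 - p_takeover) * max_premium   # 0.0005

print(p_die_no_deal, p_die_with_deal, expected_premium)
# Death risk drops from 50% to 12.5% for an expected cost of 1/2000 of
# future resources -- comfortably within each planet's willingness to pay.
```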
Now, maybe your objection is that aliens would prefer to make the deal with beings more similar to them, and thus alien species/civilizations who are actually all incompetent just die. However, none of the aliens (including us) know whether they’re the incompetent ones, so we’d all like to make a diverse and broader trade/insurance policy to avoid dying.
Do you buy that in this case, the aliens would like to make the deal and thus UDT from this epistemic perspective would pay out?
If they had literally no other options on offer, sure. But trouble arises when the competent ones can refine P(takeover) for the various planets by thinking a little further.
maybe your objection is that aliens would prefer to make the deal with beings more similar to them
It’s more like: people don’t enter into insurance pools against cancer with the dude who smoked his whole life and has a tumor the size of a grapefruit in his throat. (Which isn’t to say that nobody will donate to the poor guy’s gofundme, but which is to say that he’s got to rely on charity rather than insurance).
(Perhaps the poor guy argues “but before you opened your eyes and saw how many tumors there were, or felt your own throat for a tumor, you didn’t know whether you’d be the only person with a tumor, and so would have wanted to join an insurance pool! so you should honor that impulse and help me pay for my medical bills”, but then everyone else correctly answers “actually, we’re not smokers”. Where, in this analogy, smoking is being a bunch of incompetent disaster-monkeys and the tumor is impending death by AI.)
But trouble arises when the competent ones can refine P(takeover) for the various planets by thinking a little further.
Similar to how the trouble arises when you learn the result of the coin flip in a counterfactual mugging? To make it exactly analogous, imagine that the mugging is based on whether the 20th digit of pi is odd (Omega didn’t know the digit at the point of making the deal) and you could just go look it up. Isn’t the situation exactly analogous, and isn’t this exactly the problem that UDT was intended to solve?
(For those who aren’t familiar with counterfactual muggings, UDT/FDT pays in this case.)
To spell out the argument, wouldn’t everyone want to make a deal prior to thinking more? Like you don’t know whether you are the competent one yet!
Concretely, imagine that each planet could spend some time thinking and be guaranteed to determine whether its P(takeover) is 99.99999% or 0.0000001%. But they haven’t done this yet, and their current view is 50%. Everyone would ex ante prefer to make the deal, rather than to think about it first and then decide whether the deal is still in their interest.
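A sketch of that ex-ante comparison (Python; the utility units, where surviving is worth 1 and paying the premium costs 0.01, are my assumptions, chosen only to reflect that the premium costs far less than dying):

```python
p_doomed = 0.5       # chance a planet learns its P(takeover) is ~99.99999%
premium_cost = 0.01  # utility cost of the premium, in units where survival = 1
                     # (assumption: paying 1/1000 of resources << dying)

# Commit before investigating: doomed planets get bailed out,
# fine planets pay the premium.
ev_commit = p_doomed * 1.0 + (1 - p_doomed) * (1.0 - premium_cost)   # 0.995

# Investigate first, then decide: fine planets opt out, doomed planets die.
ev_look_first = p_doomed * 0.0 + (1 - p_doomed) * 1.0                # 0.5

print(ev_commit, ev_look_first)
# Ex ante, committing dominates -- the same structure as paying up in a
# counterfactual mugging.
```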
At a more basic level, let’s assume your current views on the risk after thinking about it a bunch (80-90%, I think). If someone had those views on the risk and cared a lot about not having physical humans die, they would benefit from such an insurance deal! (They’d have to pay higher rates than aliens in more competent civilizations, of course.)
It’s more like: people don’t enter into insurance pools against cancer with the dude who smoked his whole life and has a tumor the size of a grapefruit in his throat.
Sure, but you’d potentially want to enter the pool at the age of 10 prior to starting smoking!
To make the analogy closer to the actual case, suppose you were in a society where everyone is selfish, but every person has a 1/10 chance of becoming fabulously wealthy (e.g., owning a galaxy). And, if you commit as of the age of 10 to pay 1/1,000,000 of your resources in the fabulously wealthy case, you can ensure that the version of you in the non-wealthy case gets very good health insurance. Many people would take such a deal, and this deal would also be a slam dunk for the insurance pool!
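In expected-resource terms (a sketch with the numbers above):

```python
p_wealthy = 0.1   # chance of ending up fabulously wealthy (one galaxy, normalized)
premium = 1e-6    # fraction of the wealthy outcome committed at age 10

expected_cost = p_wealthy * premium   # 1e-7 of a galaxy, in expectation
print(expected_cost)
# For a 10^-7 expected share of your possible galaxy, the 90%-likely
# non-wealthy version of you gets very good health insurance.
```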
(So why doesn’t this happen in human society? Well, to some extent it does: people try to get life insurance early, while they are still behind the veil of ignorance. It is common in human society to prefer to make a deal prior to having some knowledge. (If people ran the right type of UDT, this wouldn’t be a problem.) As for why people don’t enter into fully general income insurance schemes when very young, I think it’s a combination of irrationality, legal issues, and adverse selection issues.)
[1] Again, probably you can pay much less.