interstice comments on Decision theory does not imply that we get to have nice things

interstice 18 Oct 2022 4:00 UTC
LW: 20 AF: 11
10
AF

There aren’t enough simulators above us that care enough about us-in-particular to pay in paperclips. There are so many things to care about! Why us, rather than giant gold obelisks?

What about neighboring Everett branches where humanity succeeds at alignment? If you think alignment isn’t completely impossible, it seems such branches should have at least roughly comparable weight to branches where we fail, so trade could be possible.
What links here?
- You can, in fact, bamboozle an unaligned AI into sparing your life by David Matolcsi (29 Sep 2024 16:59 UTC; 100 points)
- jaan 22 Oct 2022 12:39 UTC
  26 points
  6
  Parent
  yeah, as far as i can currently tell (and influence), we’re totally going to use a sizeable fraction of FAI-worlds to help out the less fortunate ones. or perhaps implement a more general strategy, like mutual insurance pact of evolved minds (MIPEM).
  this, indeed, assumes that human CEV has diminishing returns to resources, but (unlike nate in the sibling comment!) i’d be shocked if that wasn’t true.
  - So8res 22 Oct 2022 16:31 UTC
    11 points
    5
    Parent
    one thing that makes this tricky is that, even if you think there’s a 20% chance we make it, that’s not the same as thinking that 20% of Everett branches starting in this position make it. my guess is that whether we win or lose from the current board position is grossly overdetermined, and what we’re fighting for (and uncertain about) is which way it’s overdetermined. (like how we probably have more than one in a billion odds that the light speed limit can be broken, but that doesn’t mean that we think that one in every billion photons breaks the limit.) the surviving humans probably don’t have much resource to spend, and can’t purchase all that many nice things for the losers.
    
    (Everett branches fall off in amplitude really fast. Exponentially fast. Back-of-the-envelope: if we’re 75 even-odds quantum coincidences away from victory, and if paperclipper utility is linear in matter, then the survivors would struggle to purchase even a single star for the losers, even if they paid all their matter.)
    
    ftr, i’m pretty uncertain about whether CEV has diminishing returns to resources on merely cosmic scales. i have some sympathy for arguments like vanessa’s, and it seems pretty likely that returns diminish eventually. but also we know that two people together can have more than twice as much fun as two people alone, and it seems to me that that plausibly also holds for galaxies as well.
    
    as a stupid toy model, suppose that every time that population increases by a factor of ten, civilization’s art output improves by one qualitative step. and suppose that no matter how large civilization gets, it factors into sub-communities of 150 people, who don’t interact except by trading artwork. then having 10 separate universes each with one dunbar cluster is worse than having 1 universe with 10 dunbar clusters, b/c the latter is much like the former except that everybody gets to consume qualitatively better art.
    
    separately, it’s unclear to me whether humanity, in the fragment of worlds where they win, would prefer to spend a ton of their own galaxies on paperclips (so that the paperclips will spend a couple of their stars here on humans), versus spending a ton of their own galaxies on building (say) alien friends, who will in return build some human friends. on the one hand, the paperclipper that kills us has an easier time giving us stars (b/c it has our brain scans). but on the other hand, we enjoy the company of aliens, in a way that we don’t enjoy galaxies filled with paperclips. there’s an opportunity cost to all those galaxies, especially if the exchange rates are extremely bad on account of how few branches humanity survives in (if we turn out to mostly-lose).
    - jaan 22 Oct 2022 17:25 UTC
      7 points
      3
      Parent
      roger. i think (and my model of you agrees) that this discussion bottoms out in speculating what CEV (or equivalent) would prescribe.
      
      my own intuition (as somewhat supported by the moral progress/moral circle expansion in our culture) is that it will have a nonzero component of “try to help out the fellow humans/biologicals/evolved minds/conscious minds/agents with diminishing utility function if not too expensive, and especially if they would do the same in your position”.
      - So8res 22 Oct 2022 21:16 UTC
        10 points
        7
        Parent
        tbc, i also suspect & hope that our moral circle will expand to include all fellow sentients. (but it doesn’t follow from that that paying paperclippers to unkill their creators is a good use of limited resources. for instance, those are resources that could perhaps be more efficiently spent purchasing and instantiating the stored mindstates of killed aliens that the surviving-branch humans meet at the edge of their own expansion.)
        
        but also, yeah, i agree it’s all guesswork. we have friends out there in the multiverse who will be willing to give us some nice things, and it’s hard to guess how much. that said, i stand by the point that that’s not us trading with the AI; that’s us destroying all of the value in our universe-shard and getting ourselves killed in the process, and then banking on the competence and compassion of aliens.
        
        (in other words: i’m not saying that we won’t get any nice things. i’m saying that the human-reachable fragment of the universe will be ~totally destroyed if we screw up, with ~none of it going to nice things, not even if the UFAI uses LDT.)
        jaan 23 Oct 2022 6:32 UTC
        1 point
        0
        Parent
        yeah, this seems to be the crux: what will CEV prescribe for spending the altruistic (reciprocal cooperation) budget on. my intuition continues to insist that purchasing the original star systems from UFAIs is pretty high on the shopping list, but i can see arguments (including a few you gave above) against that.
        
        oh, btw, one sad failure mode would be getting clipped by a proto-UFAI that’s too stupid to realise it’s in a multi-agent environment or something,
        
        ETA: and, tbc, just like interstice points out below, my “us/me” label casts a wider net than “us in this particular everett branch where things look particularly bleak”.
        Noosphere89 22 Oct 2022 21:53 UTC
        1 point
        0
        Parent
        I don’t agree, and will write up a post detailing why I disagree.
    - interstice 22 Oct 2022 17:39 UTC
      4 points
      1
      Parent
      
      even if you think there’s a 20% chance we make it, that’s not the same as thinking that 20% of Everett branches starting in this position make it
      
      Although worlds starting in this position are a tiny minority anyway, right? Most of the Everett branches containing “humanity” have histories very different from our own. And if alignment is neither easy nor impossible—if it requires insights fitting “in a textbook from the future”, per Eliezer—I think we can say with reasonable (logical) confidence that a non-trivial fraction of worlds will see a successful humanity, because all that is required for success in such a scenario is having a competent alignment-aware world government. Looking at the history of Earth governments, I think we can say that while such a scenario may be unlikely, it is not so unlikely as to render us overwhelmingly likely to fail.
      
      I think a more likely reason for preponderance of “failure” is that alignment in full generality may be intractable. But such a scenario would have its upsides, as well as making a hard binary of “failure/success” less meaningful.
- So8res 18 Oct 2022 4:21 UTC
  LW: 16 AF: 10
  9
  AF Parent
  my guess is it’s not worth it on account of transaction-costs. what’re they gonna do, trade half a universe of paperclips for half a universe of Fun? they can already get half a universe of Fun, by spending on Fun what they would have traded away to paperclips!
  
  and, i’d guess that one big universe is more than twice as Fun as two small universes, so even if there were no transaction costs it wouldn’t be worth it. (humans can have more fun when there’s two people in the same room, than one person each in two separate rooms.)
  
  there’s also an issue where it’s not like every UFAI likes paperclips in particular. it’s not like 1% of humanity’s branches survive and 99% make paperclips, it’s like 1% survive and 1% make paperclips and 1% make giant gold obelisks, etc. etc. the surviving humans have a hard time figuring out exactly what killed their bretheren, and they have more UFAIs to trade with than just the paperclipper (if they want to trade at all).
  
  maybe the branches that survive decide to spend some stars on a mixture of plausible-human-UFAI-goals in exchange for humans getting an asteroid in lots of places, if the transaction costs are low and the returns-to-scale diminish enough and the visibility works out favorably. but it looks pretty dicey to me, and the point about discussing aliens first still stands.
  - Vanessa Kosoy 18 Oct 2022 6:22 UTC
    LW: 39 AF: 17
    22
    AF Parent
    
    and, i’d guess that one big universe is more than twice as Fun as two small universes, so even if there were no transaction costs it wouldn’t be worth it. (humans can have more fun when there’s two people in the same room, than one person each in two separate rooms.)
    
    This sounds astronomically wrong to me. I think that my personal utility function gets close to saturation with a tiny fraction of the resources in universe-shard. Two people is one room is better than two people in separate rooms, yes. But, two rooms with trillion people each is virtually the same as one room with two trillion. The returns on interactions with additional people fall off exponentially past the Dunbar number.
    
    In other words, I would gladly take a 100% probability of utopia with (say) 100 million people that include me and my loved ones over 99% human extinction and 1% anything at all. (In terms of raw utility calculus, i.e. ignoring trades with other factual or counterfactual minds.)
    What links here?
    Vanessa Kosoy's comment on A short critique of Vanessa Kosoy’s PreDCA by Martín Soto (14 Nov 2022 7:28 UTC; 2 points)
    - Rob Bensinger 20 Oct 2022 3:20 UTC
      LW: 14 AF: 8
      14
      AF Parent
      But, two rooms with trillion people each is virtually the same as one room with two trillion. The returns on interactions with additional people fall off exponentially past the Dunbar number.
      You’re conflating “would I enjoy interacting with X?” with “is it good for X to exist?”. Which is almost understandable given that Nate used the “two people can have more fun in the same room” example to illustrate why utility isn’t linear in population. But this comment has an IMO bizarre amount of agreekarma (26 net agreement, with 11 votes), which makes me wonder if people are missing that this comment is leaning on a premise like “stuff only matters if it adds to my own life and experiences”?
      Replacing the probabilistic hypothetical with a deterministic one: the reason I wouldn’t advocate killing a Graham’s number of humans in order to save 100 million people (myself and my loved ones included) is that my utility function isn’t saturated when my life gets saturated. Analogously, I still care about humans living on the other side of Earth even though I’ve never met them, and never expect to meet them. I value good experiences happening, even if they don’t affect me in any way (and even if I’ve never met the person who they’re happening to).
      - Vanessa Kosoy 20 Oct 2022 5:21 UTC
        LW: 15 AF: 9
        5
        AF Parent
        First, you can consider preferences that are impartial but sublinear in the number of people. So, you can disagree with Nate’s room analogy without the premise “stuff only matters if it adds to my own life and experiences”.
        
        Second, my preferences are indeed partial. But even that doesn’t mean “stuff only matters if it adds to my own life and experiences”. I do think that stuff only matters (to me) if it’s in some sense causally connected to my life and experiences. More details here.
        
        Third, I don’t know what do you mean by “good”. The questions that I understand are:
        
        Do I want X as an end in itself?
        Would I choose X in order for someone to (causally or acausally) reciprocate by choosing Y which I want as an end in itself?
        Do I support a system of social norms that incentives X?
        
        My example with the 100 million referred to question 1. Obviously, in certain scenarios my actual choice would be the opposite on game-theoretic cooperation grounds (I would make a disproportionate sacrifice to save “far away” people in order for them to save me and/or my loved ones in the counterfactual in which they are making the choice).
        
        Also, reminder that unbounded utility functions are incoherent because their expected values under Solomonoff-like priors diverge (a.k.a. Pascal mugging).
        Rob Bensinger 20 Oct 2022 17:44 UTC
        LW: 3 AF: 2
        1
        AF Parent
        My example with the 100 million referred to question 1.
        Yeah, I’m also talking about question 1.
        I do think that stuff only matters (to me) if it’s in some sense causally connected to my life and experiences.
        Seems obviously false as a description of my values (and, I’d guess, just about every human’s).
        Consider the simple example of a universe that consists of two planets: mine, and another person’s. We don’t have spaceships, so we can’t interact. I am not therefore indifferent to whether the other person is being horribly tortured for thousands of years.
        If I spontaneously consider the hypothetical, I will very strongly prefer that my neighbor not be tortured. If we add the claims that I can’t affect it and can’t ever know about it, I don’t suddenly go “Oh, never mind, fuck that guy”. Stuff that happens to other people is real, even if I don’t interact with it.
        Vanessa Kosoy 20 Oct 2022 19:21 UTC
        LW: 17 AF: 6
        13
        AF Parent
        I’m curious what is the evidence you see that this is false as a description of the values of just about every human, given that
        
        I, a human [citation needed] tell you that this seems to be a description of my values.
        Almost every culture that ever existed had norms that prioritized helping family, friends and neighbors over helping random strangers, not to mention strangers that you never met.
        Most people don’t do much to help random strangers they never met, with the notable exception of effective altruists, but even most effective altruists only go that far^[1].
        Evolutionary psychology can fairly easily explain helping your family and tribe, but it seems hard to explain impartial altruism towards all humans.
        
        ↩︎
        The common wisdom in EA is, you shouldn’t donate 90% of your salary or deny yourself every luxury because if you live a fun life you will be more effective at helping others. However, this strikes me as suspiciously convenient and self-serving.
        
        Vanessa Kosoy 21 Oct 2022 6:33 UTC
        LW: 4 AF: 2
        1
        AF Parent
        P.S.
        
        I think that in your example, if a person is given a button that can save a person on a different planet from being tortured, they will have a direct incentive to press the button, because the button is a causal connection in itself, and consciously reasoning about the person on the other planet is a causal^[1] connection in the other direction. That said, a person still has a limited budget of such causal connections (you cannot reason about a group of arbitrarily many people, with fixed non-zero amount of paying attention to the individual details of every person, in a fixed time-frame). Therefore, while the incentive is positive, its magnitude saturates as the number of saved people grows s.t. e.g. a button that saves a million people is virtually the same as a button that saves a billion people.
        
        ↩︎
        I’m modeling this via Turing RL, where conscious reasoning can be regarded as a form of observation. Ofc this means we are talking about “logical” rather than “physical” causality.
  - interstice 18 Oct 2022 4:38 UTC
    6 points
    0
    Parent
    
    and, i’d guess that one big universe is more than twice as Fun as two small universes, so even if there were no transaction costs it wouldn’t be worth it
    
    Perhaps, although I also think it’s plausible that future humanity would find universes in which we’re wiped out completely to be particularly sad and so worth spending a disproportionate amount of Fun to partially recover.
    
    it’s like 1% survive and 1% make paperclips and 1% make giant gold obelisks, etc.
    
    I don’t think this changes the situation since future humanity can just make paperclips with probability ¹⁄₉₉, obelisks with probability ¹⁄₉₉, etc. putting us in an identical bargaining situation with each possible UFAI as if there was only one.
    
    maybe the branches that survive decide to spend some stars on a mixture of plausible-human-UFAI-goals in exchange for humans getting an asteroid in lots of places, if the transaction costs are low and the returns-to-scale diminish enough and the visibility works out favorably. but it looks pretty dicey to me, and the point about discussing aliens first still stands.
    
    Yeah, this is the scenario I think is most likely. As you say it’s a pretty uncomfortable thing to lay our hopes on, but I thought it was more plausible than any of the scenarios brought up in the post so deserved a mention. It doesn’t feel intuitively obvious to me that aliens are a better bet—I guess it comes down to how much trust you have in generic aliens being nice VS. how likely AIs are to be motivated by weird anthropic considerations(in a way that we can actually predict).
    - anonce 19 Oct 2022 1:35 UTC
      1 point
      0
      Parent
      Paperclips vs obelisks does make the bargaining harder because clippy would be offered fewer expected paperclips.
      My current guess is we survive if our CEV puts a steep premium on that. Of course, such hopes of trade ex machina shouldn’t affect how we orient to the alignment problem, even if they affect our personal lives. We should still play to win.
      - interstice 19 Oct 2022 2:47 UTC
        7 points
        2
        Parent
        
        Paperclips vs obelisks does make the bargaining harder because clippy would be offered fewer expected paperclips.
        
        But Clippy also controls fewer expected universes, so the relative bargaining positions of humans VS UFAIs remain the same(compared to a scenario in which all UFAIs had the same value system)
        anonce 19 Oct 2022 20:32 UTC
        4 points
        0
        Parent
        Ah right, because Clippy has less measure, and so has less to offer, so less needs to be offered to it. Nice catch! Guess I’ve been sort of heeding Nate’s advice not to think much about this. :)
        Of course, there would still be significant overhead from trading with and/or outbidding sampled plethoras of UFAIs, vs the toy scenario where it’s just Clippy.
        I currently suspect we still get more survival measure from aliens in this branch who solved their alignment problems and have a policy of offering deals to UFAIs that didn’t kill their biological boot loaders. Such aliens need not be motivated by compassion to the extent that aboriginals form a Schelling bloc, handwave appendagewave. (But we should still play to win, like they did.)