Every single generation to come is at stake, so I don’t think my own life bears much on whether to defend all 10^50+ of theirs.
Note that the idea that >10^50 lives are at stake is typically premised on the notion that there will be a value lock-in event, after which we will successfully colonize the reachable universe. If there is no value lock-in event, then even if we solve AI alignment, values will drift in the long term, and the stars will eventually be colonized by something that does not share our values. From this perspective, success in AI alignment would merely delay the arrival of a regime of alien values, rather than prevent it entirely. If true, this would imply that positive interventions now are not as astronomically valuable as you might have otherwise thought.
My guess is that the idea of a value lock-in sounded more plausible back when people were more confident that there would be (1) a single unified AI that takes control of the world with effectively no real competition, forever, and (2) that this AI would have an explicit, global utility function over the whole universe that remains unchanged permanently. However, both of these assumptions currently seem dubious to me.
The relevance of value drift here sounds to me analogous to the following exchange.
Alice, who is 10 years old: “Ah, I am about to be hit by a car, I should make sure this doesn’t happen else I’ll never get to live the rest of my life!”
Bob: “But have you considered that you probably won’t really be the same person in 20 years? You’ll change what you value and where you live and what your career is and all sorts. So I wouldn’t say your life is at stake right now.”
The concern about AI misalignment is itself a concern about value drift. People are worried that near-term AIs will not share our values. The point I’m making is that even if we solve this problem for the first generation of smarter-than-human AIs, that doesn’t guarantee that AIs will permanently share our values in every subsequent generation. In your analogy, a large change in the status quo (death) is compared to an arguably smaller and more acceptable change over the long term (biological development). By contrast, I’m comparing a very bad thing to another, similarly very bad thing. This analogy seems valid mostly to the extent that you reject the premise that extreme value drift is plausible in the long term, and I’m not sure why you would reject that premise.
Putting “cultural change” and “an alien species comes along and murders us into extinction” into the same bucket seems like a mistake to me. I understand that in each case, one set of values literally replaces another. But in the latter case, the route by which those values change is something like “an alien was grown in order to maximize a bunch of functions that we were able to define, like stock price and next-token prediction, and eventually overthrew us”, and I think that is qualitatively different from “people got richer and wealthier, and so what they wanted changed”, in a way that is likely to be ~worthless from our perspective.
From reading elsewhere, my current model (which you may falsify) is that you think those values will be sufficiently close to ours that they will still be very valuable from our perspective, or at least about as valuable as people from 2,000 years ago would find us today. I don’t buy the first claim; I think the second claim is more interesting, but I’m not really confident that it’s relevant or true.
(Consider this an open offer to dialogue about this point sometime, perhaps at my dialogues party this weekend.)
Putting “cultural change” and “an alien species comes along and murders us into extinction” into the same bucket seems like a mistake to me
Value drift encompasses a lot more than cultural change. If you think humans messing up on alignment could mean something as dramatic as “an alien species comes along and murders us”, surely you should think that the future could continue to include even more, similarly dramatic shifts. Why would we assume that once we solve value alignment for the first generation of AIs, values would then be locked in perfectly forever for all subsequent generations?
Alice, who is 10 years old: “Ah, I am about to be hit by a car, I should make sure this doesn’t happen else I’ll never get to live the rest of my life!”
Bob: “But have you considered that in the future you might get hit by another car? Or get cancer? Or choke on an olive? Current human life expectancy is around 80 years, which means that later on you’ll very likely die from something else. So I wouldn’t say your life is at stake right now.”
I don’t understand how this new analogy is supposed to apply to the argument, but if I wanted to modify the analogy to get my point across, I’d make Alice 90 years old. Then, I’d point out that, at such an age, getting hit by a car and dying painlessly genuinely isn’t extremely bad, since the alternative is to face death within the next several years with high probability anyway.
If you actually believe that at some point, even with aligned AI, the forces of value drift are so powerful that we are still unlikely to survive, then you’ve just shifted the problem to avoiding this underspecified secondary catastrophe in addition to the first. The astronomical loss is still there.
If you actually believe that at some point, even with aligned AI, the forces of value drift are so powerful that we are still unlikely to survive
Human survival is different from the universe being colonized under the direction of human values. Humans could survive, for example, as a tiny pocket of a cosmic civilization.
The astronomical loss is still there.
Indeed, but the question is, “What was the counterfactual?” If solving AI alignment merely delays an astronomical loss, then it is not astronomically important to solve the problem. (It could still be very important in this case, but just not so important that we should think of it as saving 10^50 lives.)
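The counterfactual point can be put as a toy calculation. Every number here, including the 10^10-year horizon and the 10^4-year delay, is a made-up assumption purely for illustration, not a claim about the actual future:

```python
# Toy model of the counterfactual value of solving AI alignment.
# All numbers below are made-up assumptions for illustration only.

LIVES_AT_STAKE = 1e50   # assumed total future lives if our values endure
YEARS_OF_FUTURE = 1e10  # assumed span over which those lives would occur
LIVES_PER_YEAR = LIVES_AT_STAKE / YEARS_OF_FUTURE

def lives_saved(delay_years=None):
    """Counterfactual lives saved by solving alignment.

    delay_years=None models permanent value lock-in (the astronomical
    loss is prevented entirely); a number models alignment merely
    postponing the arrival of alien values by that many years.
    """
    if delay_years is None:
        return LIVES_AT_STAKE
    return LIVES_PER_YEAR * delay_years

lock_in = lives_saved()        # 1e50: the astronomical-stakes picture
delay_only = lives_saved(1e4)  # ~1e44: huge, but a millionth as large
```

Under these assumptions, alignment-as-delay is still worth an enormous number of lives, just not the full 10^50, which is the sense in which its value becomes comparable to other large-but-not-astronomical interventions.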
Can you specify the problem(s) you think we might need to solve, in addition to alignment, in order to avoid this sterility outcome?
We would need to solve the general problem of avoiding value drift. Value drift is the phenomenon of changing cultural, social, personal, biological, and political values over time. We have observed it in human history in every generation: older generations vary, on average, in what they want and care about compared to younger generations. More broadly, species have evolved over time, with constant change on Earth as a result of competition, variation, and natural selection. Over short periods of time, value drift tends to look small, but over long periods it can be enormous.
I don’t know what a good solution to this problem would look like, and some proposed solutions—such as a permanent, very strict global regime of coordination to prevent cultural, biological, and artificial evolution—might be worse than the disease they aim to cure. However, without a solution, our distant descendants will likely be very different from us in ways that we consider morally relevant.
How do you expect this value drift to occur in an environment where humans don’t actually have competitive mental/physical capital? Presumably if humans “drift” beyond some desired bounds, the coalition of people or AIs with the actual compute and martial superiority who are allowing the humans to have these social and political fights will intervene. It wouldn’t matter if there was one AI or twenty, because the N different optimizers are still around and optimizing for the same thing, and the equilibrium between them wouldn’t be meaningfully shifted by culture wars between the newly formed humans.
Value drift happens in almost any environment in which there is variation and selection among entities in the world over time. I’m mostly just saying that things will likely continue to change continuously over the long term, and a big part of that is that the behavioral tendencies, desires, and physical makeup of the relevant entities in the world will continue to evolve too, absent some strong countervailing force to prevent that. This feature of our world does not require that humans continue to have competitive mental and physical capital. On the broadest level, the change I’m referring to took place before humans ever existed.
I do not understand how you expect this general tendency to continue against the will of one or more nearby GPU clusters.
Some people enjoy arguing philosophical points, and there is nothing wrong with that.
Do you believe that the considerations you have just described have any practical relevance to someone who believes that the probability of AI research’s ending all human life some time in the next 60 years is .95 and wants to make a career out of minimizing that probability?
Yes, I believe this point has practical relevance. If what I’m saying is true, then I do not believe that solving AI alignment has astronomical value (in the sense of saving 10^50 lives). If solving AI alignment does not have astronomical counterfactual value, then its value becomes more comparable to the value of other positive outcomes, like curing aging for people who currently exist. This poses a challenge for those who claim that delaying AI is obviously for the greater good as long as it increases the chance of successful alignment, since delaying AI could also cause billions of currently existing people to die.