Every single generation to come is at stake, so I don’t think my own life bears much on whether to defend all 10^50+ of theirs.
Note that the idea that >10^50 lives are at stake is typically premised on the notion that there will be a value lock-in event, after which we will successfully colonize the reachable universe. If there is no value lock-in event, then even if we solve AI alignment, values will drift in the long term, and the stars will eventually be colonized by something that does not share our values. From this perspective, success in AI alignment would merely delay the arrival of a regime of alien values, rather than prevent it entirely. If true, this would imply that positive interventions now are not as astronomically valuable as you might have otherwise thought.
My guess is that the idea of a value lock-in sounded more plausible back when people were more confident that there would be (1) a single unified AI that takes control of the world with effectively no real competition, forever, and (2) that this AI would have an explicit, global utility function over the whole universe that remains unchanged permanently. However, both of these assumptions currently seem dubious to me.
The relevance of value drift here sounds to me analogous to the following exchange.
Alice, who is 10 years old: “Ah, I am about to be hit by a car, I should make sure this doesn’t happen else I’ll never get to live the rest of my life!”
Bob: “But have you considered that you probably won’t really be the same person in 20 years? You’ll change what you value and where you live and what your career is and all sorts. So I wouldn’t say your life is at stake right now.”
The concern about AI misalignment is itself a concern about value drift. People are worried that near-term AIs will not share our values. The point I’m making is that even if we solve this problem for the first generation of smarter-than-human AIs, that doesn’t guarantee that AIs will permanently share our values in every subsequent generation. In your analogy, a large change in the status quo (death) is compared to an arguably smaller and more acceptable change over the long term (biological development). By contrast, I’m comparing a very bad thing to another, similarly very bad thing. This analogy seems valid mostly to the extent that you reject the premise that extreme value drift is plausible in the long term, and I’m not sure why you would reject that premise.
Putting “cultural change” and “an alien species comes along and murders us into extinction” into the same bucket seems like a mistake to me. I understand that in each case, one set of values literally replaces another. But in the latter case, the route by which those values change is something like “an alien was grown in order to maximize a bunch of functions that we were able to define, like stock price and next-token prediction, and eventually overthrew us”, and I think that is qualitatively different from “people got richer and wealthier, and so what they wanted changed”, in a way that is likely to be ~worthless from our perspective.
From reading elsewhere, my current model (which you may falsify) is that you think those values will be sufficiently close to ours that they will still be very valuable from our perspective, or at least about as valuable as people from 2,000 years ago would find us today. I don’t buy the first claim; I think the second claim is more interesting, but I’m not really confident that it’s relevant or true.
(Consider this an open offer to dialogue about this point sometime, perhaps at my dialogues party this weekend.)
Putting “cultural change” and “an alien species comes along and murders us into extinction” into the same bucket seems like a mistake to me
Value drift encompasses a lot more than cultural change. If you think humans messing up on alignment could mean something as dramatic as “an alien species comes along and murders us”, surely you should think that the future could continue to include even more, similarly dramatic shifts. Why would we assume that once we solve value alignment for the first generation of AIs, values would then be locked in perfectly forever for all subsequent generations?
Alice, who is 10 years old: “Ah, I am about to be hit by a car, I should make sure this doesn’t happen else I’ll never get to live the rest of my life!”
Bob: “But have you considered that in the future you might get hit by another car? Or get cancer? Or choke on an olive? Current human life expectancy is around 80 years, which means that later on you’ll very likely die from something else. So I wouldn’t say your life is at stake right now.”
I don’t understand how this new analogy is supposed to apply to the argument, but if I wanted to modify the analogy to get my point across, I’d make Alice 90 years old. Then, I’d point out that, at such an age, getting hit by a car and dying painlessly genuinely isn’t extremely bad, since the alternative is to face death within the next several years with high probability anyway.
If you actually believe that at some point, even with aligned AI, the forces of value drift are so powerful that we are still unlikely to survive, then you’ve just shifted the problem to avoiding this underspecified secondary catastrophe in addition to the first. The astronomical loss is still there.
If you actually believe that at some point, even with aligned AI, the forces of value drift are so powerful that we are still unlikely to survive
Human survival is different from the universe being colonized under the direction of human values. Humans could survive, for example, as a tiny pocket of a cosmic civilization.
The astronomical loss is still there.
Indeed, but the question is, “What was the counterfactual?” If solving AI alignment merely delays an astronomical loss, then it is not astronomically important to solve the problem. (It could still be very important in this case, but just not so important that we should think of it as saving 10^50 lives.)
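The counterfactual point can be put as a toy calculation. Every number here, including the 10^10-year horizon and the 10^4-year delay, is a made-up assumption purely for illustration, not a claim about the actual future:

```python
# Toy model of the counterfactual value of solving AI alignment.
# All numbers below are made-up assumptions for illustration only.

LIVES_AT_STAKE = 1e50   # assumed total future lives if our values endure
YEARS_OF_FUTURE = 1e10  # assumed span over which those lives would occur
LIVES_PER_YEAR = LIVES_AT_STAKE / YEARS_OF_FUTURE

def lives_saved(delay_years=None):
    """Counterfactual lives saved by solving alignment.

    delay_years=None models permanent value lock-in (the astronomical
    loss is prevented entirely); a number models alignment merely
    postponing the arrival of alien values by that many years.
    """
    if delay_years is None:
        return LIVES_AT_STAKE
    return LIVES_PER_YEAR * delay_years

lock_in = lives_saved()        # 1e50: the astronomical-stakes picture
delay_only = lives_saved(1e4)  # ~1e44: huge, but a millionth as large
```

Under these assumptions, alignment-as-delay is still worth an enormous number of lives, just not the full 10^50, which is the sense in which its value becomes comparable to other large-but-not-astronomical interventions.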
Can you specify the problem(s) you think we might need to solve, in addition to alignment, in order to avoid this sterility outcome?
We would need to solve the general problem of avoiding value drift. Value drift is the phenomenon of changing cultural, social, personal, biological, and political values over time. We have observed it in human history in every generation: older generations vary, on average, in what they want and care about compared to younger generations. More broadly, species have evolved over time, with constant change on Earth as a result of competition, variation, and natural selection. Over short periods of time, value drift tends to look small, but over long periods it can be enormous.
I don’t know what a good solution to this problem would look like, and some proposed solutions—such as a permanent, very strict global regime of coordination to prevent cultural, biological, and artificial evolution—might be worse than the disease they aim to cure. However, without a solution, our distant descendants will likely be very different from us in ways that we consider morally relevant.
How do you expect this value drift to occur in an environment where humans don’t actually have competitive mental/physical capital? Presumably if humans “drift” beyond some desired bounds, the coalition of people or AIs with the actual compute and martial superiority who are allowing the humans to have these social and political fights will intervene. It wouldn’t matter if there was one AI or twenty, because the N different optimizers are still around and optimizing for the same thing, and the equilibrium between them wouldn’t be meaningfully shifted by culture wars between the newly formed humans.
Value drift happens in almost any environment in which there is variation and selection among entities in the world over time. I’m mostly just saying that things will likely continue to change continuously over the long term, and a big part of that is that the behavioral tendencies, desires, and physical makeup of the relevant entities in the world will continue to evolve too, absent some strong countervailing force to prevent that. This feature of our world does not require that humans continue to have competitive mental and physical capital. On the broadest level, the change I’m referring to took place before humans ever existed.
I do not understand how you expect this general tendency to continue against the will of one or more nearby GPU clusters.
Some people enjoy arguing philosophical points, and there is nothing wrong with that.
Do you believe that the considerations you have just described have any practical relevance to someone who believes that the probability of AI research’s ending all human life some time in the next 60 years is .95 and wants to make a career out of minimizing that probability?
Yes, I believe this point has practical relevance. If what I’m saying is true, then I do not believe that solving AI alignment has astronomical value (in the sense of saving 10^50 lives). If solving AI alignment does not have astronomical counterfactual value, then its value becomes more comparable to the value of other positive outcomes, like curing aging for people who currently exist. This poses a challenge for those who claim that delaying AI is obviously for the greater good as long as it increases the chance of successful alignment, since delaying AI could also cause billions of currently existing people to die.