Putting “cultural change” and “an alien species comes along and murders us into extinction” into the same bucket seems like a mistake to me. I understand that in each case, literally speaking, one set of values replaces another. But in the latter case, the route by which those values change is something like “an alien was grown in order to maximize a bunch of functions we happened to be able to define, like stock price and next-token prediction, and eventually overthrew us”, and I think that is qualitatively different from “people got richer and wealthier and so what they wanted changed”, in a way that makes the resulting values likely to be ~worthless from our perspective.
From reading elsewhere, my current model (which you may falsify) is that you think those values will be sufficiently close to ours that they will still be very valuable from our perspective, or at least about as valuable as people from 2000 years ago would find our values today. I don’t buy the first claim; the second claim is more interesting, but I’m not really confident that it’s relevant or true.
(Consider this an open offer to dialogue about this point sometime, perhaps at my dialogues party this weekend.)
Putting “cultural change” and “an alien species comes along and murders us into extinction” into the same bucket seems like a mistake to me
Value drift encompasses a lot more than cultural change. If you think humans messing up on alignment could mean something as dramatic as “an alien species comes along and murders us”, surely you should think that the future could continue to include even more, similarly dramatic shifts. Why would we assume that once we solve value alignment for the first generation of AIs, values would then be locked in perfectly forever for all subsequent generations?
Alice, who is 10 years old: “Ah, I am about to be hit by a car, I should make sure this doesn’t happen else I’ll never get to live the rest of my life!”
Bob: “But have you considered that in the future you might get hit by another car? Or get cancer? Or choke on an olive? Current human life expectancy is around 80 years, which means you’ll very likely die from something else later on. So I wouldn’t say your life is at stake right now.”
I don’t understand how this new analogy is supposed to apply to the argument, but if I wanted to modify the analogy to get my point across, I’d make Alice 90 years old. Then, I’d point out that, at such an age, getting hit by a car and dying painlessly genuinely isn’t extremely bad, since the alternative is to face death within the next several years with high probability anyway.