I also don’t really see the situation as being about AI at all. It’s about a structural advantage for certain kinds of values: the ones that tend to win out in memetic competition / are easiest to persuade people to adopt / etc. Let’s call such values “attractive.”
The most attractive values given a new technological/social situation are likely to be similar to those given the immediately preceding situation, so I’d generally expect the most attractive values to already be endemic anyway, or close enough to endemic values that they don’t look like they are coming out of left field.
And of course, for any given zero-sum conflict and any given human, one of the participants in that conflict would prefer to push the human towards more attractive values, so such values would be introduced even if not initially endemic.
I don’t think you can get paperclips this way, because people trying to get humans to maximize paperclips would be at a big disadvantage in memetic competition compared with the most attractive values (or even compared to more normal human values, which are presumably more attractive than random stuff).
Then the usual hope is that we are happy with attractive values, e.g. because deliberation and intentional behavior by humans make “smarter” forms of current values more attractive relative to random bad stuff. And your concern is basically: under distributional shift, why should we think that?
Or perhaps more clearly: if which values are “most attractive” depends on features of the technological landscape, then it’s hard to see why we should just “take the hand we’re dealt” and be happy with the values that are most attractive on some default technological trajectory. Instead, we would end up with preferences over the technological trajectory.
This is not really distinctive to persuasion; it applies just as well to any change in the environment that would change the process of deliberation/discussion. The hypothesis seems to be that “how good humans are at persuasion” is just a particularly important/significant kind of shift.
But it seems like what really matters is some ratio between how good you are at persuasion and how good you are at other skills that shape the future (or else perhaps you should be much more concerned about other increases in human capability, like education, that make us better at arguing). And in this sense it’s less clear whether AI is better or worse than the status quo. I guess the main thing is that it’s a candidate for a sharp distributional change and so that’s the kind of thing that you would want to be unusually cautious about.
I mostly think the most robust thing is that it’s reasonable to be very interested in the trajectory of values, to think about how much you like the process of deliberation and discourse and selection and so on that shapes those values, and to think of changes as potentially irreversible (since future people would have no interest in reversing them).
The usual response to this argument is that perhaps future values are basically unrelated to present values anyway (since they will also converge to whatever values are most attractive given future technological situations). But this seems relatively unpersuasive, because eventually you might expect there to be many agents who try to deliberately make the future good rather than letting what happens happen, which could eventually drive the rate of drift to 0. This seems fairly likely to happen eventually, but you might think that it will take long enough that existing value changes will still wash out.
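(One toy way to formalize the “wash out” worry, under obviously simplified assumptions of my own: let $r(t)$ be the rate of value drift at time $t$, and suppose the influence of present values on the values at time $T$ decays like
$$S(T) = \exp\!\Big(-\int_0^{T} r(t)\,dt\Big).$$
Even if deliberate stabilization eventually drives $r(t)$ to 0, whatever gets locked in reflects today’s values, and any changes made to them now, only to the extent that the accumulated drift $\int_0^{T_{\text{lock}}} r(t)\,dt$ before stabilization is small; if a lot of drift accumulates first, $S$ is tiny and today’s value changes wash out.)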
Then we end up with a complicated set of moral / decision-theoretic questions about which values we are happy enough with. It’s not really clear to me how you should feel about variation across humans, or across cultures, or for humans in new technological situations, or for a particular kind of deep RL, or what. It seems quite clear that we should care some, and I think given realistic treatments of moral uncertainty you should not care too much more about preventing drift than about preventing extinction given drift (e.g. 10x seems very hard to justify to me). But it generally seems like one of the more pressing questions in moral philosophy, and even if you care equally about those two things (suggesting that you’d value some drifted future population’s values 50% as much as some kind of hypothetical ideal realization) you could still get much more traction by trying to prevent forms of drift that we don’t endorse.
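(To spell out the arithmetic behind that parenthetical, under a simple normalization: set the value of the hypothetical ideal realization to $1$ and the value of extinction to $0$, and let $v$ be the value of the drifted future. Preventing drift is then worth $1 - v$, preventing extinction given drift is worth $v - 0$, and caring equally about the two means
$$1 - v = v - 0 \quad\Rightarrow\quad v = 0.5,$$
i.e. you’d value the drifted future 50% as much as the ideal one.)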
given realistic treatments of moral uncertainty you should not care too much more about preventing drift than about preventing extinction given drift (e.g. 10x seems very hard to justify to me).
I think you already believe this, but just to clarify: this “extinction” is about the extinction of Earth-originating intelligence, not about humans in particular. So AI alignment is an intervention to prevent drift, not an intervention to prevent extinction. (Though of course, we could care differently about persuasion-tool-induced drift vs unaligned-AI-induced drift.)
Thanks for this! Re: it’s not really about AI, it’s about memetics & ideologies: Yep, totally agree. (The OP puts the emphasis on the memetic ecosystem & thinks of persuasion tools as a change in the fitness landscape. Also, I wrote this story a while back.) What follows is a point-by-point response:
The most attractive values given a new technological/social situation are likely to be similar to those given the immediately preceding situation, so I’d generally expect the most attractive values to already be endemic anyway, or close enough to endemic values that they don’t look like they are coming out of left field.
Maybe? I am not sure memetic evolution works that fast, though. Think about how biological evolution doesn’t adapt immediately to changes in the environment; it takes thousands of years at least, arguably millions, depending on what counts as “fully adapted” to the new environment. Replication times for memes are orders of magnitude faster, but that just means it should take a few orders of magnitude less time… and during e.g. a slow takeoff scenario there might just not be that much time. (Disclaimer: I’m ignorant of the math behind this sort of thing.) Basically, as tech and economic progress speeds up while the speed of memetic evolution stays constant, we should expect there to be some point where the former outstrips the latter and the environment is changing faster than the attractive-memes-for-that-environment can appear and become endemic. Now of course memetic evolution is speeding up too, but the point is that, until further argument, I’m not 100% convinced that we aren’t already out of equilibrium.
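Here is a minimal sketch of the kind of toy model I have in mind, assuming a standard moving-optimum selection setup with entirely made-up parameters (nothing here is specific to memes; it is just an illustration of adaptation lag):

```python
# Toy moving-optimum selection model (illustrative only; parameters are arbitrary).
# Each "meme" is a point on a 1-D trait axis; fitness falls off with distance from an
# optimum that shifts by `env_speed` every generation. The question is how far the
# population ends up lagging behind the optimum.
import math
import random
import statistics

def mean_lag(env_speed, selection_strength=0.1, mutation_sd=0.05,
             pop_size=300, steps=1500, seed=0):
    rng = random.Random(seed)
    pop = [0.0] * pop_size
    optimum = 0.0
    lags = []
    for _ in range(steps):
        optimum += env_speed                      # the environment keeps moving
        d2 = [(x - optimum) ** 2 for x in pop]
        best = min(d2)
        # relative fitness, normalized so the fittest individual has weight 1 (avoids underflow)
        weights = [math.exp(-selection_strength * (d - best)) for d in d2]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        pop = [p + rng.gauss(0.0, mutation_sd) for p in parents]   # reproduce with mutation
        lags.append(optimum - statistics.mean(pop))
    return statistics.mean(lags[steps // 2:])     # average lag after a burn-in period

for speed in (0.001, 0.01, 0.1):
    print(f"environment speed {speed}: population lags the optimum by ~{mean_lag(speed):.2f}")
```

The exact numbers are meaningless; the qualitative point is that “memes replicate much faster than genes” doesn’t by itself guarantee the endemic memes are ever close to the attractive-for-the-current-environment ones if the environment itself keeps moving faster.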
And of course, for any given zero-sum conflict and any given human, one of the participants in that conflict would prefer to push the human towards more attractive values, so such values would be introduced even if not initially endemic.
Not sure this argument works. For one thing, very few conflicts are actually zero-sum: usually there are some world-states that are worse by both players’ lights than some other world-states. Humans being pushed into the most attractive memetic state may be like this, i.e. worse by both players’ lights than some alternative.
I don’t think you can get paperclips this way, because people trying to get humans to maximize paperclips would be at a big disadvantage in memetic competition compared with the most attractive values (or even compared to more normal human values, which are presumably more attractive than random stuff).
Agreed.
Then the usual hope is that we are happy with attractive values, e.g. because deliberation and intentional behavior by humans make “smarter” forms of current values more attractive relative to random bad stuff. And your concern is basically: under distributional shift, why should we think that?
Agreed. I would add that even without distributional shift it is unclear why we should expect attractive values to be good. (Maybe the idea is that good = current values because moral antirealism, and current values are the attractive ones for the current environment via the argument above? I guess I’d want that argument spelled out more and the premises argued for.)
Or perhaps more clearly: if which values are “most attractive” depends on features of the technological landscape, then it’s hard to see why we should just “take the hand we’re dealt” and be happy with the values that are most attractive on some default technological trajectory. Instead, we would end up with preferences over the technological trajectory.
Yes.
This is not really distinctive to persuasion; it applies just as well to any change in the environment that would change the process of deliberation/discussion. The hypothesis seems to be that “how good humans are at persuasion” is just a particularly important/significant kind of shift.
Yes? I think it’s particularly important for reasons discussed in the “speculation” section, and because it seems to be in our immediate future and indeed our present. Basically, persuasion tools make ideologies (:= a particular kind of memeplex) stronger and stickier, and they change the landscape so that the ideologies that control the tech platforms have a significant advantage.
But it seems like what really matters is some ratio between how good you are at persuasion and how good you are at other skills that shape the future (or else perhaps you should be much more concerned about other increases in human capability, like education, that make us better at arguing). And in this sense it’s less clear whether AI is better or worse than the status quo. I guess the main thing is that it’s a candidate for a sharp distributional change and so that’s the kind of thing that you would want to be unusually cautious about.
Has education increased much recently? Not in a way that’s made us significantly more rational as a group, as far as I can tell. Changes in the US education system over the last 20 years presumably made some difference, but they haven’t exactly put us on a bright path towards rational discussion of important issues. My guess is that the effect size is swamped by larger effects from the Internet.
I mostly think the most robust thing is that it’s reasonable to be very interested in the trajectory of values, to think about how much you like the process of deliberation and discourse and selection and so on that shapes those values, and to think of changes as potentially irreversible (since future people would have no interest in reversing them).
The usual response to this argument is that perhaps future values are basically unrelated to present values anyway (since they will also converge to whatever values are most attractive given future technological situations). But this seems relatively unpersuasive, because eventually you might expect there to be many agents who try to deliberately make the future good rather than letting what happens happen, which could eventually drive the rate of drift to 0. This seems fairly likely to happen eventually, but you might think that it will take long enough that existing value changes will still wash out.
Then we end up with a complicated set of moral / decision-theoretic questions about which values we are happy enough with. It’s not really clear to me how you should feel about variation across humans, or across cultures, or for humans in new technological situations, or for a particular kind of deep RL, or what. It seems quite clear that we should care some, and I think given realistic treatments of moral uncertainty you should not care too much more about preventing drift than about preventing extinction given drift (e.g. 10x seems very hard to justify to me). But it generally seems like one of the more pressing questions in moral philosophy, and even if you care equally about those two things (suggesting that you’d value some drifted future population’s values 50% as much as some kind of hypothetical ideal realization) you could still get much more traction by trying to prevent forms of drift that we don’t endorse.
I agree that that way of thinking about it seems useful and worthwhile. Are you also implying that thinking specifically about the effects of persuasion tools is not so useful or worthwhile?
I should say, btw, that you’ve been talking about values, but I meant to talk about beliefs as well as values: memes in general. Beliefs can get feedback from reality more easily, and thus hopefully the attractive beliefs are more likely to be good than the attractive values are. But even so, there is room to wonder whether the attractive beliefs for a given environment will all be true… so far, for example, plenty of false beliefs seem to be pretty attractive...