I reasonably often find myself grateful that some dysfunctional norms or epistemic practices will most likely become obsolete. It’s a bit scary to think about a world where the only solution is waiting for someone to snap out of it.
I’ve been thinking a lot about this lately, so I’m glad to see that it’s on your mind too, although I think I may still be a bit more concerned about it than you are. Couple of thoughts:
What if our “deliberation” only made it as far as it did because of “competition”, and nobody (or very few people) knows how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don’t know how to prevent this.
What’s your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever “snap out of it”, or getting “persuaded” by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it’s very large, easily >50%. I’m curious how others would answer this question as well.
Alice and Bob can try to have an agreement to avoid racing ahead or engaging in some kinds of manipulation, and analogously a broader society could adopt such norms or divide into communities with internal agreements of this form.
In a sane civilization, tons of people would already be studying how to make and enforce such agreements, e.g., how to define what kinds of behaviors count as “manipulation”, and more generally what are good epistemic norms/practices and how to ensure that many people adopt such norms/practices. If this problem is solved, then maybe we don’t need to solve metaphilosophy (in the technical or algorithmic sense), as far as preventing astronomical waste arising from bad deliberation. Unfortunately it seems there are approximately zero people working on either problem.
I would rate “value lost to bad deliberation” (“deliberation” broadly construed, and including easy+hard problems and individual+collective failures) as comparably important to “AI alignment.” But I’d guess the total amount of investment in the problem is 1-2 orders of magnitude lower, so there is a strong prima facie case for longtermists prioritizing it.
Overall I think I’m quite a bit more optimistic than you are, and would prioritize these problems less than you would, but still agree directionally that these problems are surprisingly neglected (and I could imagine them playing more to the comparative advantages/interests of longtermists and the LW crowd than topics like AI alignment).
What if our “deliberation” only made it as far as it did because of “competition”, and nobody (or very few people) knows how to deliberate correctly in the absence of competitive pressures? Basically, our current epistemic norms/practices came from the European Enlightenment, and they were spread largely via conquest or people adopting them to avoid being conquered or to compete in terms of living standards, etc. It seems that in the absence of strong competitive pressures of a certain kind, societies can quickly backslide or drift randomly in terms of epistemic norms/practices, and we don’t know how to prevent this.
This seems like a quantitative difference, basically the same as your question 2. “A few people might mess up and it’s good that competition weeds them out” is the rosy view, “most everyone will mess up and it’s good that competition makes progress possible at all” is the pessimistic view (or even further that everyone would mess up and so you need to frequently split groups and continue applying selection).
We’ve talked about this a few times but I still don’t really feel like there’s much empirical support for the kind of permanent backsliding you’re concerned about being widespread. Maybe you think that in a world with secure property rights + high quality of life for everyone (what I have in mind as a prototypical decoupling) the problem would be much worse. E.g. maybe communist China only gets unstuck because of its failure to solve basic problems in physical reality. But I don’t see much evidence for that (and indeed failures of property rights / threats of violence seem to play an essential role in many scenarios with lots of backsliding).
What’s your expectation of the fraction of total potential value that will be lost due to people failing to deliberate correctly (e.g., failing to ever “snap out of it”, or getting “persuaded” by bad memes and then asking their AIs to lock in their beliefs/values)? It seems to me that it’s very large, easily >50%. I’m curious how others would answer this question as well.
There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I’d guess 10% from “easy” failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from “hard” failures (most of which I think would not be addressed by competition).
It seems to me like the main driver of the first 10% risk is the ability to lock in a suboptimal view (rather than a conventional deliberation failure), and so the question is when that becomes possible, what views towards it are like, and so on. This is one of my largest concerns about AI after alignment.
I am most inclined to intervene via “paternalistic” restrictions on some classes of binding commitments that might otherwise be facilitated by AI. (People often talk about this concern in the context of totalitarianism, whereas that seems like a small minority of the risk to me / it’s not really clear whether a totalitarian society is better or worse on this particular axis than a global democracy.)
We’ve talked about this a few times but I still don’t really feel like there’s much empirical support for the kind of permanent backsliding you’re concerned about being widespread.
I’m not claiming direct empirical support for permanent backsliding. That seems hard to come by, given that we can’t see into the far future. I am observing quite severe current backsliding. For example, explicit ad hominem attacks, as well as implicitly weighing people’s ideas/arguments/evidence differently, based on things like the speaker’s race and sex, have become the norm in local policy discussions around these parts. AFAICT, this originated from academia, under “standpoint epistemology” and related ideas.
On the other side of the political spectrum, several people close to me became very sure that “the election was stolen” due to things like hacked Dominion machines and that the military and/or Supreme Court was going to intervene in favor of Trump (to the extent that it was impossible for me to talk them out of these conclusions). One of them, who I had previously thought was smart/sane enough to entrust a great deal of my financial resources with, recently expressed concern for my life because I was going to get the COVID vaccine.
Is this an update for you, or have you already observed such things yourself or otherwise known how bad things have become?
There are some fuzzy borders here, and unclarity about how to define the concept, but maybe I’d guess 10% from “easy” failures to deliberate (say those that could be avoided by the wisest existing humans and which might be significantly addressed, perhaps cut in half, by competitive discipline) and a further 10% from “hard” failures (most of which I think would not be addressed by competition).
Given these numbers, it seems that you’re pretty sure that almost everyone will eventually “snap out of” any bad ideas they get talked into, or they talk themselves into. Why? Is this based on some observations you’ve made that I haven’t seen, or history that you know about that I don’t? Or do you have some idea of a mechanism by which this “snapping out of” happens?
Here’s an idea of how random drift of epistemic norms and practices can occur. Beliefs (including beliefs about normative epistemology) function in part as a signaling device, similar to clothes. (I forget where I came across this idea originally, but a search produced a Robin Hanson article about it.) The social dynamics around this kind of signaling produce random drift in epistemic norms and practices, similar to random drift in fashion / clothing styles. Such drift, coupled with certain kinds of competition, could have produced the world we have today (i.e., certain groups happened upon especially effective norms/practices by chance and then spread their influence through competition), but may lead to disaster in the future in the absence of competition, as it’s unclear what would then counteract further drift and the resulting continued deterioration in epistemic conditions.
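To make the drift-vs-selection mechanism a little more concrete, here is a minimal toy simulation (my own illustration, with arbitrary parameters and an arbitrary copying rule, not something anyone in this thread has proposed): each group’s “epistemic quality” takes an undirected random walk, and an optional competition step lets chance improvements spread from group to group.

```python
import random

def simulate(n_groups=20, steps=2000, drift=0.02, selection=False, seed=0):
    """Toy model: each group's 'epistemic quality' takes an undirected
    random walk (standing in for drift from signaling dynamics). If
    selection is on, a random pair of groups is compared each step and
    the lower-quality one sometimes copies the higher-quality one
    (a crude stand-in for competitive pressure)."""
    rng = random.Random(seed)
    quality = [1.0] * n_groups
    for _ in range(steps):
        # Undirected drift: every group's norms wander a little each step.
        for i in range(n_groups):
            quality[i] += rng.gauss(0, drift)
        if selection:
            # Competition: the less effective of a random pair imitates
            # (or is displaced by) the more effective one, half the time.
            a, b = rng.sample(range(n_groups), 2)
            loser, winner = (a, b) if quality[a] < quality[b] else (b, a)
            if rng.random() < 0.5:
                quality[loser] = quality[winner]
    return sum(quality) / n_groups

print("mean quality, drift only:       ", round(simulate(selection=False), 2))
print("mean quality, drift + selection:", round(simulate(selection=True), 2))
```

With the competition step disabled, nothing systematically counteracts the wandering; with it enabled, lucky improvements tend to get copied, which is the asymmetry the paragraph above is pointing at.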
Another mechanism for random drift is technological change that disrupts previous epistemic norms/practices without anyone specifically intending to. I think we’ve seen this recently too, in the form of, e.g., cable news and social media. It seems like you’re envisioning that future humans will deliberately isolate their deliberation from technological advances (until they’re ready to incorporate those advances into how they deliberate), so in that scenario perhaps this form of drift will stop at some point, but (1) it’s unclear how many people will actually decide to do that, and (2) even in that scenario there will still be a large amount of drift between the recent past (when epistemic conditions still seemed reasonably ok, although I had my doubts even back then) and whenever that isolation starts, which (together with other forms of drift) might never be recovered from.
As another symptom of what’s happening (the rest of this comment is in a “paste” that will expire in about a month, to reduce the risk of it being used against me in the future)