I feel like you are lumping together things like “bargaining in a world with many AIs representing diverse stakeholders” with things like “prioritizing actions on the basis of how they affect the balance of power.”
Sorry for the slow reply!

Yes, but not a crux for my point. I think this community has a blind/blurry spot around both of those things (compared to the most influential elites shaping the future of humanity). So, the thesis statement of the post does not hinge on this distinction, IMO.
I would prefer to keep those things separate.
Yep, and in fact, for actually diagnosing the structure of the problem, I would prefer to subdivide further into three things and not just two:
1. Thinking about how power dynamics play out in the world;
2. Thinking about what actions are optimal*, given (1); and
3. Choosing optimal* actions, given (2) (which, as you say, can involve actions chosen to shift power balances).
My concern is that the causal factors laid out in this post have led to correctable weaknesses in all three of these areas. I’m more confident in the claim of weakness (relative to skilled thinkers in this area, not average thinkers) than in any particular causal story for how the weakness formed. But, having drawn out the distinction among 1-3, I’ll say that the following mechanism is probably at play in a bunch of people:
a) people are really afraid of being manipulated;
b) they see/experience (3) as an instance of being manipulated and therefore have strong fear reactions around it (or disgust, or anger, or anxiety);
c) their avoidance reactions around (3) generalize to avoiding (1) and (2).
d) point (c) compounds with the other causal factors in the OP to lead to too-much-avoidance-of-thought around power/bargaining.
There are of course specific people who are exceptions to this concern. And, I hear you that you do lots of (1) and (2) while choosing actions based on a version of optimal* defined in terms of “win-wins”. (This is not an update for me, because I know that you personally think about bargaining dynamics.)
However, my concern is not about you specifically. Who exactly is it about, you might ask? I’d say “the average LessWrong reader” or “the average AI researcher interested in AI alignment”.
For example, “Politics is the mind-killer” is mostly levied against doing politics, not thinking about politics as something that someone else might do (thereby destroying the world).
Yes, but not a crux to the point I’m trying to make. I already noted in the post that PMK was not trying to make people avoid thinking about politics; in fact, I included a direct quote to that effect: “The PMK post does not directly advocate for readers to avoid thinking about politics; in fact, it says ‘I’m not saying that I think we should be apolitical, or even that we should adopt Wikipedia’s ideal of the Neutral Point of View.’” My concern, as noted in the post, is that the effects of PMK (rather than its intentions) may have been somewhat crippling:
[OP] However, it also begins with the statement “People go funny in the head when talking about politics”. If a person doubts their own ability not to “go funny in the head”, the PMK post — and concerns like it — could lead them to avoid thinking about or engaging with politics as a way of preventing themselves from “going funny”.
I also agree with the following comparison you made to mainstream AI/ML, but it’s also not a crux for the point I’m trying to make:
it seems to me that the rationalist and EA community thinks about AI-AI bargaining and costs from AI-AI competition much more than the typical AI researcher, as measured by e.g. fraction of time spent thinking about those problems, fraction of writing that is about those problems, fraction of stated research priorities that involve those problems, and so on. This is all despite outlier technical beliefs suggesting an unprecedentedly “unipolar” world during the most important parts of AI deployment (which I mostly disagree with).
My concern is not that our community is under-performing the average AI/ML researcher in thinking about the future — as you point out, we are clearly over-performing. Rather, the concern is that we are underperforming the forces that will actually shape the future, which are driven primarily by the most skilled people who are going around shifting the balance of power. Moreover, my read on this community is that it mostly exhibits a disdainful reaction to those skills, both in specific (e.g., if I were to call out particular people who have them) and in general (when I call them out in the abstract, as I have here).
Here’s another way of laying out what I think is going on:
A = «anxiety/fear/disgust/anger around being manipulated»
B = «filter-bubbling around early EA narratives designed for institution-building and single-stakeholder AI alignment»
C = «commitment to the art-of-rationality/thinking»
S = «skill at thinking about power dynamics»
D = «disdain for people who exhibit S»
Effect 1: A increases C (this is healthy).
Effect 2: C increases S (which is probably why our community is out-performing mainstream AI/ML).
Effect 3: A decreases S in a bunch of people (not all people!), specifically, people who turn anxiety into avoidance-of-the-topic.
Effect 4: Effects 1-3 make it easy for a filter-bubble (B) to form that ignores power dynamics among and around powerful AI systems and counts on single-stakeholder AI alignment to save the world with one big strong power-dynamic instead of a more efficient/nuanced wielding of power.
The solution to this problem is not to decrease C, from which we derive our strength, but to mitigate Effect 3 by getting A to be more calibrated / less triggered. Around AI specifically, this requires holding space in discourse for thinking and writing about power-dynamics in multi-stakeholder scenarios.
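As a minimal sketch of the causal model above (illustrative only; the names signed_edges and net_effect_of_A_on_S are hypothetical, not anything from the discussion itself), Effects 1-3 can be written as signed edges among A, C, and S:

```python
# Illustrative sketch only: Effects 1-3 written as signed edges among the variables
# defined above (A = anxiety around being manipulated, C = commitment to rationality,
# S = skill at thinking about power dynamics). All names here are hypothetical.

signed_edges = {
    ("A", "C"): +1,  # Effect 1: A increases C (healthy)
    ("C", "S"): +1,  # Effect 2: C increases S
    ("A", "S"): -1,  # Effect 3: A decreases S, in people who turn anxiety into avoidance
}

def net_effect_of_A_on_S(edges):
    """Direct path A->S plus the indirect path A->C->S.

    With signs but no magnitudes, the two paths cancel to zero, matching the claim
    that whether Effect 2 or Effect 3 dominates differs from person to person.
    """
    direct = edges[("A", "S")]
    indirect = edges[("A", "C")] * edges[("C", "S")]
    return direct + indirect

print(net_effect_of_A_on_S(signed_edges))  # 0: sign information alone doesn't settle it
```

Effect 4 (the filter bubble, B) and the proposed remedy of recalibrating A so that the negative A-to-S edge weakens sit on top of this structure rather than being additional edges in it.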
Hopefully this correction can be made without significantly harming or diverting highly focused efforts on alignment, such as yours. E.g., I’m still hoping like 10 people will go work for you at ARC, as soon as you want to expand to that size, in large part because I know you personally think about bargaining dynamics and are mulling them over in the background while you address alignment. [Other readers: please consider this an endorsement to go work for Paul if he wants you on his team!]
Let’s define:

X = thinking about the dynamics of conflict + how they affect our collective ability to achieve things we all want; prioritizing actions based on those considerations
Y = thinking about how actions shift the balance of power + how we should be trying to shift the balance of power; prioritizing actions based on those considerations
I’m saying:
I think the alignment community traditionally avoids Y but does a lot of X.
I think that the factors you listed (including in the parent) are mostly reasons we’d do less Y.
So I read you as mostly making a case for “why the alignment community might be inappropriately averse to Y.”
I think that separating X and Y would make this discussion clearer.
I’m personally sympathetic to both activities. I think the altruistic case for marginal X is stronger.
Here are some reasons I perceive you as mostly talking about Y rather than X:
You write: “Rather, the concern is that we are underperforming the forces that will actually shape the future, which are driven primarily by the most skilled people who are going around shifting the balance of power.” This seems like a good description of Y but not X.
You listed “Competitive dynamics as a distraction from alignment.” But in my experience, people from the alignment community very often bring up X themselves, both as a topic for research and as a justification for their research (suggesting that in fact they don’t regard it as a distraction), and in my experience Y derails conversations about alignment perhaps 10x more often than X.
You talk about the effects of the PMK post. Explicitly, that post is mostly about Y rather than X, and it is often brought up when someone starts Y-ing on LW. It may also have the effect of discouraging X, but I don’t think you made the case for that.
You mention the causal link from “fear of being manipulated” to “skill at thinking about power dynamics” which looks very plausible (to me) in the context of Y but looks like kind of a stretch (to me) in the context of X. You say “they find it difficult to think about topics that their friends or co-workers disagree with them about,” which again is most relevant to Y (where people frequently disagree about who should have power or how important it is) and not particularly relevant to X (similar to other technical discussions).
In your first section you quote Eliezer. But he’s not complaining about people thinking about how fights go in a way that might disrupt a sense of shared purpose; he’s complaining that Elon Musk is in fact making his decisions in order to change which group gets power, in a way that more obviously disrupts any sense of shared purpose. This seems like complaining about Y, rather than X.
More generally, my sense is that X involves thinking about politics and Y mostly is politics, and most of your arguments describe why people might be averse to doing politics rather than discussing it. Of course that can flow backwards (people who don’t like doing something may also not like talking about it) but there’s certainly a missing link.
Relative to the broader community thinking about beneficial AI, the alignment community does unusually much X and unusually little Y. So prima facie it’s more likely that “too little X+Y” is mostly about “too little Y” rather than “too little X.” Similarly, when you list corrective influences they are about X rather than Y.
I care about this distinction because in my experience discussions about alignment of any kind (outside of this community) are under a lot of social pressure to turn into discussions about Y. In the broader academic/industry community it is becoming harder to resist those pressures.
I’m fine with lots of Y happening, I just really want to defend “get better at alignment” as a separate project that may require substantial investment. I’m concerned that equivocating between X and Y will make this difficulty worse, because many of the important divisions are between (alignment, X) vs (Y) rather than (alignment) vs (X, Y).
This also summarizes a lot of my take on this position. Thank you!
This felt like a pretty helpful clarification of your thoughts Paul, thx.