Overall, I think I agree with some of the most important high-level claims of the post:
The world would be better if people could more often reach mutually beneficial deals. We would be more likely to handle challenges that arise, including those that threaten extinction (and including challenges posed by AI, alignment and otherwise). It makes sense to talk about “coordination ability” as a critical causal factor in almost any story about x-risk.
The development and deployment of AI may provide opportunities for cooperation to become either easier or harder (e.g. through capability differentials, alignment failures, geopolitical disruption, or distinctive features of artificial minds). So it can be worthwhile to do work in AI targeted at making cooperation easier, even and especially for people focused on reducing extinction risks.
I also read the post as implying or suggesting some things I’d disagree with:
That there is some real sense in which “cooperation itself is the problem.” I basically think all of the failure stories will involve some other problem that we would like to cooperate to solve, and we can discuss how well humanity cooperates to solve it (and compare “improve cooperation” to “work directly on the problem” as interventions). In particular, I think the stories in this post would basically be resolved if single-single alignment worked well, and that taking the stories in this post seriously suggests that progress on single-single alignment makes the world better (since evidently people face a tradeoff between single-single alignment and other goals, so that progress on single-single alignment changes what point on that tradeoff curve they will end up at, and since compromising on single-single alignment appears necessary to any of the bad outcomes in this story).
Relatedly, that cooperation plays a qualitatively different role than other kinds of cognitive enhancement or institutional improvement. I think that both cooperative improvements and cognitive enhancement operate by improving people’s ability to confront problems, and both of them have the downside that they also accelerate the arrival of many of our future problems (most of which are driven by human activity). My current sense is that cooperation has a better tradeoff than some forms of enhancement (e.g. giving humans bigger brains) and worse than others (e.g. improving the accuracy of people’s and institution’s beliefs about the world).
That the nature of the coordination problem for AI systems is qualitatively different from the problem for humans, or somehow is tied up with existential risk from AI in a distinctive way. I think that the coordination problem amongst reasonably-aligned AI systems is very similar to coordination problems amongst humans, and that interventions that improve coordination amongst existing humans and institutions (and research that engages in detail with the nature of existing coordination challenges) are generally more valuable than e.g. work in multi-agent RL or computational social choice.
That this story is consistent with your prior arguments for why single-single alignment has low (or even negative) value. For example, in this comment you wrote “reliability is a strongly dominant factor in decisions in deploying real-world technology, such that to me it feels roughly-correct to treat it as the only factor.” But in this story people choose to adopt technologies that are less robustly aligned because they lead to more capabilities. This tradeoff has real costs even for the person deploying the AI (who is ultimately no longer able to actually receive any profits at all from the firms in which they are nominally a shareholder). So to me your story seems inconsistent with that position and with your prior argument. (Though I don’t actually disagree with the framing in this story, and I may simply not understand your prior position.)
Thanks for this synopsis of your impressions, and +1 to the two points you think we agree on.
I also read the post as implying or suggesting some things I’d disagree with:
As for these, some of them are real positions I hold, while some are not:
That there is some real sense in which “cooperation itself is the problem.”
I don’t hold that view. The closest view I hold is more like: “Failing to cooperate on alignment is the problem, and solving it involves being both good at cooperation and good at alignment.”
Relatedly, that cooperation plays a qualitatively different role than other kinds of cognitive enhancement or institutional improvement.
I don’t hold the view you attribute to me here, and I agree wholesale with the following position, including your comparisons of cooperation with brain enhancement and improving belief accuracy:
I think that both cooperative improvements and cognitive enhancement operate by improving people’s ability to confront problems, and both of them have the downside that they also accelerate the arrival of many of our future problems (most of which are driven by human activity). My current sense is that cooperation has a better tradeoff than some forms of enhancement (e.g. giving humans bigger brains) and worse than others (e.g. improving the accuracy of people’s and institution’s beliefs about the world).
… with one caveat: some beliefs are self-fulfilling, such as beliefs about cooperation/defection. There are ways of improving belief accuracy that favor defection, and ways that favor cooperation. Plausibly to me, the ways of improving belief accuracy that favor defection are worse than no accuracy improvement at all. I’m not particularly firm in this view, though; it’s more of a hedge.
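To make the self-fulfilling-belief point concrete, here’s a minimal stag-hunt-style sketch (my own toy illustration; the payoff numbers are made up): each player’s best response depends on their belief about whether the other will cooperate, so whichever belief both players hold tends to make itself true.

```python
# Toy stag hunt: PAYOFFS[(my_action, their_action)] = my payoff.
# Numbers are illustrative only.
PAYOFFS = {
    ("cooperate", "cooperate"): 4,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 3,
    ("defect", "defect"): 2,
}

def best_response(p_other_cooperates: float) -> str:
    """Pick the action with the higher expected payoff, given a belief
    about how likely the other player is to cooperate."""
    expected = {
        action: p_other_cooperates * PAYOFFS[(action, "cooperate")]
        + (1 - p_other_cooperates) * PAYOFFS[(action, "defect")]
        for action in ("cooperate", "defect")
    }
    return max(expected, key=expected.get)

# If both players expect cooperation, cooperating is each player's best
# response, so the optimistic belief fulfills itself.
print(best_response(0.9))  # -> "cooperate"
# If both expect defection, defecting is the best response instead.
print(best_response(0.2))  # -> "defect"
```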
That the nature of the coordination problem for AI systems is qualitatively different from the problem for humans, or somehow is tied up with existential risk from AI in a distinctive way. I think that the coordination problem amongst reasonably-aligned AI systems is very similar to coordination problems amongst humans, and that interventions that improve coordination amongst existing humans and institutions (and research that engages in detail with the nature of existing coordination challenges) are generally more valuable than e.g. work in multi-agent RL or computational social choice.
I do hold this view! Particularly the bolded part. I also agree with the bolded parts of your counterpoint, but I think you might be underestimating the value of technical work (e.g., CSC, MARL) directed at improving coordination amongst existing humans and human institutions.
I think blockchain tech is a good example of an already-mildly-transformative technology for implementing radically mutually transparent and cooperative strategies through smart contracts. Make no mistake: I’m not claiming blockchain tech is going to “save the world”; rather, it’s changing the way people cooperate, and is doing so as a result of a technical insight. I think more technical insights are in order to improve cooperation and/or the global structure of society, and it’s worth spending research effort to find them.
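As a sketch of the kind of cooperation-promoting logic I have in mind (a deliberately simplified stand-in for a real smart contract, with invented names and numbers; ordinary Python, not on-chain code): an assurance-contract-style escrow only funds the joint project if every party follows through, and everyone can inspect that rule before committing, so holding back while others commit stops being profitable.

```python
# Minimal assurance-contract-style escrow, mimicking the rule a smart
# contract would enforce on-chain. Names and amounts are invented.
class Escrow:
    def __init__(self, parties, required_deposit):
        self.required_deposit = required_deposit
        self.deposits = {p: 0 for p in parties}

    def deposit(self, party, amount):
        self.deposits[party] += amount

    def settle(self):
        """Fund the joint project only if everyone deposited the required
        amount; otherwise refund every deposit, so no party can gain by
        defecting while others commit."""
        if all(d >= self.required_deposit for d in self.deposits.values()):
            return {"funded": True, "refunds": {}}
        return {"funded": False, "refunds": dict(self.deposits)}

# Both parties can read settle() before depositing; that mutual
# transparency is what changes the incentives.
escrow = Escrow(parties=["alice", "bob"], required_deposit=10)
escrow.deposit("alice", 10)
escrow.deposit("bob", 10)
print(escrow.settle())  # {'funded': True, 'refunds': {}}
```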
Reminder: this is not a bid for you personally to quit working on alignment!
That this story is consistent with your prior arguments for why single-single alignment has low (or even negative) value. For example, in this comment you wrote “reliability is a strongly dominant factor in decisions in deploying real-world technology, such that to me it feels roughly-correct to treat it as the only factor.” But in this story people choose to adopt technologies that are less robustly aligned because they lead to more capabilities. This tradeoff has real costs even for the person deploying the AI (who is ultimately no longer able to actually receive any profits at all from the firms in which they are nominally a shareholder). So to me your story seems inconsistent with that position and with your prior argument. (Though I don’t actually disagree with the framing in this story, and I may simply not understand your prior position.)
My prior (and present) position is that reliability meeting a certain threshold, rather than being optimized, is a dominant factor in how soon deployment happens. In practice, I think the threshold by default will not be “Reliable enough to partake in a globally cooperative technosphere that preserves human existence”, but rather, “Reliable enough to optimize unilaterally for the benefits of the stakeholders of each system, i.e., to maintain or increase each stakeholder’s competitive advantage.” With that threshold, there easily arises a RAAP racing to the bottom on how much human control/safety/existence is left in the global economy. I think both purely-human interventions (e.g., talking with governments) and sociotechnical interventions (e.g., inventing cooperation-promoting tech) can improve that situation. This is not to say “cooperation is all you need”, any more than I would say “alignment is all you need”.
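To gesture at the racing dynamic I mean, here’s a deliberately crude toy model (the numbers and the best-response rule are all made up): each firm shaves a little safety off the industry minimum to gain a capability edge, bottoming out at the unilateral-stakeholder threshold rather than the higher cooperative one.

```python
# Toy "race to the bottom" on safety margins. All constants and the
# best-response rule are invented purely to illustrate the dynamic.
COOPERATIVE_THRESHOLD = 0.9  # safety a globally cooperative regime would demand
UNILATERAL_THRESHOLD = 0.3   # safety sufficient for each firm's own stakeholders

def best_response(industry_min_safety):
    """Each firm undercuts the current industry minimum slightly for a
    competitive edge, but won't go below what its own stakeholders require."""
    return max(UNILATERAL_THRESHOLD, industry_min_safety - 0.05)

safety_levels = [0.95, 0.90, 0.85]  # three firms' initial safety levels
for _ in range(20):
    industry_min = min(safety_levels)
    safety_levels = [best_response(industry_min) for _ in safety_levels]

# Converges to 0.3, the unilateral threshold, well below the 0.9 a
# cooperative regime would have demanded.
print(min(safety_levels))
```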
Failing to cooperate on alignment is the problem, and solving it involves being both good at cooperation and good at alignment
Sounds like we are on broadly the same page. I would have said “Aligning ML systems is more likely if we understand more about how to align ML systems, or are better at coordinating to differentially deploy aligned systems, or are wiser or smarter or...” and then moved on to talking about how alignment research quantitatively compares to improvements in various kinds of coordination or wisdom or whatever. (My bottom line from doing this exercise is that I feel more general capabilities typically look less cost-effective on alignment in particular, but benefit a ton from the diversity of problems they help address.)
My prior (and present) position is that reliability meeting a certain threshold, rather than being optimized, is a dominant factor in how soon deployment happens.
I don’t think we can get to convergence on many of these discussions, so I’m happy to just leave it here for the reader to think through.
Reminder: this is not a bid for you personally to quit working on alignment!
I’m reading this (and your prior post) as bids for junior researchers to shift what they focus on. My hope is that seeing the back-and-forth in the comments will, in expectation, help them decide better.
> My prior (and present) position is that reliability meeting a certain threshold, rather than being optimized, is a dominant factor in how soon deployment happens.
I don’t think we can get to convergence on many of these discussions, so I’m happy to just leave it here for the reader to think through.
Yeah, I agree we probably can’t reach convergence on how alignment affects deployment time, at least not in this medium (especially since a lot of info about company policies / plans / standards is covered under NDAs), so I also think it’s good to leave this question about deployment time as a hanging disagreement node.
I’m reading this (and your prior post) as bids for junior researchers to shift what they focus on. My hope is that seeing the back-and-forth in the comments will, in expectation, help them decide better.
Yes to both points; I’d thought of writing a debate dialogue on this topic trying to cover both sides, but commenting with you about it is turning out better, I think, so thanks for that!