> Third, unless humanity collectively works very hard to maintain a degree of simplicity and legibility in the overall structure of society*, this “alignment revolution” will greatly complexify our environment to a point of much greater incomprehensibility and illegibility than even today’s world. This, in turn, will impoverish humanity’s collective ability to keep abreast of important international developments, as well as our ability to hold the international economy accountable for maintaining our happiness and existence.
One approach to this problem is to work to make it more likely that AI systems can adequately represent human interests in understanding and intervening on the structure of society. But this seems to be a single/single alignment problem (to whatever extent existing humans currently try to maintain and influence our social structure, such that impairing their ability to do so is problematic at all), which you aren’t excited about.
Yes, you’ve correctly anticipated my view on this. Thanks for the very thoughtful reading!
To elaborate: I claim “turning up the volume” on everyone’s individual agency (by augmenting them with user-aligned systems) does not automatically make society overall healthier and better able to survive, and in fact it might just hasten progress toward an unhealthy or destructive outcome. To me, the way to avoid this is not to make the aligned systems even more aligned with their users, but to start “aligning” them with the rest of society. “Aligning” with society doesn’t just mean “serving” society; it means “fitting into it”, which means the AI system needs to have a particular structure (not just a particular optimization objective) that makes it able to exist and function safely inside a larger society. The desired structure involves features like being transparent, legibly beneficial, and legibly fair. Without those aspects, I think your AI system introduces a bunch of political instability and competitive pressure into the world (e.g., fighting over disagreements about what it’s doing, whether it’s fair, or whether it will be good), which I think by default turns up the knob on x-risk rather than turning it down. For a few stories somewhat resembling this claim, see my next post:
https://www.alignmentforum.org/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic
Of course, if you make a super-aligned self-modifying AI, it might immediately self-modify so that its structure is more legibly beneficial and fair, because of the necessity (if I’m correct) of having that structure for benefiting society and therefore its creators/users. However, my preferred approach to building societally-compatible AI is not to make societally-incompatible AI systems and hope that they know their users “want” them to transform into more societally-compatible systems. I think we should build highly societally-compatible systems to begin with, not just because it seems broadly “healthier”, but because I think it’s necessary for getting existential risk down to tolerable levels like <3% or <1%. Moreover, because this view seems misunderstood by x-safety enthusiasts, I currently put the plurality of my existential-failure probability on outcomes arising from problems other than individual systems being misaligned (in terms of the objective) with their users or creators. Dafoe et al. would call this “structural risk”, a framing I find helpful and think should be applied not only to the structure of society external to the AI system, but also to the system’s internal structure.