If single/single alignment is solved, it feels like there are some salient “default” ways in which we’ll end up approaching multi/multi alignment:
- Existing single/single alignment techniques can also be applied to empower an organization rather than an individual. So we can use existing social technology to form firms and governments and so on, and those organizations will use AI.
- AI systems can themselves participate in traditional social institutions. So AI systems that represent individual human interests can interact with each other e.g. in markets or democracies.
I totally agree that there are many important problems in the world even if we can align AI. That said, I remain interested in more clarity on what you see as the biggest risks with these multi/multi approaches that could be addressed with technical research.
For example, let’s take the considerations you discuss under CSC:
> Third, unless humanity collectively works very hard to maintain a degree of simplicity and legibility in the overall structure of society*, this “alignment revolution” will greatly complexify our environment to a point of much greater incomprehensibility and illegibility than even today’s world. This, in turn, will impoverish humanity’s collective ability to keep abreast of important international developments, as well as our ability to hold the international economy accountable for maintaining our happiness and existence.
One approach to this problem is to work to make it more likely that AI systems can adequately represent human interests in understanding and intervening on the structure of society. But this seems to be a single/single alignment problem (to whatever extent that existing humans currently try to maintain and influence our social structure, such that impairing their ability to do so is problematic at all) which you aren’t excited about.
> Fourth, in such a world, algorithms will be needed to hold the aggregate global behavior of algorithms accountable to human wellbeing, because things will be happening too quickly for humans to monitor. In short, an “algorithmic government” will be needed to govern “algorithmic society”. Some might argue this is not strictly necessary: in the absence of a mathematically codified algorithmic social contract, humans could in principle coordinate to cease or slow down the use of these powerful new alignment technologies, in order to give ourselves more time to adjust to and govern their use. However, for all our successes in innovating laws and governments, I do not believe current human legal norms are quite developed enough to stably manage a global economy empowered with individually-alignable transformative AI capabilities.
Again, it’s not clear what you expect to happen when existing institutions are empowered by AI and mostly coordinate the activities of AI.
The last line reads to me like “If we were smarter, then our legal system may no longer be up to the challenge,” with which I agree. But it seems like the main remedy is “if we were smarter, we would hopefully work on improving our legal system in tandem with the increasing demands we impose on it.”
It feels to me like the salient actions to take are (i) make direct improvements in the relevant institutions, in a way that anticipates the changes brought about by AI but will most likely not look like AI research, and (ii) work on improving the relative capability of AI at those tasks that seem more useful for guiding society in a positive direction.
I consider (ii) to be one of the most important kinds of research other than alignment for improving the impact of AI, and I consider (i) to be all-around one of the most important things to do for making the world better. Neither of them feels much like CSC (e.g. I don’t think computer scientists are the best people to do them) and it’s surprising to me that we end up at such different places (if only in framing and tone) from what seem like similar starting points.
> Third, unless humanity collectively works very hard to maintain a degree of simplicity and legibility in the overall structure of society*, this “alignment revolution” will greatly complexify our environment to a point of much greater incomprehensibility and illegibility than even today’s world. This, in turn, will impoverish humanity’s collective ability to keep abreast of important international developments, as well as our ability to hold the international economy accountable for maintaining our happiness and existence.
> One approach to this problem is to work to make it more likely that AI systems can adequately represent human interests in understanding and intervening on the structure of society. But this seems to be a single/single alignment problem (to whatever extent that existing humans currently try to maintain and influence our social structure, such that impairing their ability to do so is problematic at all) which you aren’t excited about.
Yes, you’ve correctly anticipated my view on this. Thanks for the very thoughtful reading!
To elaborate: I claim “turning up the volume” on everyone’s individual agency (by augmenting them with user-aligned systems) does not automatically make society overall healthier and better able to survive, and in fact it might just hasten progress toward an unhealthy or destructive outcome. To me, the way to avoid this is not to make the aligned systems even more aligned with their users, but to start “aligning” them with the rest of society. “Aligning” with society doesn’t just mean “serving” society, it means “fitting into it”, which means the AI system needs to have a particular structure (not just a particular optimization objective) that makes it able to exist and function safely inside a larger society. The desired structure involves features like being transparent, legibly beneficial, and legibly fair. Without those features, I think your AI system introduces a bunch of political instability and competitive pressure into the world (e.g., fighting over disagreements about what it’s doing, or whether it’s fair, or whether it will be good), which I think by default turns up the knob on x-risk rather than turning it down. For a few stories somewhat resembling this claim, see my next post:
https://www.alignmentforum.org/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic
Of course, if you make a super-aligned self-modifying AI, it might immediately self-modify so that its structure is more legibly beneficial and fair, because of the necessity (if I’m correct) of having that structure for benefitting society and therefore its creators/users. However, my preferred approach to building societally-compatible AI is not to make societally-incompatible AI systems and hope that they know their users “want” them to transform into more societally-compatible systems. I think we should build highly societally-compatible systems to begin with, not just because it seems broadly “healthier”, but because I think it’s necessary for getting existential risk down to tolerable levels like <3% or <1%. Moreover, because this view seems to be misunderstood by x-safety enthusiasts, I currently put the plurality of my existential-failure probability on outcomes arising from problems other than individual systems being misaligned (in terms of their objectives) with their users or creators. Dafoe et al. would call this “structural risk”, which I find to be a helpful framing that should be applied not only to the structure of society external to the AI system, but also to the system’s internal structure.
> That said, I remain interested in more clarity on what you see as the biggest risks with these multi/multi approaches that could be addressed with technical research.
One reason (though not necessarily the most important one) to think technical research into computational social choice might be useful is that specifically examining the behaviour of RL agents from a computational social choice perspective might alert us to ways in which coordination with future TAI could be similar to, or different from, the existing coordination problems we face.
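For concreteness, here is a minimal toy sketch of what that kind of analysis could look like, under some illustrative assumptions: the social dilemma is a repeated prisoner’s dilemma, the RL agents are two independent bandit-style Q-learners, and the social choice lens is a pair of standard welfare functions (utilitarian sum and Nash product). None of these specifics come from the discussion above; they are placeholders chosen for brevity.

```python
# Toy sketch: two independent Q-learners in a repeated prisoner's dilemma,
# with the joint outcome they settle into scored by social-choice-style
# welfare functions. All parameters here are illustrative placeholders.
import random

# Payoffs for (row, col) actions: 0 = cooperate, 1 = defect.
PAYOFFS = {
    (0, 0): (3, 3),  # mutual cooperation
    (0, 1): (0, 5),  # row cooperates, col defects
    (1, 0): (5, 0),  # row defects, col cooperates
    (1, 1): (1, 1),  # mutual defection
}

def train(episodes=5000, eps=0.1, lr=0.1):
    """Independent, stateless Q-learning for two agents."""
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(episodes):
        acts = [
            random.randint(0, 1) if random.random() < eps
            else (0 if q[i][0] >= q[i][1] else 1)
            for i in range(2)
        ]
        rewards = PAYOFFS[tuple(acts)]
        for i, a in enumerate(acts):
            q[i][a] += lr * (rewards[i] - q[i][a])
    return q

def greedy_outcome(q):
    """Joint action and payoffs if both agents act greedily."""
    acts = tuple(0 if q[i][0] >= q[i][1] else 1 for i in range(2))
    return acts, PAYOFFS[acts]

def utilitarian_welfare(payoffs):
    return sum(payoffs)

def nash_welfare(payoffs):
    product = 1.0
    for p in payoffs:
        product *= p
    return product

if __name__ == "__main__":
    q = train()
    acts, payoffs = greedy_outcome(q)
    print("greedy joint action:", acts, "payoffs:", payoffs)
    print("utilitarian welfare:", utilitarian_welfare(payoffs))
    print("Nash welfare:", nash_welfare(payoffs))
```

Independent learners in this setting typically settle on mutual defection, and the gap between the welfare of that outcome and of mutual cooperation is the sort of quantity a computational social choice analysis might track when asking whether coordination failures among learned agents look like, or unlike, the coordination failures humans already face.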
> (i) make direct improvements in the relevant institutions, in a way that anticipates the changes brought about by AI but will most likely not look like AI research,
It seems premature to say, in advance of actually seeing what such research uncovers, whether the relevant mechanisms and governance improvements are exactly the same as the improvements we need for good governance generally, or different. Suppose examining the behaviour of current RL agents in social dilemmas leads to a general result which in turn suggests there’s a disproportionate chance that future TAI will coordinate in some damaging way, one that we could address with a particular new regulation. It’s always possible to say that solving the single/single alignment problem will prevent anything like that from happening in the first place, but why put all your hopes on plan A when plan B is relatively neglected?
> It’s always possible to say that solving the single/single alignment problem will prevent anything like that from happening in the first place, but why put all your hopes on plan A when plan B is relatively neglected?
The OP writes “contributions to AI alignment are also generally unhelpful to existential safety.” I don’t think I’m taking a strong stand in favor of putting all our hopes on plan A; I’m trying to understand the perspective on which plan B is much more important even before considering neglectedness.
> It seems premature to say, in advance of actually seeing what such research uncovers, whether the relevant mechanisms and governance improvements are exactly the same as the improvements we need for good governance generally, or different.
I agree that would be premature. That said, I still found it notable that the OP saw such a large gap between the importance of CSC and other areas on and off the list (including MARL). Given that I would have these things in a different order (before having thought deeply), it seemed to illustrate a striking difference in perspective. I’m not really trying to take a strong stand, just using this example to explore that difference in perspective.