Yudkowsky on AGI ethics
A Cornell computer scientist recently wrote on social media:
[...] I think the general sense in AI is that we don’t know what will play out, but some of these possibilities are bad, and we need to start thinking about it. We are plagued by highly visible people ranging from Musk to Ng painting pictures ranging from imminent risk to highly premature needless fear, but that doesn’t depict the center of gravity, which has noticeably shifted to thinking about the potential bad outcomes and what we might do about it. (Turning close to home to provide an example of how mainstream this is becoming, at Cornell two AI professors, Joe Halpern and Bart Selman, ran a seminar and course last semester on societal and ethical challenges for AI, and only just a few weeks ago we had a labor economist speak in our CS colloquium series about policy ideas targeting possible future directions for CS and AI, to an extremely large and enthusiastic audience.)
To which Eliezer Yudkowsky replied:
My forecast of the net effects of “ethical” discussion is negative; I expect it to be a cheap, easy, attention-grabbing distraction from technical issues and technical thoughts that actually determine okay outcomes. [...]
The ethics of bridge-building is to not have your bridge fall down and kill people and there is a frame of mind in which this obviousness is obvious enough. How not to have the bridge fall down is hard.
This is possibly surprising coming from the person who came up with coherent extrapolated volition, co-wrote the Cambridge Handbook of Artificial Intelligence chapter “The Ethics of Artificial Intelligence,” etc. The relevant background comes from Eliezer’s writing on the minimality principle:
[W]hen we are building the first sufficiently advanced Artificial Intelligence, we are operating in an extremely dangerous context in which building a marginally more powerful AI is marginally more dangerous. The first AGI ever built should therefore execute the least dangerous plan for preventing immediately following AGIs from destroying the world six months later. Furthermore, the least dangerous plan is not the plan that seems to contain the fewest material actions that seem risky in a conventional sense, but rather the plan that requires the least dangerous cognition from the AGI executing it. Similarly, inside the AGI itself, if a class of thought seems dangerous but necessary to execute sometimes, we want to execute the least instances of that class of thought required to accomplish the overall task.
E.g., if we think it’s a dangerous kind of event for the AGI to ask “How can I achieve this end using strategies from across every possible domain?” then we might want a design where most routine operations only search for strategies within a particular domain, and events where the AI searches across all known domains are rarer and visible to the programmers. Processing a goal that can recruit subgoals across every domain would be a dangerous event, albeit a necessary one, and therefore we want to do less of it within the AI.
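As a purely illustrative sketch of the design pattern described here (not anything from the quote; every class, function, and parameter name below is hypothetical), one way to read “rarer and visible to the programmers” is as an explicit escalation boundary in the planner’s interface: within-domain search is the routine code path, while cross-domain search requires an explicit approval argument and emits an audit record before anything runs.

```python
# Toy sketch only; every name here is hypothetical and for illustration.
import logging
from typing import Callable, Dict, List, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("planner")

# A "strategy" is just a function that takes a goal and returns a plan
# (a string, in this toy model) or None if it cannot help with that goal.
Strategy = Callable[[str], Optional[str]]


class DomainRestrictedPlanner:
    """By default, searches for strategies only within a single named domain."""

    def __init__(self, domain_strategies: Dict[str, List[Strategy]]):
        self.domain_strategies = domain_strategies

    def plan_within_domain(self, domain: str, goal: str) -> Optional[str]:
        # Routine operation: only strategies registered under this one domain are tried.
        for strategy in self.domain_strategies.get(domain, []):
            plan = strategy(goal)
            if plan is not None:
                return plan
        return None

    def plan_across_all_domains(self, goal: str, operator_approval: str) -> Optional[str]:
        # The dangerous-but-sometimes-necessary operation: it requires an explicit
        # approval argument and emits a visible audit record before anything runs,
        # so cross-domain search stays a rare, easy-to-notice event.
        log.warning("CROSS-DOMAIN SEARCH: goal=%r approval=%r", goal, operator_approval)
        for domain in self.domain_strategies:
            plan = self.plan_within_domain(domain, goal)
            if plan is not None:
                return plan
        return None
```

The point of the sketch is only that the dangerous class of operation is structurally separated and logged, rather than being an implicit special case of the routine one.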
So the technical task of figuring out how to build a robust minimal AGI system that’s well-aligned with its operators’ intentions is very different from “AI ethics”; and the tendency to conflate the two has plausibly channeled a lot of thought and attention into much broader (or much narrower) issues, when it could more profitably have gone into thinking about the alignment problem.
One part of doing the absolute bare world-saving minimum with a general-purpose reasoning system is steering clear of any strategies that require the system to do significant moral reasoning (or to implement less-than-totally-airtight moral views held by its operators). Just execute the simplest and most straightforward concrete sequence of actions, using the least dangerous kinds and the smallest quantity of AGI cognition needed for success.
Another way of putting this view is that nearly all of the effort should be going into solving the technical problem, “How would you get an AI system to do some very modest concrete action requiring extremely high levels of intelligence, such as building two strawberries that are completely identical at the cellular level, without causing anything weird or disruptive to happen?”
Where obviously it’s important that the system not do anything severely unethical in the process of building its strawberries; but if your strawberry-building system requires its developers to have a full understanding of meta-ethics or value aggregation in order to be safe and effective, then you’ve made some kind of catastrophic design mistake and should start over with a different approach.
Also relevant: Should ethicists be inside or outside a profession?
Has the net effect of global poverty discussion been negative for the x-risk movement? It seems to me that this is very much not the case. I remember Lukeprog writing that EA was one of the few groups from which MIRI was able to draw supporters.
It seems like discussion of near-term ethical issues might expand academia’s Overton window to admit more discussion of technical issues.
Reading between the lines: is Eliezer’s view that the sort of next actions suggested by discussion of near-term issues will be negative for the long term?
The more I think about AI safety, the more I think that preventing an arms race is the most important thing. If you know there’s no arms race, you can take your time to make your AI as safe as you want. If you know there’s no arms race, you don’t need to implement a plan involving dangerous material actions in order to block some future AI from taking over. Furthermore, there’s a sense in which arms race incentives are well-aligned: if we get a positive singularity, that means material abundance for everyone; if there’s an AI disaster, it’s likely a disaster for everyone. So maybe all you’d have to do is convince all the relevant actors that this is true, then create common knowledge among all the relevant actors that all the relevant actors believe this. (Possible problem: relevant actors you aren’t aware of. E.g. North Korean hackers who have penetrated DeepMind. Is it possible to improve the state of secret-keeping technology?)
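To make the “incentives are well-aligned” step concrete, here is a minimal expected-value sketch (the payoff numbers and the function are purely illustrative assumptions, not anything from the comment above): because both the good outcome and the disaster are shared by everyone, each actor’s expected payoff depends only on the probability of disaster, so racing, which raises that probability, makes every actor worse off.

```python
# Purely illustrative; all numbers are made up.
def expected_payoff(p_disaster: float,
                    shared_good: float = 100.0,
                    shared_bad: float = -100.0) -> float:
    """Expected payoff to each actor, since both outcomes are shared by everyone."""
    return (1 - p_disaster) * shared_good + p_disaster * shared_bad

# Hypothetical disaster probabilities under the two regimes:
print(expected_payoff(p_disaster=0.4))  # everyone races            -> 20.0
print(expected_payoff(p_disaster=0.1))  # everyone takes their time -> 80.0
```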
Eliezer was talking about discussions about ethics of AGI, and it sounds like you misinterpreted him as talking about discussions about ethics of narrow AI.
Also, I’m skeptical that bringing up narrow AI ethical issues is helpful for shifting academia’s Overton window to include existential risk from AI as a serious threat, and I suspect it may be counterproductive. Associating existential risk with narrow AI ethics seems to lead to people using the latter to derail discussions of the former. People sometimes dismiss concerns about existential risk from AI and then suggest that something should be done about some narrow AI ethical issue, and I suspect that they think they are offering a reasonable olive branch to people concerned about existential risk, despite their suggestions being useless for the purposes of existential risk reduction. This sort of thing would happen less if existential risk and ethics of narrow AI were less closely associated with each other.
It’s very likely that the majority of ethical discussion in AI will become politicized and therefore develop a narrow Overton window, which won’t cover the actually important technical work that needs to be done.
The way I currently see this happening is that ethics discussions have largely come to center on two issues: 1) whether the AI system “works” at all, even in the mundane sense (could software bugs cause catastrophic outcomes?), and 2) whether it is being used to do things we consider good.
The first is largely a question of implementing fairly standard testing protocols and developing refinements to existing systems, which is more along the lines of preventing errors in narrow AI systems. The same question can be asked of any software system at all, regardless of whether it actually counts as “AI.” In AGI ethics you pretty much assume that lack of capability is not the issue.
The second question is much more likely to have political aspects; these could include things like “Is our ML system biased against this demographic?” or “Are governments using it to spy on people?” or “Are large corporations becoming incredibly wealthy because of AI, thereby creating more inequality?” I also see this question as applying to any technology whatsoever, not really to AI specifically; the same things could be asked about statistical models, cameras, or factories. Therefore, much of our current and near-future “AI ethics” discussion will take a similar tack to historical discussions about the ethics of the new technology of an era, like more powerful weapons, nuclear power, faster communications, the spread of new media forms, genetic engineering, and so on. I don’t see these discussions as even pertaining to AGI risk in the proper sense, which should be considered in its own class, but they are likely to be conflated with it. Insofar as people generally do not have concrete “data” in front of them detailing exactly how and why something can go wrong, these discussions will probably not have favorable results.
With nuclear weapons there was some actual “data” available, and that may have been enough to move the Overton window in the right direction; but with AGI there is practically no way of obtaining such data with a large enough time window for society to implement the correct response.
AI safety is already a fairly politicized topic. Unfortunately, the main dimension along which I see it politicized is whether it is a useful line of research in the first place. (I think it’s possible that the way AI safety has historically been advocated for has something to do with this.) Some have argued that “AI ethics” will help with this issue.