Safetywashing
In Southern California there’s a two-acre butterfly preserve owned by the oil company Chevron. Chevron spends little to maintain it, but many millions on television advertisements featuring it as evidence of their environmental stewardship.[1]
Environmentalists have a word for behavior like this: greenwashing. Greenwashing is when companies misleadingly portray themselves, or their products, as more environmentally friendly than they are.
Greenwashing often does cause real environmental benefit. Take the signs in hotels discouraging you from washing your towels.
My guess is that the net environmental effect of these signs is in fact mildly positive. And while the most central examples of greenwashing involve deception, I’m sure some of these signs are put up by people who earnestly care. But I suspect hotels might tend to care less about water waste if utilities were less expensive, and that Chevron might care less about El Segundo Blue butterflies if environmental regulations were less expensive.
The field of AI alignment is growing rapidly. Each year it attracts more resources, more mindshare, more people trying to help. The more it grows, the more people will be incentivized to misleadingly portray themselves or their projects as more alignment-friendly than they are.
I think some of this is happening already. For example, a capabilities company launched recently with the aim of training transformers to use every API in the world, which they described as the “safest path to general intelligence.” As I understand it, their argument is that this helps with alignment because it involves collecting feedback about people’s preferences, and because humans often wish AI systems could more easily take actions in the physical world, which is easier once you know how to use all the APIs.[2]
It’s easier to avoid things that are easier to notice, and easier to notice things with good handles. So I propose adopting the handle “safetywashing.”
[1] From what I can tell, the original source for this claim is the book “The Corporate Planet: Ecology and Politics in the Age of Globalization,” which from the samples I read seems about as pro-Chevron as you’d expect given the title. So I wouldn’t be stunned if the claim were misleading, though the numbers passed my sanity check, and I did confirm that the preserve and advertisements exist.
[2] I haven’t talked with anyone who works at this company, and all I know about their plans comes from the copy on their website. My guess is that their project harms, rather than helps, our ability to ensure AGI remains safe, but I might be missing something.
I’ve used the term “safetywashing” at least once every week or two over the last year. I don’t know whether I picked it up from this post, but it still seems good to have an explanation people can be pointed to for a term this useful and this common.
I just want to point out that “safety-washing” is a term I heard a lot when I was working on AI ethics in 2018. It seemed like a pretty well-known term at the time, at least among the people I talked to in that community. I’m not sure how widespread it is in other disciplines.
Interesting. I checked LW and Google for the keyword before writing and didn’t see much, but maybe I missed it; it does seem like a fairly natural riff, e.g. someone wrote a similar post on the EA Forum a few months later.
I think most of our conversations about it were on Twitter and maybe Slack so maybe that makes a difference?
Safetywashing describes a phenomenon that is real, inevitable, and profoundly unsurprising (I am still surprised whenever I see it, but that’s my fault for knowing something is probable and being surprised anyway). Things like this are fundamental to human systems; people who read the Sequences know this.
This post doesn’t prepare people, at all, for the complexity of how this plays out in reality. It’s possible that most posts would fail to prepare people, because posts like this move the goalposts: in the mundane process of following their incentives, both adversaries and wishful thinkers (and everyone in between) automatically adapt to whatever cultural expectations get set. However, it is a critical first step, and vastly superior to nothing at all.
Anticipating the ways Goodhart’s law will play out in reality isn’t a nerdy hobby; it isn’t even a way of life; it’s simply what being an adult (an agent) in the real world requires.