Alignment Tax

TagLast edit: Dec 30, 2024, 8:40 PM by Dakara

Alignment Tax (sometimes called a safety tax) is the extra cost of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative. The term ‘tax’ can be misleading: in the safety literature, ‘alignment/safety tax’ or ‘alignment cost’ is meant to refer to increased developer time, extra compute, or decreased performance, and not only to the financial cost/tax required to build an aligned system.

In order to get a better idea of what the alignment tax is, consider some of the cases that lie at the edges. The best case scenario is No Tax: This means we lose no performance by aligning the system, so there is no reason to deploy an AI that is not aligned, i.e., we might as well align it. The worst case scenario is Max Tax: This means that we lose all performance by aligning the system, so alignment is functionally impossible. So you either deploy an unaligned system, or you don’t get any benefit from AI systems at all. We expect something in between these two scenarios to be the case.

Paul Christiano distinguishes two main approaches for dealing with the alignment tax.^[1]^[2]

The first is to have the will to pay the tax, i.e. the relevant actors (corporations, governments, etc.) would be willing to pay the extra costs to avoid deploying a system until it is aligned.
The second is to reduce the tax by differentially advancing existing alignable algorithms or by making existing algorithms more alignable. This means, for any potentially unaligned algorithm, ensuring the additional cost for an aligned version of the algorithm is low enough that the developers would be willing to pay it.

The case for a negative alignment tax

Cameron Berg, Judd Rosenblatt, Diogo de Lucena and AE Studio

Sep 18, 2024, 6:33 PM

77 points

20 comments7 min readLW link

Safety-capabilities tradeoff dials are inevitable in AGI

Steven ByrnesOct 7, 2021, 7:03 PM

59 points

4 comments3 min readLW link

How difficult is AI Alignment?

Sammy MartinSep 13, 2024, 3:47 PM

44 points

6 comments23 min readLW link

[Linkpost] Jan Leike on three kinds of alignment taxes

Orpheus16Jan 6, 2023, 11:57 PM

27 points

2 comments3 min readLW link

(aligned.substack.com)

AI safety tax dynamics

owencbOct 23, 2024, 12:18 PM

22 points

0 comments6 min readLW link

(strangecities.substack.com)

Alignment can be the ‘clean energy’ of AI

Cameron Berg, Judd Rosenblatt and AE Studio

Feb 22, 2025, 12:08 AM

68 points

8 comments8 min readLW link

Safety tax functions

owencbOct 20, 2024, 2:08 PM

31 points

0 comments6 min readLW link

(strangecities.substack.com)

The case for removing alignment and ML research from the training dataset

berenMay 30, 2023, 8:54 PM

48 points

8 comments5 min readLW link

Against ubiquitous alignment taxes

berenMar 6, 2023, 7:50 PM

57 points

10 comments2 min readLW link

Ten Levels of AI Alignment Difficulty

Sammy MartinJul 3, 2023, 8:20 PM

138 points

24 comments12 min readLW link 1 review

Labor Participation is a High-Priority AI Alignment Risk

alexJun 17, 2024, 6:09 PM

6 points

0 comments17 min readLW link

On the Importance of Open Sourcing Reward Models

elandgreJan 2, 2023, 7:01 PM

18 points

5 comments6 min readLW link

The commercial incentive to intentionally train AI to deceive us

Derek M. JonesDec 29, 2022, 11:30 AM

5 points

1 comment4 min readLW link

(shape-of-code.com)

Security Mindset and the Logistic Success Curve

Eliezer YudkowskyNov 26, 2017, 3:58 PM

106 points

49 comments20 min readLW link

No comments.

Align­ment Tax

Further reading

Alignment Tax