
Alignment Tax


An alignment tax (sometimes called a safety tax) is the extra cost of ensuring that an AI system is aligned, relative to the cost of building an unaligned alternative. The term 'tax' can be misleading: in the safety literature, 'alignment tax', 'safety tax', or 'alignment cost' refers to increased developer time, extra compute, or decreased performance, not only to the financial cost of building an aligned system.

To get a better sense of what the alignment tax is, consider the two extreme cases. The best case is No Tax: aligning the system costs no performance, so there is no reason to deploy an AI that is not aligned; we might as well align it. The worst case is Max Tax: aligning the system costs all of its performance, so alignment is functionally impossible; one either deploys an unaligned system or gets no benefit from AI systems at all. We expect reality to fall somewhere between these two extremes.
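One simple way to formalize this spectrum (an illustrative sketch, not from the original sources): let $P_{\text{aligned}}$ and $P_{\text{unaligned}}$ denote the performance of the best aligned and unaligned systems that could be built with the same resources. The tax is then the fraction of performance forgone by aligning:

$$\text{tax} = 1 - \frac{P_{\text{aligned}}}{P_{\text{unaligned}}}$$

No Tax corresponds to $\text{tax} = 0$ (aligned and unaligned systems perform equally well), and Max Tax corresponds to $\text{tax} = 1$ (the aligned system retains no useful capability). The same form applies if the tax is measured in extra compute or developer time rather than lost performance.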

Paul Christiano distinguishes two main approaches for dealing with the alignment tax: reducing the tax itself (e.g., through technical alignment research that closes the performance gap) and getting the tax paid (e.g., through coordination among AI developers).[1][2]

Further reading

The case for a negative alignment tax

18 Sep 2024 18:33 UTC
79 points
20 comments · 7 min read · LW link

Safety tax functions

owencb · 20 Oct 2024 14:08 UTC
30 points
0 comments · 6 min read · LW link
(strangecities.substack.com)

AI safety tax dynamics

owencb · 23 Oct 2024 12:18 UTC
22 points
0 comments · 6 min read · LW link
(strangecities.substack.com)

[Linkpost] Jan Leike on three kinds of alignment taxes

Akash · 6 Jan 2023 23:57 UTC
27 points
2 comments · 3 min read · LW link
(aligned.substack.com)

Against ubiquitous alignment taxes

beren · 6 Mar 2023 19:50 UTC
56 points
10 comments · 2 min read · LW link

The case for removing alignment and ML research from the training dataset

beren · 30 May 2023 20:54 UTC
48 points
8 comments · 5 min read · LW link

How difficult is AI Alignment?

Sammy Martin · 13 Sep 2024 15:47 UTC
43 points
6 comments · 23 min read · LW link

Ten Levels of AI Alignment Difficulty

Sammy Martin · 3 Jul 2023 20:20 UTC
121 points
14 comments · 12 min read · LW link

Labor Participation is a High-Priority AI Alignment Risk

alex · 17 Jun 2024 18:09 UTC
4 points
0 comments · 17 min read · LW link

Security Mindset and the Logistic Success Curve

Eliezer Yudkowsky · 26 Nov 2017 15:58 UTC
104 points
49 comments · 20 min read · LW link

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones · 29 Dec 2022 11:30 UTC
5 points
1 comment · 4 min read · LW link
(shape-of-code.com)

On the Importance of Open Sourcing Reward Models

elandgre · 2 Jan 2023 19:01 UTC
18 points
5 comments · 6 min read · LW link