Impact Regularization

Last edit: Dec 30, 2024, 9:57 AM by Dakara

Impact regularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try to accomplish its goals with as little impact on the world as possible. For example, suppose you reward an AI for crossing a room: to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.

How do you rigorously define “low impact” in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don’t want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don’t want highly capable AI systems to permanently wrench control of the future from us.
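
Most of the proposals below share a common shape: the agent optimizes its task reward minus a scaled penalty for how much an action changes some measure of its influence, relative to a baseline such as doing nothing. Here is a minimal sketch of that shape, assuming hypothetical task_reward and attainable_value functions supplied by the designer; it is illustrative, not the exact formulation from any post below.

```python
# Minimal sketch of an impact-regularized reward, in the spirit of
# attainable-utility-style penalties. All names are illustrative
# placeholders, not any specific paper's definitions.

def penalized_reward(state, action, task_reward, attainable_value,
                     aux_utilities, noop_action, lam=0.1):
    """Task reward minus a penalty for shifting the agent's ability
    to achieve auxiliary goals, relative to doing nothing."""
    # Average absolute change, over the auxiliary utility functions, in the
    # value attainable after `action` versus after the no-op baseline.
    penalty = sum(
        abs(attainable_value(state, action, u) -
            attainable_value(state, noop_action, u))
        for u in aux_utilities
    ) / max(len(aux_utilities), 1)
    # The coefficient `lam` trades off task performance against low impact.
    return task_reward(state, action) - lam * penalty
```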

Currently, impact regularization research focuses on two approaches: relative reachability (penalizing reductions in the agent's ability to reach other states) and attainable utility preservation (penalizing changes in the agent's ability to achieve auxiliary goals).

For a review of earlier work, see A Survey of Early Impact Measures.

Sequences on impact regularization: Reframing Impact (TurnTrout).

Related tags: Instrumental Convergence, Corrigibility, Mild Optimization.

Reframing Impact

TurnTrout · Sep 20, 2019, 7:03 PM
98 points
15 comments · 1 min read · LW link · 1 review

Attainable Utility Preservation: Concepts

TurnTrout · Feb 17, 2020, 5:20 AM
38 points
20 comments · 1 min read · LW link

Tradeoff between desirable properties for baseline choices in impact measures

Vika · Jul 4, 2020, 11:56 AM
37 points
24 comments · 5 min read · LW link

Impact measurement and value-neutrality verification

evhub · Oct 15, 2019, 12:06 AM
31 points
13 comments · 6 min read · LW link

Towards a New Impact Measure

TurnTrout · Sep 18, 2018, 5:21 PM
103 points
159 comments · 33 min read · LW link · 2 reviews

[Question] Best reasons for pessimism about impact of impact measures?

TurnTrout · Apr 10, 2019, 5:22 PM
60 points
55 comments · 3 min read · LW link

Attainable Utility Preservation: Scaling to Superhuman

TurnTrout · Feb 27, 2020, 12:52 AM
28 points
22 comments · 8 min read · LW link

Designing agent incentives to avoid side effects

Mar 11, 2019, 8:55 PM
29 points
0 comments · 2 min read · LW link
(medium.com)

Worrying about the Vase: Whitelisting

TurnTrout · Jun 16, 2018, 2:17 AM
73 points
26 comments · 11 min read · LW link

Value Impact

TurnTrout · Sep 23, 2019, 12:47 AM
70 points
10 comments · 1 min read · LW link

Deducing Impact

TurnTrout · Sep 24, 2019, 9:14 PM
72 points
28 comments · 1 min read · LW link

Attainable Utility Theory: Why Things Matter

TurnTrout · Sep 27, 2019, 4:48 PM
72 points
24 comments · 1 min read · LW link

World State is the Wrong Abstraction for Impact

TurnTrout · Oct 1, 2019, 9:03 PM
67 points
19 comments · 2 min read · LW link

The Gears of Impact

TurnTrout · Oct 7, 2019, 2:44 PM
54 points
16 comments · 1 min read · LW link

Attainable Utility Landscape: How The World Is Changed

TurnTrout · Feb 10, 2020, 12:58 AM
52 points
7 comments · 6 min read · LW link

The Catastrophic Convergence Conjecture

TurnTrout · Feb 14, 2020, 9:16 PM
45 points
16 comments · 8 min read · LW link

Attainable Utility Preservation: Empirical Results

Feb 22, 2020, 12:38 AM
66 points
8 comments · 10 min read · LW link · 1 review

How Low Should Fruit Hang Before We Pick It?

TurnTrout · Feb 25, 2020, 2:08 AM
28 points
9 comments · 12 min read · LW link

Reasons for Excitement about Impact of Impact Measure Research

TurnTrout · Feb 27, 2020, 9:42 PM
33 points
8 comments · 4 min read · LW link

Conclusion to ‘Reframing Impact’

TurnTrout · Feb 28, 2020, 4:05 PM
40 points
18 comments · 2 min read · LW link

AXRP Episode 11 - Attainable Utility and Power with Alex Turner

DanielFilan · Sep 25, 2021, 9:10 PM
19 points
5 comments · 53 min read · LW link

Learning preferences by looking at the world

Rohin Shah · Feb 12, 2019, 10:25 PM
43 points
10 comments · 7 min read · LW link
(bair.berkeley.edu)

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)

RogerDearnaley · May 25, 2023, 9:26 AM
33 points
3 comments · 15 min read · LW link

Appendix: mathematics of indexical impact measures

Stuart_Armstrong · Feb 17, 2020, 1:22 PM
12 points
0 comments · 4 min read · LW link

Dynamic inconsistency of the inaction and initial state baseline

Stuart_Armstrong · Jul 7, 2020, 12:02 PM
30 points
8 comments · 2 min read · LW link

Test Cases for Impact Regularisation Methods

DanielFilan · Feb 6, 2019, 9:50 PM
72 points
5 comments · 13 min read · LW link
(danielfilan.com)

Understanding Recent Impact Measures

Matthew Barnett · Aug 7, 2019, 4:57 AM
16 points
6 comments · 7 min read · LW link

A Survey of Early Impact Measures

Matthew Barnett · Aug 6, 2019, 1:22 AM
29 points
0 comments · 8 min read · LW link

Four Ways An Impact Measure Could Help Alignment

Matthew Barnett · Aug 8, 2019, 12:10 AM
21 points
1 comment · 9 min read · LW link

AXRP Episode 7 - Side Effects with Victoria Krakovna

DanielFilan · May 14, 2021, 3:50 AM
34 points
6 comments · 43 min read · LW link

Impact Measure Desiderata

TurnTrout · Sep 2, 2018, 10:21 PM
36 points
41 comments · 5 min read · LW link

Why is the impact penalty time-inconsistent?

Stuart_Armstrong · Jul 9, 2020, 5:26 PM
16 points
1 comment · 2 min read · LW link

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi · Jun 23, 2021, 9:44 AM
15 points
3 comments · 3 min read · LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?

tailcalled · Dec 1, 2023, 10:01 PM
24 points
6 comments · 1 min read · LW link

Hedonic Loops and Taming RL

beren · Jul 19, 2023, 3:12 PM
20 points
14 comments · 9 min read · LW link

Avoiding Side Effects in Complex Environments

Dec 12, 2020, 12:34 AM
62 points
12 comments · 2 min read · LW link
(avoiding-side-effects.github.io)

Penalizing Impact via Attainable Utility Preservation

TurnTrout · Dec 28, 2018, 9:46 PM
20 points
0 comments · 3 min read · LW link
(arxiv.org)

AI Alignment 2018-19 Review

Rohin Shah · Jan 28, 2020, 2:19 AM
126 points
6 comments · 35 min read · LW link

A Critique of Non-Obstruction

Joe Collman · Feb 3, 2021, 8:45 AM
13 points
9 comments · 4 min read · LW link

Reversible changes: consider a bucket of water

Stuart_Armstrong · Aug 26, 2019, 10:55 PM
25 points
18 comments · 2 min read · LW link

Subagents and impact measures, full and fully illustrated

Stuart_Armstrong · Feb 24, 2020, 1:12 PM
31 points
14 comments · 17 min read · LW link

Overcoming Clinginess in Impact Measures

TurnTrout · Jun 30, 2018, 10:51 PM
30 points
9 comments · 7 min read · LW link

Appendix: how a subagent could get powerful

Stuart_Armstrong · Jan 28, 2020, 3:28 PM
53 points
14 comments · 4 min read · LW link

[Question] “Do Nothing” utility function, 3½ years later?

niplav · Jul 20, 2020, 11:09 AM
5 points
3 comments · 1 min read · LW link

Announcement: AI alignment prize round 4 winners

cousin_it · Jan 20, 2019, 2:46 PM
74 points
41 comments · 1 min read · LW link

Simplified preferences needed; simplified preferences sufficient

Stuart_Armstrong · Mar 5, 2019, 7:39 PM
33 points
6 comments · 3 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM
38 points
6 comments · 10 min read · LW link

Open Problems in Negative Side Effect Minimization

May 6, 2022, 9:37 AM
12 points
6 comments · 17 min read · LW link

Asymptotically Unambitious AGI

michaelcohen · Apr 10, 2020, 12:31 PM
50 points
217 comments · 2 min read · LW link

[AN #68]: The attainable utility theory of impact

Rohin Shah · Oct 14, 2019, 5:00 PM
17 points
0 comments · 8 min read · LW link
(mailchi.mp)

Yudkowsky on AGI ethics

Rob Bensinger · Oct 19, 2017, 11:13 PM
69 points
5 comments · 2 min read · LW link