
Impact Regularization

Last edit: 9 Dec 2022 9:04 UTC by Vika

Impact regularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try to accomplish its goals with as little impact on the world as possible. For example: you reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.

How do you rigorously define “low impact” in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don’t want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don’t want highly capable AI systems to permanently wrench control of the future from us.
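As a rough illustration (not any particular published method), an impact-regularized objective can be thought of as the task reward minus a scaled penalty for how much an action changes the world relative to some baseline. The function and parameter names below are hypothetical, a minimal sketch rather than a definitive implementation:

```python
def regularized_reward(task_reward: float, impact_penalty: float, lam: float = 1.0) -> float:
    """Hypothetical sketch of an impact-regularized objective.

    task_reward: the ordinary task reward R(s, a).
    impact_penalty: a nonnegative measure of how much the action changes the
        world relative to a baseline (e.g. doing nothing).
    lam: regularization strength trading off task performance against impact.
    """
    return task_reward - lam * impact_penalty
```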

Currently, impact regularization research focuses on two approaches: penalizing the agent for reducing the reachability of states relative to a baseline (relative reachability), and penalizing changes in the agent's ability to achieve a wide range of auxiliary goals (attainable utility preservation, AUP); a sketch of the latter follows.
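For concreteness, here is a minimal sketch of an AUP-style penalty: the average absolute change, across a set of auxiliary reward functions, in the agent's attainable value (Q-value) when taking a proposed action versus doing nothing. The variable names and the plain averaging are illustrative assumptions; published versions normalize and scale this term differently:

```python
def aup_penalty(q_action: list[float], q_noop: list[float]) -> float:
    """Sketch of an attainable-utility-preservation penalty.

    q_action[i]: Q-value of the proposed action under auxiliary reward i.
    q_noop[i]:   Q-value of doing nothing under the same auxiliary reward.
    Returns the mean absolute change in attainable utility.
    """
    assert len(q_action) == len(q_noop) and q_action
    return sum(abs(a - n) for a, n in zip(q_action, q_noop)) / len(q_action)
```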

For a review of earlier work, see A Survey of Early Impact Measures.

Sequences on impact regularization: Reframing Impact.

Related tags: Instrumental Convergence, Corrigibility, Mild Optimization.

Reframing Impact
TurnTrout, 20 Sep 2019 19:03 UTC
98 points
15 comments, 1 min read, LW link, 1 review

Attainable Utility Preservation: Concepts
TurnTrout, 17 Feb 2020 5:20 UTC
38 points
20 comments, 1 min read, LW link

Towards a New Impact Measure
TurnTrout, 18 Sep 2018 17:21 UTC
103 points
159 comments, 33 min read, LW link, 2 reviews

Impact measurement and value-neutrality verification
evhub, 15 Oct 2019 0:06 UTC
31 points
13 comments, 6 min read, LW link

Tradeoff between desirable properties for baseline choices in impact measures
Vika, 4 Jul 2020 11:56 UTC
37 points
24 comments, 5 min read, LW link

[Question] Best reasons for pessimism about impact of impact measures?
TurnTrout, 10 Apr 2019 17:22 UTC
60 points
55 comments, 3 min read, LW link

Attainable Utility Preservation: Empirical Results
22 Feb 2020 0:38 UTC
66 points
8 comments, 10 min read, LW link, 1 review

How Low Should Fruit Hang Before We Pick It?
TurnTrout, 25 Feb 2020 2:08 UTC
28 points
9 comments, 12 min read, LW link

Attainable Utility Preservation: Scaling to Superhuman
TurnTrout, 27 Feb 2020 0:52 UTC
28 points
22 comments, 8 min read, LW link

Reasons for Excitement about Impact of Impact Measure Research
TurnTrout, 27 Feb 2020 21:42 UTC
33 points
8 comments, 4 min read, LW link

Conclusion to ‘Reframing Impact’
TurnTrout, 28 Feb 2020 16:05 UTC
40 points
18 comments, 2 min read, LW link

Designing agent incentives to avoid side effects
11 Mar 2019 20:55 UTC
29 points
0 comments, 2 min read, LW link
(medium.com)

Worrying about the Vase: Whitelisting
TurnTrout, 16 Jun 2018 2:17 UTC
73 points
26 comments, 11 min read, LW link

AXRP Episode 11 - Attainable Utility and Power with Alex Turner
DanielFilan, 25 Sep 2021 21:10 UTC
19 points
5 comments, 53 min read, LW link

Value Impact
TurnTrout, 23 Sep 2019 0:47 UTC
70 points
10 comments, 1 min read, LW link

Attainable Utility Theory: Why Things Matter
TurnTrout, 27 Sep 2019 16:48 UTC
72 points
24 comments, 1 min read, LW link

World State is the Wrong Abstraction for Impact
TurnTrout, 1 Oct 2019 21:03 UTC
67 points
19 comments, 2 min read, LW link

The Gears of Impact
TurnTrout, 7 Oct 2019 14:44 UTC
54 points
16 comments, 1 min read, LW link

Attainable Utility Landscape: How The World Is Changed
TurnTrout, 10 Feb 2020 0:58 UTC
52 points
7 comments, 6 min read, LW link

The Catastrophic Convergence Conjecture
TurnTrout, 14 Feb 2020 21:16 UTC
45 points
16 comments, 8 min read, LW link

Deducing Impact
TurnTrout, 24 Sep 2019 21:14 UTC
72 points
28 comments, 1 min read, LW link

Learning preferences by looking at the world
Rohin Shah, 12 Feb 2019 22:25 UTC
43 points
10 comments, 7 min read, LW link
(bair.berkeley.edu)

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)
RogerDearnaley, 25 May 2023 9:26 UTC
33 points
3 comments, 15 min read, LW link

Test Cases for Impact Regularisation Methods
DanielFilan, 6 Feb 2019 21:50 UTC
72 points
5 comments, 13 min read, LW link
(danielfilan.com)

Dynamic inconsistency of the inaction and initial state baseline
Stuart_Armstrong, 7 Jul 2020 12:02 UTC
30 points
8 comments, 2 min read, LW link

Subagents and impact measures, full and fully illustrated
Stuart_Armstrong, 24 Feb 2020 13:12 UTC
31 points
14 comments, 17 min read, LW link

Overcoming Clinginess in Impact Measures
TurnTrout, 30 Jun 2018 22:51 UTC
30 points
9 comments, 7 min read, LW link

Appendix: how a subagent could get powerful
Stuart_Armstrong, 28 Jan 2020 15:28 UTC
53 points
14 comments, 4 min read, LW link

Appendix: mathematics of indexical impact measures
Stuart_Armstrong, 17 Feb 2020 13:22 UTC
12 points
0 comments, 4 min read, LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?
tailcalled, 1 Dec 2023 22:01 UTC
24 points
6 comments, 1 min read, LW link

Understanding Recent Impact Measures
Matthew Barnett, 7 Aug 2019 4:57 UTC
16 points
6 comments, 7 min read, LW link

A Survey of Early Impact Measures
Matthew Barnett, 6 Aug 2019 1:22 UTC
29 points
0 comments, 8 min read, LW link

Four Ways An Impact Measure Could Help Alignment
Matthew Barnett, 8 Aug 2019 0:10 UTC
21 points
1 comment, 9 min read, LW link

Impact Measure Desiderata
TurnTrout, 2 Sep 2018 22:21 UTC
36 points
41 comments, 5 min read, LW link

Why is the impact penalty time-inconsistent?
Stuart_Armstrong, 9 Jul 2020 17:26 UTC
16 points
1 comment, 2 min read, LW link

AI Alignment 2018-19 Review
Rohin Shah, 28 Jan 2020 2:19 UTC
126 points
6 comments, 35 min read, LW link

Penalizing Impact via Attainable Utility Preservation
TurnTrout, 28 Dec 2018 21:46 UTC
20 points
0 comments, 3 min read, LW link
(arxiv.org)

Reversible changes: consider a bucket of water
Stuart_Armstrong, 26 Aug 2019 22:55 UTC
25 points
18 comments, 2 min read, LW link

Hedonic Loops and Taming RL
beren, 19 Jul 2023 15:12 UTC
20 points
14 comments, 9 min read, LW link

Avoiding Side Effects in Complex Environments
12 Dec 2020 0:34 UTC
62 points
12 comments, 2 min read, LW link
(avoiding-side-effects.github.io)

A Critique of Non-Obstruction
Joe Collman, 3 Feb 2021 8:45 UTC
13 points
9 comments, 4 min read, LW link

AXRP Episode 7 - Side Effects with Victoria Krakovna
DanielFilan, 14 May 2021 3:50 UTC
34 points
6 comments, 43 min read, LW link

Alex Turner’s Research, Comprehensive Information Gathering
adamShimi, 23 Jun 2021 9:44 UTC
15 points
3 comments, 3 min read, LW link

Simplified preferences needed; simplified preferences sufficient
Stuart_Armstrong, 5 Mar 2019 19:39 UTC
33 points
6 comments, 3 min read, LW link

[Question] “Do Nothing” utility function, 3½ years later?
niplav, 20 Jul 2020 11:09 UTC
5 points
3 comments, 1 min read, LW link

Open Problems in Negative Side Effect Minimization
6 May 2022 9:37 UTC
12 points
6 comments, 17 min read, LW link

Announcement: AI alignment prize round 4 winners
cousin_it, 20 Jan 2019 14:46 UTC
74 points
41 comments, 1 min read, LW link

Asymptotically Unambitious AGI
michaelcohen, 10 Apr 2020 12:31 UTC
50 points
217 comments, 2 min read, LW link

[AN #68]: The attainable utility theory of impact
Rohin Shah, 14 Oct 2019 17:00 UTC
17 points
0 comments, 8 min read, LW link
(mailchi.mp)

Yudkowsky on AGI ethics
Rob Bensinger, 19 Oct 2017 23:13 UTC
69 points
5 comments, 2 min read, LW link

RoboNet—A new internet protocol for AI
antoniomax, 30 May 2023 17:55 UTC
−13 points
1 comment, 18 min read, LW link