Impact Regularization

Last edit: Dec 30, 2024, 9:57 AM by Dakara

Impact regularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try to accomplish its goals with as little impact on the world as possible. For example, suppose you reward an AI for crossing a room: to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.

How do you rigorously define “low impact” in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don’t want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don’t want highly capable AI systems to permanently wrench control of the future from us.
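
Most of the proposals below share a common shape: the agent optimizes its task reward minus a scaled penalty for how much an action changes some measure of its influence, relative to a baseline such as doing nothing. Here is a minimal sketch of that shape, assuming hypothetical task_reward and attainable_value functions supplied by the designer; it is illustrative, not the exact formulation from any post below.

```python
# Minimal sketch of an impact-regularized reward, in the spirit of
# attainable-utility-style penalties. All names are illustrative
# placeholders, not any specific paper's definitions.

def penalized_reward(state, action, task_reward, attainable_value,
                     aux_utilities, noop_action, lam=0.1):
    """Task reward minus a penalty for shifting the agent's ability
    to achieve auxiliary goals, relative to doing nothing."""
    # Average absolute change, over the auxiliary utility functions, in the
    # value attainable after `action` versus after the no-op baseline.
    penalty = sum(
        abs(attainable_value(state, action, u) -
            attainable_value(state, noop_action, u))
        for u in aux_utilities
    ) / max(len(aux_utilities), 1)
    # The coefficient `lam` trades off task performance against low impact.
    return task_reward(state, action) - lam * penalty
```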

Currently, impact regularization research focuses on two approaches: relative reachability (penalizing reductions in the agent's ability to reach other states) and attainable utility preservation (penalizing changes in the agent's ability to achieve auxiliary goals).

For a review of earlier work, see A Survey of Early Impact Measures.

Sequences on impact regularization: Reframing Impact (TurnTrout).

Related tags: Instrumental Convergence, Corrigibility, Mild Optimization.

Reframing Impact

TurnTrout · Sep 20, 2019, 7:03 PM
98 points
15 comments · 1 min read · LW link · 1 review

Attainable Utility Preservation: Concepts

TurnTrout · Feb 17, 2020, 5:20 AM
38 points
20 comments · 1 min read · LW link

Tradeoff between desirable properties for baseline choices in impact measures

Vika · Jul 4, 2020, 11:56 AM
37 points
24 comments · 5 min read · LW link

Impact measurement and value-neutrality verification

evhub · Oct 15, 2019, 12:06 AM
31 points
13 comments · 6 min read · LW link

Towards a New Impact Measure

TurnTrout · Sep 18, 2018, 5:21 PM
103 points
159 comments · 33 min read · LW link · 2 reviews

[Question] Best reasons for pessimism about impact of impact measures?

TurnTrout · Apr 10, 2019, 5:22 PM
60 points
55 comments · 3 min read · LW link

Attainable Utility Preservation: Scaling to Superhuman

TurnTrout · Feb 27, 2020, 12:52 AM
28 points
22 comments · 8 min read · LW link

Designing agent incentives to avoid side effects

Mar 11, 2019, 8:55 PM
29 points
0 comments · 2 min read · LW link
(medium.com)

Worrying about the Vase: Whitelisting

TurnTrout · Jun 16, 2018, 2:17 AM
73 points
26 comments · 11 min read · LW link

Value Impact

TurnTrout · Sep 23, 2019, 12:47 AM
70 points
10 comments · 1 min read · LW link

Deducing Impact

TurnTrout · Sep 24, 2019, 9:14 PM
72 points
28 comments · 1 min read · LW link

Attainable Utility Theory: Why Things Matter

TurnTrout · Sep 27, 2019, 4:48 PM
72 points
24 comments · 1 min read · LW link

World State is the Wrong Abstraction for Impact

TurnTrout · Oct 1, 2019, 9:03 PM
67 points
19 comments · 2 min read · LW link

The Gears of Impact

TurnTrout · Oct 7, 2019, 2:44 PM
54 points
16 comments · 1 min read · LW link

Attainable Utility Landscape: How The World Is Changed

TurnTrout · Feb 10, 2020, 12:58 AM
52 points
7 comments · 6 min read · LW link

The Catastrophic Convergence Conjecture

TurnTrout · Feb 14, 2020, 9:16 PM
45 points
16 comments · 8 min read · LW link

Attainable Utility Preservation: Empirical Results

Feb 22, 2020, 12:38 AM
66 points
8 comments · 10 min read · LW link · 1 review

How Low Should Fruit Hang Before We Pick It?

TurnTrout · Feb 25, 2020, 2:08 AM
28 points
9 comments · 12 min read · LW link

Reasons for Excitement about Impact of Impact Measure Research

TurnTrout · Feb 27, 2020, 9:42 PM
33 points
8 comments · 4 min read · LW link

Conclusion to ‘Reframing Impact’

TurnTrout · Feb 28, 2020, 4:05 PM
40 points
18 comments · 2 min read · LW link

AXRP Episode 11 - Attainable Utility and Power with Alex Turner

DanielFilan · Sep 25, 2021, 9:10 PM
19 points
5 comments · 53 min read · LW link

Learning preferences by looking at the world

Rohin Shah · Feb 12, 2019, 10:25 PM
43 points
10 comments · 7 min read · LW link
(bair.berkeley.edu)

Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom)

RogerDearnaley · May 25, 2023, 9:26 AM
33 points
3 comments · 15 min read · LW link

Appendix: mathematics of indexical impact measures

Stuart_Armstrong · Feb 17, 2020, 1:22 PM
12 points
0 comments · 4 min read · LW link

Dynamic inconsistency of the inaction and initial state baseline

Stuart_Armstrong · Jul 7, 2020, 12:02 PM
30 points
8 comments · 2 min read · LW link

Test Cases for Impact Regularisation Methods

DanielFilan · Feb 6, 2019, 9:50 PM
72 points
5 comments · 13 min read · LW link
(danielfilan.com)

Understanding Recent Impact Measures

Matthew Barnett · Aug 7, 2019, 4:57 AM
16 points
6 comments · 7 min read · LW link

A Survey of Early Impact Measures

Matthew Barnett · Aug 6, 2019, 1:22 AM
29 points
0 comments · 8 min read · LW link

Four Ways An Impact Measure Could Help Alignment

Matthew Barnett · Aug 8, 2019, 12:10 AM
21 points
1 comment · 9 min read · LW link

AXRP Episode 7 - Side Effects with Victoria Krakovna

DanielFilan · May 14, 2021, 3:50 AM
34 points
6 comments · 43 min read · LW link

Impact Measure Desiderata

TurnTrout · Sep 2, 2018, 10:21 PM
36 points
41 comments · 5 min read · LW link

Why is the impact penalty time-inconsistent?

Stuart_Armstrong · Jul 9, 2020, 5:26 PM
16 points
1 comment · 2 min read · LW link

Alex Turner’s Research, Comprehensive Information Gathering

adamShimi · Jun 23, 2021, 9:44 AM
15 points
3 comments · 3 min read · LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?

tailcalled · Dec 1, 2023, 10:01 PM
24 points
6 comments · 1 min read · LW link

Hedonic Loops and Taming RL

beren · Jul 19, 2023, 3:12 PM
20 points
14 comments · 9 min read · LW link

Avoiding Side Effects in Complex Environments

Dec 12, 2020, 12:34 AM
62 points
12 comments · 2 min read · LW link
(avoiding-side-effects.github.io)

Penalizing Impact via Attainable Utility Preservation

TurnTrout · Dec 28, 2018, 9:46 PM
20 points
0 comments · 3 min read · LW link
(arxiv.org)

AI Alignment 2018-19 Review

Rohin Shah · Jan 28, 2020, 2:19 AM
126 points
6 comments · 35 min read · LW link

A Critique of Non-Obstruction

Joe Collman · Feb 3, 2021, 8:45 AM
13 points
9 comments · 4 min read · LW link

Reversible changes: consider a bucket of water

Stuart_Armstrong · Aug 26, 2019, 10:55 PM
25 points
18 comments · 2 min read · LW link

Subagents and impact measures, full and fully illustrated

Stuart_Armstrong · Feb 24, 2020, 1:12 PM
31 points
14 comments · 17 min read · LW link

Overcoming Clinginess in Impact Measures

TurnTrout · Jun 30, 2018, 10:51 PM
30 points
9 comments · 7 min read · LW link

Appendix: how a subagent could get powerful

Stuart_Armstrong · Jan 28, 2020, 3:28 PM
53 points
14 comments · 4 min read · LW link

[Question] “Do Nothing” utility function, 3½ years later?

niplav · Jul 20, 2020, 11:09 AM
5 points
3 comments · 1 min read · LW link

Announcement: AI alignment prize round 4 winners

cousin_it · Jan 20, 2019, 2:46 PM
74 points
41 comments · 1 min read · LW link

Simplified preferences needed; simplified preferences sufficient

Stuart_Armstrong · Mar 5, 2019, 7:39 PM
33 points
6 comments · 3 min read · LW link

Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

Roland Pihlakas · Jan 12, 2025, 3:37 AM
38 points
6 comments · 10 min read · LW link

Open Problems in Negative Side Effect Minimization

May 6, 2022, 9:37 AM
12 points
6 comments · 17 min read · LW link

Asymptotically Unambitious AGI

michaelcohen · Apr 10, 2020, 12:31 PM
50 points
217 comments · 2 min read · LW link

[AN #68]: The attainable utility theory of impact

Rohin Shah · Oct 14, 2019, 5:00 PM
17 points
0 comments · 8 min read · LW link
(mailchi.mp)

Yudkowsky on AGI ethics

Rob Bensinger · Oct 19, 2017, 11:13 PM
69 points
5 comments · 2 min read · LW link