
Power Seeking (AI)


Power-seeking is a property that agents might have, in which they attempt to gain a more general ability to control their environment. It is particularly relevant to AI systems, and is closely related to Instrumental Convergence.
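As a rough illustration of this idea (not part of the tag text itself), one common formalisation in the spirit of the "optimal policies tend to seek power" work linked below measures a state's "power" as the average optimal value an agent can obtain from it across many randomly drawn reward functions: states that keep more options open score higher, so reward maximisers tend to steer toward them. The sketch below is a minimal Python toy under assumptions of my own; the six-state environment and the GAMMA, HORIZON, and N_REWARDS parameters are illustrative choices, not taken from any of the posts listed here.

```python
import random

# A tiny deterministic graph-world: from each state the agent picks a successor.
# "hub" keeps three terminal options open; "dead_end" has only a self-loop.
TRANSITIONS = {
    "start":    ["hub", "dead_end"],
    "hub":      ["a", "b", "c"],
    "a":        ["a"],
    "b":        ["b"],
    "c":        ["c"],
    "dead_end": ["dead_end"],
}

GAMMA = 0.9        # discount factor (illustrative choice)
HORIZON = 30       # truncation horizon for the value recursion (illustrative)
N_REWARDS = 2000   # number of sampled reward functions (illustrative)


def optimal_value(state, reward, steps=HORIZON):
    """Optimal discounted return from `state` under one fixed reward function."""
    if steps == 0:
        return 0.0
    return reward[state] + GAMMA * max(
        optimal_value(nxt, reward, steps - 1) for nxt in TRANSITIONS[state]
    )


def power(state, n_samples=N_REWARDS):
    """Average optimal value from `state` over rewards drawn i.i.d. Uniform[0, 1]."""
    total = 0.0
    for _ in range(n_samples):
        reward = {s: random.random() for s in TRANSITIONS}
        total += optimal_value(state, reward)
    return total / n_samples


if __name__ == "__main__":
    random.seed(0)
    for s in ("hub", "dead_end"):
        print(f"POWER({s}) ≈ {power(s):.2f}")
    # "hub" scores noticeably higher: being able to reach whichever of a/b/c
    # happens to be well-rewarded is what makes option-preserving states
    # instrumentally attractive to reward maximisers.
```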

Instrumental convergence in single-agent systems

Oct 12, 2022, 12:24 PM
33 points
4 comments · 8 min read · LW link
(www.gladstone.ai)

Power-Seeking = Minimising free energy

Jonas Hallgren · Feb 22, 2023, 4:28 AM
21 points
10 comments · 7 min read · LW link

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek · Jan 12, 2023, 12:32 AM
31 points
3 comments · 6 min read · LW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris · Oct 24, 2022, 8:03 PM
29 points
0 comments · 1 min read · LW link
(github.com)

Steering Llama-2 with contrastive activation additions

Jan 2, 2024, 12:47 AM
125 points
29 comments · 8 min read · LW link
(arxiv.org)

A framework for thinking about AI power-seeking

Joe Carlsmith · Jul 24, 2024, 10:41 PM
62 points
15 comments · 16 min read · LW link

Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake

TurnTrout · Nov 19, 2024, 6:36 PM
40 points
5 comments · 1 min read · LW link
(turntrout.com)

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe Carlsmith · Mar 22, 2023, 6:09 PM
7 points
0 comments · 1 min read · LW link

Power-seeking for successive choices

adamShimi · Aug 12, 2021, 8:37 PM
11 points
9 comments · 4 min read · LW link

Power-Seeking AI and Existential Risk

Antonio Franca · Oct 11, 2022, 10:50 PM
6 points
0 comments · 9 min read · LW link

Generalizing the Power-Seeking Theorems

TurnTrout · Jul 27, 2020, 12:28 AM
41 points
6 comments · 4 min read · LW link

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith · Dec 16, 2021, 8:48 PM
80 points
20 comments · 1 min read · LW link

Eli’s review of “Is power-seeking AI an existential risk?”

elifland · Sep 30, 2022, 12:21 PM
67 points
0 comments · 3 min read · LW link
(docs.google.com)

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin Shah · Dec 8, 2021, 6:10 PM
21 points
1 comment · 7 min read · LW link
(mailchi.mp)

Parametrically retargetable decision-makers tend to seek power

TurnTrout · Feb 18, 2023, 6:41 PM
172 points
10 comments · 2 min read · LW link
(arxiv.org)

Power-seeking can be probable and predictive for trained agents

Feb 28, 2023, 9:10 PM
56 points
22 comments · 9 min read · LW link
(arxiv.org)

Natural Abstraction: Convergent Preferences Over Information Structures

paulom · Oct 14, 2023, 6:34 PM
28 points
1 comment · 36 min read · LW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown · Nov 16, 2022, 3:33 PM
13 points
2 comments · 12 min read · LW link
(sambrown.eu)

From Human to Posthuman: Transhumanism, Anarcho-Capitalism, and AI’s Role in Global Disparity and Governance

DyingNaive · Nov 6, 2024, 5:41 PM
1 point
0 comments · 1 min read · LW link

Computational signatures of psychopathy

Cameron Berg · Dec 19, 2022, 5:01 PM
30 points
3 comments · 20 min read · LW link

Simple Way to Prevent Power-Seeking AI

research_prime_space · Dec 7, 2022, 12:26 AM
12 points
1 comment · 1 min read · LW link

The Human Alignment Problem for AIs

rife · Jan 22, 2025, 4:06 AM
10 points
5 comments · 3 min read · LW link

Ideas for studies on AGI risk

dr_s · Apr 20, 2023, 6:17 PM
5 points
1 comment · 11 min read · LW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda · Dec 25, 2021, 11:07 PM
53 points
3 comments · 28 min read · LW link

Incentives from a causal perspective

Jul 10, 2023, 5:16 PM
27 points
0 comments · 6 min read · LW link

The Game of Dominance

Karl von Wendt · Aug 27, 2023, 11:04 AM
24 points
15 comments · 6 min read · LW link

You can’t fetch the coffee if you’re dead: an AI dilemma

hennyge · Aug 31, 2023, 11:03 AM
1 point
0 comments · 4 min read · LW link