Power Seeking (AI)

Last edit: 24 Oct 2022 22:49 UTC by Raemon

Power Seeking is a property that agents might have: the tendency to try to gain more general ability to control their environment. It is particularly relevant to AIs and is closely related to Instrumental Convergence.
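
As a toy illustration (not drawn from any single post below, though several of the tagged posts, e.g. TurnTrout's power-seeking theorems, formalize the idea roughly this way), the following sketch estimates a state's "power" as the average optimal value attainable from it across randomly sampled reward functions. The tiny environment, function names, and parameters are all illustrative assumptions, not anyone's published setup.

# Toy sketch: estimate a state's "power" as its average optimal value
# over randomly sampled reward functions, in a tiny deterministic MDP.
# Environment, names, and parameters are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# State 0 is a "hub" with three successors; states 1-3 only loop to themselves.
successors = {
    0: [1, 2, 3],  # hub: keeps options open
    1: [1],
    2: [2],
    3: [3],        # dead end, reachable only from the hub
}
gamma = 0.9  # discount factor

def optimal_values(reward, n_iters=200):
    """Value iteration for a deterministic MDP with per-state rewards."""
    v = np.zeros(len(successors))
    for _ in range(n_iters):
        v = np.array([reward[s] + gamma * max(v[t] for t in successors[s])
                      for s in sorted(successors)])
    return v

# Average the optimal value of each state over uniform-random reward functions.
n_samples = 1000
power = np.zeros(len(successors))
for _ in range(n_samples):
    power += optimal_values(rng.uniform(0.0, 1.0, size=len(successors)))
power /= n_samples

for s in sorted(successors):
    print(f"state {s}: average optimal value ~ {power[s]:.2f}")
# The hub (state 0) scores highest: for most sampled rewards, being able to
# reach more states is instrumentally valuable, which is the core intuition
# behind "optimal policies tend to seek power".

Running this, the hub state scores noticeably higher than the self-loop states, which is the sense in which keeping options open (i.e. power) is instrumentally convergent across many possible goals.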

Instrumental convergence in single-agent systems

12 Oct 2022 12:24 UTC
33 points
4 comments · 8 min read · LW link
(www.gladstone.ai)

Power-Seeking = Minimising free energy

Jonas Hallgren · 22 Feb 2023 4:28 UTC
21 points
10 comments · 7 min read · LW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris · 24 Oct 2022 20:03 UTC
29 points
0 comments · 1 min read · LW link
(github.com)

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek · 12 Jan 2023 0:32 UTC
31 points
3 comments · 6 min read · LW link

Steering Llama-2 with contrastive activation additions

2 Jan 2024 0:47 UTC
123 points
29 comments · 8 min read · LW link
(arxiv.org)

A framework for thinking about AI power-seeking

Joe Carlsmith · 24 Jul 2024 22:41 UTC
62 points
15 comments · 16 min read · LW link

Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake

TurnTrout · 19 Nov 2024 18:36 UTC
31 points
2 comments · 1 min read · LW link
(turntrout.com)

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe Carlsmith · 22 Mar 2023 18:09 UTC
7 points
0 comments · 1 min read · LW link

Power-seeking for successive choices

adamShimi · 12 Aug 2021 20:37 UTC
11 points
9 comments · 4 min read · LW link

Power-Seeking AI and Existential Risk

Antonio Franca · 11 Oct 2022 22:50 UTC
6 points
0 comments · 9 min read · LW link

Generalizing the Power-Seeking Theorems

TurnTrout · 27 Jul 2020 0:28 UTC
41 points
6 comments · 4 min read · LW link

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith · 16 Dec 2021 20:48 UTC
80 points
20 comments · 1 min read · LW link

Eli’s review of “Is power-seeking AI an existential risk?”

elifland · 30 Sep 2022 12:21 UTC
67 points
0 comments · 3 min read · LW link
(docs.google.com)

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin Shah · 8 Dec 2021 18:10 UTC
21 points
1 comment · 7 min read · LW link
(mailchi.mp)

Parametrically retargetable decision-makers tend to seek power

TurnTrout · 18 Feb 2023 18:41 UTC
172 points
10 comments · 2 min read · LW link
(arxiv.org)

Power-seeking can be probable and predictive for trained agents

28 Feb 2023 21:10 UTC
56 points
22 comments · 9 min read · LW link
(arxiv.org)

The Waluigi Effect (mega-post)

Cleo Nardo · 3 Mar 2023 3:22 UTC
628 points
187 comments · 16 min read · LW link

Computational signatures of psychopathy

Cameron Berg · 19 Dec 2022 17:01 UTC
29 points
3 comments · 20 min read · LW link

Simple Way to Prevent Power-Seeking AI

research_prime_space · 7 Dec 2022 0:26 UTC
12 points
1 comment · 1 min read · LW link

From Human to Posthuman: Transhumanism, Anarcho-Capitalism, and AI’s Role in Global Disparity and Governance

DyingNaive · 6 Nov 2024 17:41 UTC
1 point
0 comments · 1 min read · LW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown · 16 Nov 2022 15:33 UTC
13 points
2 comments · 12 min read · LW link
(sambrown.eu)

Ideas for studies on AGI risk

dr_s · 20 Apr 2023 18:17 UTC
5 points
1 comment · 11 min read · LW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda · 25 Dec 2021 23:07 UTC
52 points
3 comments · 28 min read · LW link

Incentives from a causal perspective

10 Jul 2023 17:16 UTC
27 points
0 comments · 6 min read · LW link

The Game of Dominance

Karl von Wendt · 27 Aug 2023 11:04 UTC
24 points
15 comments · 6 min read · LW link

You can’t fetch the coffee if you’re dead: an AI dilemma

hennyge · 31 Aug 2023 11:03 UTC
1 point
0 comments · 4 min read · LW link

Natural Abstraction: Convergent Preferences Over Information Structures

paulom · 14 Oct 2023 18:34 UTC
13 points
1 comment · 36 min read · LW link