
Power Seeking (AI)


Power-seeking is a property that agents might have, in which they attempt to gain a more general ability to control their environment. It is particularly relevant to AI systems, and is closely related to Instrumental Convergence.
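As a rough illustration of this idea (not part of the tag text itself), one common formalisation in the spirit of the "optimal policies tend to seek power" work linked below measures a state's "power" as the average optimal value an agent can obtain from it across many randomly drawn reward functions: states that keep more options open score higher, so reward maximisers tend to steer toward them. The sketch below is a minimal Python toy under assumptions of my own; the six-state environment and the GAMMA, HORIZON, and N_REWARDS parameters are illustrative choices, not taken from any of the posts listed here.

```python
import random

# A tiny deterministic graph-world: from each state the agent picks a successor.
# "hub" keeps three terminal options open; "dead_end" has only a self-loop.
TRANSITIONS = {
    "start":    ["hub", "dead_end"],
    "hub":      ["a", "b", "c"],
    "a":        ["a"],
    "b":        ["b"],
    "c":        ["c"],
    "dead_end": ["dead_end"],
}

GAMMA = 0.9        # discount factor (illustrative choice)
HORIZON = 30       # truncation horizon for the value recursion (illustrative)
N_REWARDS = 2000   # number of sampled reward functions (illustrative)


def optimal_value(state, reward, steps=HORIZON):
    """Optimal discounted return from `state` under one fixed reward function."""
    if steps == 0:
        return 0.0
    return reward[state] + GAMMA * max(
        optimal_value(nxt, reward, steps - 1) for nxt in TRANSITIONS[state]
    )


def power(state, n_samples=N_REWARDS):
    """Average optimal value from `state` over rewards drawn i.i.d. Uniform[0, 1]."""
    total = 0.0
    for _ in range(n_samples):
        reward = {s: random.random() for s in TRANSITIONS}
        total += optimal_value(state, reward)
    return total / n_samples


if __name__ == "__main__":
    random.seed(0)
    for s in ("hub", "dead_end"):
        print(f"POWER({s}) ≈ {power(s):.2f}")
    # "hub" scores noticeably higher: being able to reach whichever of a/b/c
    # happens to be well-rewarded is what makes option-preserving states
    # instrumentally attractive to reward maximisers.
```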

Instrumental convergence in single-agent systems

Oct 12, 2022, 12:24 PM
33 points
4 comments · 8 min read · LW link
(www.gladstone.ai)

Power-Seeking = Minimising free energy

Jonas Hallgren · Feb 22, 2023, 4:28 AM
21 points
10 comments · 7 min read · LW link

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek · Jan 12, 2023, 12:32 AM
31 points
3 comments · 6 min read · LW link

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris · Oct 24, 2022, 8:03 PM
29 points
0 comments · 1 min read · LW link
(github.com)

Steering Llama-2 with contrastive activation additions

Jan 2, 2024, 12:47 AM
125 points
29 comments · 8 min read · LW link
(arxiv.org)

A framework for thinking about AI power-seeking

Joe Carlsmith · Jul 24, 2024, 10:41 PM
62 points
15 comments · 16 min read · LW link

Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake

TurnTrout · Nov 19, 2024, 6:36 PM
40 points
5 comments · 1 min read · LW link
(turntrout.com)

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe Carlsmith · Mar 22, 2023, 6:09 PM
7 points
0 comments · 1 min read · LW link

Power-seeking for successive choices

adamShimi · Aug 12, 2021, 8:37 PM
11 points
9 comments · 4 min read · LW link

Power-Seeking AI and Existential Risk

Antonio Franca · Oct 11, 2022, 10:50 PM
6 points
0 comments · 9 min read · LW link

Generalizing the Power-Seeking Theorems

TurnTrout · Jul 27, 2020, 12:28 AM
41 points
6 comments · 4 min read · LW link

Reviews of “Is power-seeking AI an existential risk?”

Joe Carlsmith · Dec 16, 2021, 8:48 PM
80 points
20 comments · 1 min read · LW link

Eli’s review of “Is power-seeking AI an existential risk?”

elifland · Sep 30, 2022, 12:21 PM
67 points
0 comments · 3 min read · LW link
(docs.google.com)

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin Shah · Dec 8, 2021, 6:10 PM
21 points
1 comment · 7 min read · LW link
(mailchi.mp)

Parametrically retargetable decision-makers tend to seek power

TurnTrout · Feb 18, 2023, 6:41 PM
172 points
10 comments · 2 min read · LW link
(arxiv.org)

Power-seeking can be probable and predictive for trained agents

Feb 28, 2023, 9:10 PM
56 points
22 comments · 9 min read · LW link
(arxiv.org)

Natural Abstraction: Convergent Preferences Over Information Structures

paulom · Oct 14, 2023, 6:34 PM
28 points
1 comment · 36 min read · LW link

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown · Nov 16, 2022, 3:33 PM
13 points
2 comments · 12 min read · LW link
(sambrown.eu)

From Human to Posthuman: Transhumanism, Anarcho-Capitalism, and AI’s Role in Global Disparity and Governance

DyingNaive · Nov 6, 2024, 5:41 PM
1 point
0 comments · 1 min read · LW link

Computational signatures of psychopathy

Cameron Berg · Dec 19, 2022, 5:01 PM
30 points
3 comments · 20 min read · LW link

Simple Way to Prevent Power-Seeking AI

research_prime_space · Dec 7, 2022, 12:26 AM
12 points
1 comment · 1 min read · LW link

The Human Alignment Problem for AIs

rife · Jan 22, 2025, 4:06 AM
10 points
5 comments · 3 min read · LW link

Ideas for studies on AGI risk

dr_s · Apr 20, 2023, 6:17 PM
5 points
1 comment · 11 min read · LW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda · Dec 25, 2021, 11:07 PM
53 points
3 comments · 28 min read · LW link

Incentives from a causal perspective

Jul 10, 2023, 5:16 PM
27 points
0 comments · 6 min read · LW link

The Game of Dominance

Karl von Wendt · Aug 27, 2023, 11:04 AM
24 points
15 comments · 6 min read · LW link

You can’t fetch the coffee if you’re dead: an AI dilemma

hennyge · Aug 31, 2023, 11:03 AM
1 point
0 comments · 4 min read · LW link