RSS

In­stru­men­tal Convergence

TagLast edit: 30 May 2023 6:04 UTC by papetoast

Instrumental convergence or convergent instrumental values is the theorized tendency for most sufficiently intelligent agents to pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition [1]. This concept has also been discussed under the term basic drives.

The idea was first explored by Steve Omohundro. He argued that sufficiently advanced AI systems would all naturally discover similar instrumental subgoals. The view that there are important basic AI drives was subsequently defended by Nick Bostrom as the instrumental convergence thesis, or the convergent instrumental goals thesis. On this view, a few goals are instrumental to almost all possible final goals. Therefore, all advanced AIs will pursue these instrumental goals. Omohundro uses microeconomic theory by von Neumann to support this idea.

Omohundro’s Drives

Omohundro presents two sets of values, one for self-improving artificial intelligences [2] and another he says will emerge in any sufficiently advanced AGI system [3]. The former set is composed of four main drives:

Bostrom’s Drives

Bostrom argues for an orthogonality thesis: But he also argues that, despite the fact that values and intelligence are independent, any recursively self-improving intelligence would likely possess a particular set of instrumental values that are useful for achieving any kind of terminal value [4]. On his view, those values are:

Relevance

Both Bostrom and Omohundro argue these values should be used in trying to predict a superintelligence’s behavior, since they are likely to be the only set of values shared by most superintelligences. They also note that these values are consistent with safe and beneficial AIs as well as unsafe ones.

Bostrom emphasizes, however, that our ability to predict a superintelligence’s behavior may be very limited even if it shares most intelligences’ instrumental goals.

Yudkowsky echoes Omohundro’s point that the convergence thesis is consistent with the possibility of Friendly AI. However, he also notes that the convergence thesis implies that most AIs will be extremely dangerous, merely by being indifferent to one or more human values [5]:

Pathological Cases

In some rarer cases, AIs may not pursue these goals. For instance, if there are two AIs with the same goals, the less capable AI may determine that it should destroy itself to allow the stronger AI to control the universe. Or an AI may have the goal of using as few resources as possible, or of being as unintelligent as possible. These relatively specific goals will limit the growth and power of the AI.

Experimental Evidence

The question of instrumentally convergent drives potentially arising in machine learning models is explored in the paper—Optimal Policies Tend To Seek Power. The authors explored instrumental convergence (specifically power-seeking behavior) as a statistical tendency of optimal policies in reinforcement learning (RL) agents.

The authors focus on Markov Decision Processes (MDPs) and prove that certain environmental symmetries are sufficient for optimal policies to seek power in the environment. They formalize power as the ability to achieve a wide range of goals. Within this formalization, the authors show that most reward functions make it optimal to try and seek power since this allows for keeping a wide range of options available to the agent.

This provides a counter to the claim that instrumental convergence is merely an anthropomorphic theoretical tendency, and that human-like power-seeking instincts will not arise in RL agents.

See Also

References

In­stru­men­tal Con­ver­gence? [Draft]

J. Dmitri Gallow14 Jun 2023 20:21 UTC
48 points
20 comments33 min readLW link

Seek­ing Power is Often Con­ver­gently In­stru­men­tal in MDPs

5 Dec 2019 2:33 UTC
162 points
39 comments17 min readLW link2 reviews
(arxiv.org)

P₂B: Plan to P₂B Better

24 Oct 2021 15:21 UTC
38 points
17 comments6 min readLW link

Gen­eral pur­pose in­tel­li­gence: ar­gu­ing the Orthog­o­nal­ity thesis

Stuart_Armstrong15 May 2012 10:23 UTC
33 points
155 comments18 min readLW link

De­liber­a­tion, Re­ac­tions, and Con­trol: Ten­ta­tive Defi­ni­tions and a Res­tate­ment of In­stru­men­tal Convergence

Oliver Sourbut27 Jun 2022 17:25 UTC
12 points
0 comments11 min readLW link

AI pre­dic­tion case study 5: Omo­hun­dro’s AI drives

Stuart_Armstrong15 Mar 2013 9:09 UTC
10 points
5 comments8 min readLW link

Draft re­port on ex­is­ten­tial risk from power-seek­ing AI

Joe Carlsmith28 Apr 2021 21:41 UTC
85 points
23 comments1 min readLW link

Em­pow­er­ment is (al­most) All We Need

jacob_cannell23 Oct 2022 21:48 UTC
61 points
44 comments17 min readLW link

Corrigibility

paulfchristiano27 Nov 2018 21:50 UTC
57 points
8 comments6 min readLW link

De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More

Ben Pace4 Oct 2019 4:08 UTC
221 points
61 comments15 min readLW link2 reviews

En­vi­ron­men­tal Struc­ture Can Cause In­stru­men­tal Convergence

TurnTrout22 Jun 2021 22:26 UTC
71 points
43 comments16 min readLW link
(arxiv.org)

Power-seek­ing for suc­ces­sive choices

adamShimi12 Aug 2021 20:37 UTC
11 points
9 comments4 min readLW link

A Gym Grid­world En­vi­ron­ment for the Treach­er­ous Turn

Michaël Trazzi28 Jul 2018 21:27 UTC
74 points
9 comments3 min readLW link
(github.com)

Contin­gency: A Con­cep­tual Tool from Evolu­tion­ary Biol­ogy for Alignment

clem_acs12 Jun 2023 20:54 UTC
57 points
2 comments14 min readLW link
(acsresearch.org)

You can still fetch the coffee to­day if you’re dead tomorrow

davidad9 Dec 2022 14:06 UTC
85 points
19 comments5 min readLW link

AXRP Epi­sode 11 - At­tain­able Utility and Power with Alex Turner

DanielFilan25 Sep 2021 21:10 UTC
19 points
5 comments53 min readLW link

A Cer­tain For­mal­iza­tion of Cor­rigi­bil­ity Is VNM-Incoherent

TurnTrout20 Nov 2021 0:30 UTC
67 points
24 comments8 min readLW link

In­stru­men­tal Con­ver­gence For Real­is­tic Agent Objectives

TurnTrout22 Jan 2022 0:41 UTC
35 points
9 comments9 min readLW link

[In­tro to brain-like-AGI safety] 10. The al­ign­ment problem

Steven Byrnes30 Mar 2022 13:24 UTC
48 points
7 comments19 min readLW link

Ques­tions about ″for­mal­iz­ing in­stru­men­tal goals”

Mark Neyer1 Apr 2022 18:52 UTC
7 points
8 comments11 min readLW link

In­stru­men­tal Con­ver­gence To Offer Hope?

michael_mjd22 Apr 2022 1:56 UTC
12 points
7 comments3 min readLW link

n=3 AI Risk Quick Math and Reasoning

lionhearted (Sebastian Marshall)7 Apr 2023 20:27 UTC
6 points
3 comments4 min readLW link

A frame­work for think­ing about AI power-seeking

Joe Carlsmith24 Jul 2024 22:41 UTC
62 points
15 comments16 min readLW link

Re­quire­ments for a STEM-ca­pa­ble AGI Value Learner (my Case for Less Doom)

RogerDearnaley25 May 2023 9:26 UTC
33 points
3 comments15 min readLW link

Walk­through of ‘For­mal­iz­ing Con­ver­gent In­stru­men­tal Goals’

TurnTrout26 Feb 2018 2:20 UTC
13 points
2 comments10 min readLW link

Cir­cum­vent­ing in­ter­pretabil­ity: How to defeat mind-readers

Lee Sharkey14 Jul 2022 16:59 UTC
114 points
15 comments33 min readLW link

The Sharp Right Turn: sud­den de­cep­tive al­ign­ment as a con­ver­gent goal

avturchin6 Jun 2023 9:59 UTC
38 points
5 comments1 min readLW link

Stone Age Her­bal­ist’s notes on ant war­fare and slavery

trevor9 Nov 2024 2:40 UTC
32 points
0 comments3 min readLW link
(x.com)

Toy model: con­ver­gent in­stru­men­tal goals

Stuart_Armstrong25 Feb 2016 14:03 UTC
16 points
2 comments4 min readLW link

He­donic Loops and Tam­ing RL

beren19 Jul 2023 15:12 UTC
20 points
14 comments9 min readLW link

[ASoT] In­stru­men­tal con­ver­gence is useful

Ulisse Mini9 Nov 2022 20:20 UTC
5 points
9 comments1 min readLW link

In­stru­men­tal con­ver­gence is what makes gen­eral in­tel­li­gence possible

tailcalled11 Nov 2022 16:38 UTC
105 points
11 comments4 min readLW link

“If we go ex­tinct due to mis­al­igned AI, at least na­ture will con­tinue, right? … right?”

plex18 May 2024 14:09 UTC
47 points
23 comments2 min readLW link
(aisafety.info)

Les­sons from Con­ver­gent Evolu­tion for AI Alignment

27 Mar 2023 16:25 UTC
54 points
9 comments8 min readLW link

Para­met­ri­cally re­tar­getable de­ci­sion-mak­ers tend to seek power

TurnTrout18 Feb 2023 18:41 UTC
172 points
10 comments2 min readLW link
(arxiv.org)

The mur­der­ous short­cut: a toy model of in­stru­men­tal convergence

Thomas Kwa2 Oct 2024 6:48 UTC
37 points
0 comments2 min readLW link

Goal re­ten­tion dis­cus­sion with Eliezer

Max Tegmark4 Sep 2014 22:23 UTC
98 points
26 comments6 min readLW link

Gen­er­al­iz­ing the Power-Seek­ing Theorems

TurnTrout27 Jul 2020 0:28 UTC
41 points
6 comments4 min readLW link

The Catas­trophic Con­ver­gence Conjecture

TurnTrout14 Feb 2020 21:16 UTC
45 points
16 comments8 min readLW link

[Question] What are some ex­am­ples of AIs in­stan­ti­at­ing the ‘near­est un­blocked strat­egy prob­lem’?

EJT4 Oct 2023 11:05 UTC
6 points
4 comments1 min readLW link

Power as Easily Ex­ploitable Opportunities

TurnTrout1 Aug 2020 2:14 UTC
32 points
5 comments6 min readLW link

[Question] Best ar­gu­ments against in­stru­men­tal con­ver­gence?

lfrymire5 Apr 2023 17:06 UTC
5 points
7 comments1 min readLW link

2019 Re­view Rewrite: Seek­ing Power is Often Ro­bustly In­stru­men­tal in MDPs

TurnTrout23 Dec 2020 17:16 UTC
35 points
0 comments4 min readLW link
(www.lesswrong.com)

Re­view of ‘De­bate on In­stru­men­tal Con­ver­gence be­tween LeCun, Rus­sell, Ben­gio, Zador, and More’

TurnTrout12 Jan 2021 3:57 UTC
40 points
1 comment2 min readLW link

TASP Ep 3 - Op­ti­mal Poli­cies Tend to Seek Power

Quinn11 Mar 2021 1:44 UTC
24 points
0 comments1 min readLW link
(technical-ai-safety.libsyn.com)

Co­her­ence ar­gu­ments im­ply a force for goal-di­rected behavior

KatjaGrace26 Mar 2021 16:10 UTC
91 points
25 comments11 min readLW link1 review
(aiimpacts.org)

MDP mod­els are de­ter­mined by the agent ar­chi­tec­ture and the en­vi­ron­men­tal dynamics

TurnTrout26 May 2021 0:14 UTC
23 points
34 comments3 min readLW link

Alex Turner’s Re­search, Com­pre­hen­sive In­for­ma­tion Gathering

adamShimi23 Jun 2021 9:44 UTC
15 points
3 comments3 min readLW link

The More Power At Stake, The Stronger In­stru­men­tal Con­ver­gence Gets For Op­ti­mal Policies

TurnTrout11 Jul 2021 17:36 UTC
45 points
7 comments6 min readLW link

A world in which the al­ign­ment prob­lem seems lower-stakes

TurnTrout8 Jul 2021 2:31 UTC
20 points
17 comments2 min readLW link

Seek­ing Power is Con­ver­gently In­stru­men­tal in a Broad Class of Environments

TurnTrout8 Aug 2021 2:02 UTC
44 points
15 comments9 min readLW link

When Most VNM-Co­her­ent Prefer­ence Order­ings Have Con­ver­gent In­stru­men­tal Incentives

TurnTrout9 Aug 2021 17:22 UTC
53 points
4 comments5 min readLW link

Ap­pli­ca­tions for De­con­fus­ing Goal-Directedness

adamShimi8 Aug 2021 13:05 UTC
38 points
3 comments5 min readLW link1 review

Clar­ify­ing Power-Seek­ing and In­stru­men­tal Convergence

TurnTrout20 Dec 2019 19:59 UTC
42 points
7 comments3 min readLW link

Satis­ficers Tend To Seek Power: In­stru­men­tal Con­ver­gence Via Retargetability

TurnTrout18 Nov 2021 1:54 UTC
85 points
8 comments17 min readLW link
(www.overleaf.com)

Against In­stru­men­tal Convergence

zulupineapple27 Jan 2018 13:17 UTC
11 points
31 comments2 min readLW link

Ted Kaczy­in­ski proves in­stru­men­tal con­ver­gence?

xXAlphaSigmaXx28 Jun 2024 3:50 UTC
0 points
0 comments1 min readLW link

Galatea and the windup toy

Nicolas Villarreal26 Oct 2024 14:52 UTC
−4 points
0 comments13 min readLW link
(nicolasdvillarreal.substack.com)

Why Re­cur­sive Self-Im­prove­ment Might Not Be the Ex­is­ten­tial Risk We Fear

Nassim_A24 Nov 2024 17:17 UTC
1 point
0 comments9 min readLW link

Pur­su­ing con­ver­gent in­stru­men­tal sub­goals on the user’s be­half doesn’t always re­quire good priors

jessicata30 Dec 2016 2:36 UTC
15 points
9 comments3 min readLW link

Ideas for stud­ies on AGI risk

dr_s20 Apr 2023 18:17 UTC
5 points
1 comment11 min readLW link

Align­ment, con­flict, powerseeking

Oliver Sourbut22 Nov 2023 9:47 UTC
6 points
1 comment1 min readLW link

Alien Axiology

snerx20 Apr 2023 0:27 UTC
3 points
2 comments5 min readLW link

Build­ing self­less agents to avoid in­stru­men­tal self-preser­va­tion.

blallo7 Dec 2023 18:59 UTC
14 points
2 comments6 min readLW link

Ra­tion­al­ity: Com­mon In­ter­est of Many Causes

Eliezer Yudkowsky29 Mar 2009 10:49 UTC
87 points
53 comments4 min readLW link

Plau­si­bly, al­most ev­ery pow­er­ful al­gorithm would be manipulative

Stuart_Armstrong6 Feb 2020 11:50 UTC
38 points
25 comments3 min readLW link

Asymp­tot­i­cally Unam­bi­tious AGI

michaelcohen10 Apr 2020 12:31 UTC
50 points
217 comments2 min readLW link

The Utility of Hu­man Atoms for the Paper­clip Maximizer

avturchin2 Feb 2018 10:06 UTC
2 points
21 comments3 min readLW link

A po­ten­tially high im­pact differ­en­tial tech­nolog­i­cal de­vel­op­ment area

Noosphere898 Jun 2023 14:33 UTC
5 points
2 comments2 min readLW link

In­stru­men­tal­ity makes agents agenty

porby21 Feb 2023 4:28 UTC
20 points
4 comments6 min readLW link

Let’s talk about “Con­ver­gent Ra­tion­al­ity”

David Scott Krueger (formerly: capybaralet)12 Jun 2019 21:53 UTC
44 points
33 comments6 min readLW link

hu­man in­tel­li­gence may be al­ign­ment-limited

bhauth15 Jun 2023 22:32 UTC
16 points
3 comments2 min readLW link

Su­per­in­tel­li­gence 10: In­stru­men­tally con­ver­gent goals

KatjaGrace18 Nov 2014 2:00 UTC
13 points
33 comments5 min readLW link

In­stru­men­tal Con­ver­gence to Com­plex­ity Preservation

Macro Flaneur13 Jul 2023 17:40 UTC
2 points
2 comments3 min readLW link

Mili­tary AI as a Con­ver­gent Goal of Self-Im­prov­ing AI

avturchin13 Nov 2017 12:17 UTC
5 points
3 comments1 min readLW link

ACI#5: From Hu­man-AI Co-evolu­tion to the Evolu­tion of Value Systems

Akira Pyinya18 Aug 2023 0:38 UTC
0 points
0 comments9 min readLW link

The Game of Dominance

Karl von Wendt27 Aug 2023 11:04 UTC
24 points
15 comments6 min readLW link

In­stru­men­tal Con­ver­gence Bounty

Logan Zoellner14 Sep 2023 14:02 UTC
62 points
24 comments1 min readLW link

De­stroy­ing the fabric of the uni­verse as an in­stru­men­tal goal.

AI-doom14 Sep 2023 20:04 UTC
−7 points
5 comments1 min readLW link

In­stru­men­tal Con­ver­gence and hu­man ex­tinc­tion.

Spiritus Dei2 Oct 2023 0:41 UTC
−10 points
3 comments7 min readLW link

Nat­u­ral Ab­strac­tion: Con­ver­gent Prefer­ences Over In­for­ma­tion Structures

paulom14 Oct 2023 18:34 UTC
13 points
1 comment36 min readLW link

Gen­er­al­iz­ing POWER to multi-agent games

22 Mar 2021 2:41 UTC
52 points
16 comments7 min readLW link

AI Alter­na­tive Fu­tures: Sce­nario Map­ping Ar­tifi­cial In­tel­li­gence Risk—Re­quest for Par­ti­ci­pa­tion (*Closed*)

Kakili27 Apr 2022 22:07 UTC
10 points
2 comments8 min readLW link

Machines vs Memes Part 3: Imi­ta­tion and Memes

ceru231 Jun 2022 13:36 UTC
7 points
0 comments7 min readLW link

Re­in­force­ment Learner Wireheading

Nate Showell8 Jul 2022 5:32 UTC
8 points
2 comments3 min readLW link

Re­search Notes: What are we al­ign­ing for?

Shoshannah Tekofsky8 Jul 2022 22:13 UTC
19 points
8 comments2 min readLW link

A Cri­tique of AI Align­ment Pessimism

ExCeph19 Jul 2022 2:28 UTC
9 points
1 comment9 min readLW link

Ac­tive In­fer­ence as a for­mal­i­sa­tion of in­stru­men­tal convergence

Roman Leventov26 Jul 2022 17:55 UTC
12 points
2 comments3 min readLW link
(direct.mit.edu)

You are Un­der­es­ti­mat­ing The Like­li­hood That Con­ver­gent In­stru­men­tal Sub­goals Lead to Aligned AGI

Mark Neyer26 Sep 2022 14:22 UTC
3 points
6 comments3 min readLW link

De­cep­tive Alignment

5 Jun 2019 20:16 UTC
118 points
20 comments17 min readLW link

In­stru­men­tal con­ver­gence in sin­gle-agent systems

12 Oct 2022 12:24 UTC
33 points
4 comments8 min readLW link
(www.gladstone.ai)

Misal­ign­ment-by-de­fault in multi-agent systems

13 Oct 2022 15:38 UTC
21 points
8 comments20 min readLW link
(www.gladstone.ai)

In­stru­men­tal con­ver­gence: scale and phys­i­cal interactions

14 Oct 2022 15:50 UTC
22 points
0 comments17 min readLW link
(www.gladstone.ai)

POWER­play: An open-source toolchain to study AI power-seeking

Edouard Harris24 Oct 2022 20:03 UTC
29 points
0 comments1 min readLW link
(github.com)