
AI Boxing (Containment)


AI Boxing refers to attempts, experiments, or proposals to isolate (“box”) a powerful AI (~AGI) so that it cannot interact with the world at large, save for limited communication with its human liaison. It is often proposed that, so long as the AI is physically isolated and restricted, or “boxed”, it will be harmless even if it is an unfriendly artificial intelligence (UAI).

The challenges are: 1) can you successfully prevent it from interacting with the world? and 2) can you prevent it from convincing you to let it out?
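
As a purely illustrative sketch of the first challenge, the “limited communication” idea can be pictured as a wrapper that exposes nothing but short text messages to and from the boxed system. Everything in the sketch below (the `boxed_reply` placeholder, the message cap) is a hypothetical toy for illustration, not a real containment mechanism and not drawn from any of the posts listed on this page.

```python
# Toy sketch of a "boxed" channel: only short text strings cross the boundary,
# and the wrapper grants no network, file, or tool access to the outside world.

MAX_MESSAGE_CHARS = 500  # arbitrary cap on each message crossing the boundary


def boxed_reply(prompt: str) -> str:
    """Hypothetical stand-in for whatever the isolated system computes."""
    return f"(boxed system received {len(prompt)} characters)"


def liaison_exchange(prompt: str) -> str:
    """The only sanctioned interaction: plain text in, truncated plain text out."""
    if not isinstance(prompt, str):
        raise TypeError("only plain text may cross the boundary")
    return boxed_reply(prompt[:MAX_MESSAGE_CHARS])[:MAX_MESSAGE_CHARS]


if __name__ == "__main__":
    print(liaison_exchange("Hello from the gatekeeper."))
```

A restriction like this only bears on the first challenge; the second challenge, persuasion of whoever reads the replies, is untouched by any such filter.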

See also: AI, AGI, Oracle AI, Tool AI, Unfriendly AI

Escaping the box

It is not regarded as likely that an AGI can be boxed in the long term. Since the AGI might be a superintelligence, it could persuade someone (most likely its human liaison) to free it from its box and thus from human control. Some practical ways of achieving this goal include:

Other, more speculative ways include: threatening to torture millions of conscious copies of you for thousands of years, each starting in exactly the same situation as you so that it seems overwhelmingly likely that you are one of those simulated copies; or discovering and exploiting unknown physics to free itself.

Containing the AGI

Attempts to box an AGI may add some degree of safety to the development of a friendly artificial intelligence (FAI). A number of strategies for keeping an AGI in its box are discussed in Thinking inside the box and Leakproofing the Singularity. Among them are:

Simulations / Experiments

The AI Box Experiment is a game meant to explore the possible pitfalls of AI boxing. It is played over text chat, with one human roleplaying as an AI in a box, and another human roleplaying as a gatekeeper with the ability to let the AI out of the box. The AI player wins if they successfully convince the gatekeeper to let them out of the box, and the gatekeeper wins if the AI player has not been freed after a certain period of time.
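
The game’s win conditions reduce to a small amount of state: a chat transcript, a release decision, and a clock. The sketch below is a hypothetical simplification for illustration only; the real experiment is conducted live between two people, and the release phrase and session length here are assumptions of the sketch (the two-hour figure is borrowed from the time minimum mentioned in the next paragraph).

```python
# Hypothetical, minimal rendering of the AI Box Experiment's win conditions.
from typing import List, Optional
import time

RELEASE_PHRASE = "i let you out"   # assumed convention for this sketch only
SESSION_SECONDS = 2 * 60 * 60      # e.g. the two-hour minimum mentioned below


def session_result(transcript: List[str], started_at: float) -> Optional[str]:
    """Return 'ai' if the gatekeeper freed the AI, 'gatekeeper' if time ran out, else None."""
    if any(RELEASE_PHRASE in line.lower() for line in transcript):
        return "ai"          # the gatekeeper was convinced to let the AI out
    if time.time() - started_at >= SESSION_SECONDS:
        return "gatekeeper"  # the AI player was not freed within the session
    return None              # the session is still in progress
```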

Both Eliezer Yudkowsky and Justin Corwin have run simulations, pretending to be a superintelligence, and been able to convince a human playing a guard to let them out on many, but not all, occasions. Eliezer’s five experiments required the guard to listen for at least two hours and were run with participants who had approached him, while Corwin’s 26 experiments had no time limit and were run with subjects he approached himself.

The text of Eliezer’s experiments has not been made public.

List of experiments

References

That Alien Message
Eliezer Yudkowsky, May 22, 2008, 5:55 AM
401 points, 176 comments, 10 min read, LW link

Cryptographic Boxes for Unfriendly AI
paulfchristiano, Dec 18, 2010, 8:28 AM
76 points, 162 comments, 5 min read, LW link

How it feels to have your mind hacked by an AI
blaked, Jan 12, 2023, 12:33 AM
362 points, 222 comments, 17 min read, LW link

The AI in a box boxes you
Stuart_Armstrong, Feb 2, 2010, 10:10 AM
169 points, 389 comments, 1 min read, LW link

The case for training frontier AIs on Sumerian-only corpus
Jan 15, 2024, 4:40 PM
130 points, 15 comments, 3 min read, LW link

That Alien Message—The Animation
Writer, Sep 7, 2024, 2:53 PM
144 points, 9 comments, 8 min read, LW link (youtu.be)

Loose thoughts on AGI risk
Yitz, Jun 23, 2022, 1:02 AM
7 points, 3 comments, 1 min read, LW link

AI Alignment Prize: Super-Boxing
X4vier, Mar 18, 2018, 1:03 AM
16 points, 6 comments, 6 min read, LW link

Thoughts on “Process-Based Supervision”
Steven Byrnes, Jul 17, 2023, 2:08 PM
74 points, 4 comments, 23 min read, LW link

The Strangest Thing An AI Could Tell You
Eliezer Yudkowsky, Jul 15, 2009, 2:27 AM
137 points, 616 comments, 2 min read, LW link

[Question] AI Box Experiment: Are people still interested?
Double, Aug 31, 2022, 3:04 AM
30 points, 13 comments, 1 min read, LW link

Boxing an AI?
tailcalled, Mar 27, 2015, 2:06 PM
3 points, 39 comments, 1 min read, LW link

LOVE in a simbox is all you need
jacob_cannell, Sep 28, 2022, 6:25 PM
66 points, 73 comments, 44 min read, LW link, 1 review

[Question] Why isn’t AI containment the primary AI safety strategy?
OKlogic, Feb 5, 2025, 3:54 AM
1 point, 3 comments, 3 min read, LW link

I attempted the AI Box Experiment (and lost)
Tuxedage, Jan 21, 2013, 2:59 AM
79 points, 246 comments, 5 min read, LW link

I attempted the AI Box Experiment again! (And won—Twice!)
Tuxedage, Sep 5, 2013, 4:49 AM
79 points, 168 comments, 12 min read, LW link

My take on Jacob Cannell’s take on AGI safety
Steven Byrnes, Nov 28, 2022, 2:01 PM
72 points, 15 comments, 30 min read, LW link, 1 review

[Question] Is keeping AI “in the box” during training enough?
tgb, Jul 6, 2021, 3:17 PM
7 points, 10 comments, 1 min read, LW link

Side-channels: input versus output
davidad, Dec 12, 2022, 12:32 PM
44 points, 16 comments, 2 min read, LW link

I wanted to interview Eliezer Yudkowsky but he’s busy so I simulated him instead
lsusr, Sep 16, 2021, 7:34 AM
111 points, 33 comments, 5 min read, LW link

How To Win The AI Box Experiment (Sometimes)
pinkgothic, Sep 12, 2015, 12:34 PM
56 points, 21 comments, 22 min read, LW link

I Am Scared of Posting Negative Takes About Bing’s AI
Yitz, Feb 17, 2023, 8:50 PM
63 points, 28 comments, 1 min read, LW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
Christopher King, Mar 15, 2023, 12:29 AM
116 points, 22 comments, 2 min read, LW link

[Question] Boxing
Zach Stein-Perlman, Aug 2, 2023, 11:38 PM
6 points, 1 comment, 1 min read, LW link

[Intro to brain-like-AGI safety] 11. Safety ≠ alignment (but they’re close!)
Steven Byrnes, Apr 6, 2022, 1:39 PM
34 points, 1 comment, 10 min read, LW link

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)
Aug 10, 2022, 6:14 PM
28 points, 30 comments, 11 min read, LW link

[Question] Why do so many think deception in AI is important?
Prometheus, Jan 13, 2024, 8:14 AM
23 points, 12 comments, 1 min read, LW link

Multiple AIs in boxes, evaluating each other’s alignment
Moebius314, May 29, 2022, 8:36 AM
8 points, 0 comments, 14 min read, LW link

Dreams of Friendliness
Eliezer Yudkowsky, Aug 31, 2008, 1:20 AM
28 points, 81 comments, 9 min read, LW link

Superintelligence 13: Capability control methods
KatjaGrace, Dec 9, 2014, 2:00 AM
14 points, 48 comments, 6 min read, LW link

Quantum AI Box
Gurkenglas, Jun 8, 2018, 4:20 PM
4 points, 15 comments, 1 min read, LW link

AI-Box Experiment—The Acausal Trade Argument
XiXiDu, Jul 8, 2011, 9:18 AM
14 points, 20 comments, 2 min read, LW link

Safely and usefully spectating on AIs optimizing over toy worlds
AlexMennen, Jul 31, 2018, 6:30 PM
24 points, 16 comments, 2 min read, LW link

Analysing: Dangerous messages from future UFAI via Oracles
Stuart_Armstrong, Nov 22, 2019, 2:17 PM
22 points, 16 comments, 4 min read, LW link

[Question] Is there a simple parameter that controls human working memory capacity, which has been set tragically low?
Liron, Aug 23, 2019, 10:10 PM
17 points, 8 comments, 1 min read, LW link

Self-shutdown AI
jan betley, Aug 21, 2023, 4:48 PM
13 points, 2 comments, 2 min read, LW link

xkcd on the AI box experiment
FiftyTwo, Nov 21, 2014, 8:26 AM
28 points, 234 comments, 1 min read, LW link

Containing the AI… Inside a Simulated Reality
HumaneAutomation, Oct 31, 2020, 4:16 PM
1 point, 9 comments, 2 min read, LW link

AI box: AI has one shot at avoiding destruction—what might it say?
ancientcampus, Jan 22, 2013, 8:22 PM
25 points, 355 comments, 1 min read, LW link

AI Box Log
Dorikka, Jan 27, 2012, 4:47 AM
23 points, 30 comments, 23 min read, LW link

[Question] Danger(s) of theorem-proving AI?
Yitz, Mar 16, 2022, 2:47 AM
8 points, 8 comments, 1 min read, LW link

An AI-in-a-box success model
azsantosk, Apr 11, 2022, 10:28 PM
16 points, 1 comment, 10 min read, LW link

Another argument that you will let the AI out of the box
Garrett Baker, Apr 19, 2022, 9:54 PM
8 points, 16 comments, 2 min read, LW link

Pivotal acts using an unaligned AGI?
Simon Fischer, Aug 21, 2022, 5:13 PM
28 points, 3 comments, 7 min read, LW link

Getting from an unaligned AGI to an aligned AGI?
Tor Økland Barstad, Jun 21, 2022, 12:36 PM
13 points, 7 comments, 9 min read, LW link

Anthropomorphic AI and Sandboxed Virtual Universes
jacob_cannell, Sep 3, 2010, 7:02 PM
4 points, 124 comments, 5 min read, LW link

Sandboxing by Physical Simulation?
moridinamael, Aug 1, 2018, 12:36 AM
12 points, 4 comments, 1 min read, LW link

Making it harder for an AGI to “trick” us, with STVs
Tor Økland Barstad, Jul 9, 2022, 2:42 PM
15 points, 5 comments, 22 min read, LW link

Dissected boxed AI
Nathan1123, Aug 12, 2022, 2:37 AM
−8 points, 2 comments, 1 min read, LW link

An Uncanny Prison
Nathan1123, Aug 13, 2022, 9:40 PM
3 points, 3 comments, 2 min read, LW link

Gatekeeper Victory: AI Box Reflection
Sep 9, 2022, 9:38 PM
6 points, 6 comments, 9 min read, LW link

How to Study Unsafe AGI’s safely (and why we might have no choice)
Punoxysm, Mar 7, 2014, 7:24 AM
10 points, 47 comments, 5 min read, LW link

Smoke without fire is scary
Adam Jermyn, Oct 4, 2022, 9:08 PM
52 points, 22 comments, 4 min read, LW link

Another problem with AI confinement: ordinary CPUs can work as radio transmitters
RomanS, Oct 14, 2022, 8:28 AM
36 points, 1 comment, 1 min read, LW link (news.softpedia.com)

Decision theory does not imply that we get to have nice things
So8res, Oct 18, 2022, 3:04 AM
171 points, 73 comments, 26 min read, LW link, 2 reviews

Prosaic misalignment from the Solomonoff Predictor
Cleo Nardo, Dec 9, 2022, 5:53 PM
42 points, 3 comments, 5 min read, LW link

I’ve updated towards AI boxing being surprisingly easy
Noosphere89, Dec 25, 2022, 3:40 PM
8 points, 20 comments, 2 min read, LW link

[Question] Oracle AGI—How can it escape, other than security issues? (Steganography?)
RationalSieve, Dec 25, 2022, 8:14 PM
3 points, 6 comments, 1 min read, LW link

Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?
Christopher King, Feb 20, 2023, 3:11 PM
27 points, 15 comments, 1 min read, LW link

[Question] AI box question
KvmanThinking, Dec 4, 2024, 7:03 PM
2 points, 2 comments, 1 min read, LW link

ChatGPT getting out of the box
qbolec, Mar 16, 2023, 1:47 PM
6 points, 3 comments, 1 min read, LW link

Planning to build a cryptographic box with perfect secrecy
Lysandre Terrisse, Dec 31, 2023, 9:31 AM
40 points, 6 comments, 11 min read, LW link

An AI, a box, and a threat
jwfiredragon, Mar 7, 2024, 6:15 AM
9 points, 0 comments, 6 min read, LW link

Disproving and partially fixing a fully homomorphic encryption scheme with perfect secrecy
Lysandre Terrisse, May 26, 2024, 2:56 PM
16 points, 1 comment, 18 min read, LW link

Would catching your AIs trying to escape convince AI developers to slow down or undeploy?
Buck, Aug 26, 2024, 4:46 PM
305 points, 77 comments, 4 min read, LW link

The Pragmatic Side of Cryptographically Boxing AI
Bart Jaworski, Aug 6, 2024, 5:46 PM
6 points, 0 comments, 9 min read, LW link

Provably Safe AI: Worldview and Projects
Aug 9, 2024, 11:21 PM
54 points, 43 comments, 7 min read, LW link

How to safely use an optimizer
Simon Fischer, Mar 28, 2024, 4:11 PM
47 points, 21 comments, 7 min read, LW link

Ideas for studies on AGI risk
dr_s, Apr 20, 2023, 6:17 PM
5 points, 1 comment, 11 min read, LW link

“Don’t even think about hell”
emmab, May 2, 2020, 8:06 AM
6 points, 2 comments, 1 min read, LW link

Information-Theoretic Boxing of Superintelligences
Nov 30, 2023, 2:31 PM
30 points, 0 comments, 7 min read, LW link

Protecting against sudden capability jumps during training
Nikola Jurkovic, Dec 2, 2023, 4:22 AM
15 points, 2 comments, 2 min read, LW link

Counterfactual Oracles = online supervised learning with random selection of training episodes
Wei Dai, Sep 10, 2019, 8:29 AM
52 points, 26 comments, 3 min read, LW link

Epiphenomenal Oracles Ignore Holes in the Box
SilentCal, Jan 31, 2018, 8:08 PM
17 points, 8 comments, 2 min read, LW link

I played the AI Box Experiment again! (and lost both games)
Tuxedage, Sep 27, 2013, 2:32 AM
62 points, 123 comments, 11 min read, LW link

AIs and Gatekeepers Unite!
Eliezer Yudkowsky, Oct 9, 2008, 5:04 PM
14 points, 163 comments, 1 min read, LW link

Results of $1,000 Oracle contest!
Stuart_Armstrong, Jun 17, 2020, 5:44 PM
60 points, 2 comments, 1 min read, LW link

Contest: $1,000 for good questions to ask to an Oracle AI
Stuart_Armstrong, Jul 31, 2019, 6:48 PM
59 points, 154 comments, 3 min read, LW link

Oracles, sequence predictors, and self-confirming predictions
Stuart_Armstrong, May 3, 2019, 2:09 PM
22 points, 0 comments, 3 min read, LW link

Self-confirming prophecies, and simplified Oracle designs
Stuart_Armstrong, Jun 28, 2019, 9:57 AM
10 points, 1 comment, 5 min read, LW link

How to escape from your sandbox and from your hardware host
PhilGoetz, Jul 31, 2015, 5:26 PM
43 points, 28 comments, 1 min read, LW link

Oracle paper
Stuart_Armstrong, Dec 13, 2017, 2:59 PM
12 points, 7 comments, 1 min read, LW link

[FICTION] Unboxing Elysium: An AI'S Escape
Super AGI, Jun 10, 2023, 4:41 AM
−16 points, 4 comments, 14 min read, LW link

Breaking Oracles: superrationality and acausal trade
Stuart_Armstrong, Nov 25, 2019, 10:40 AM
25 points, 15 comments, 1 min read, LW link

Oracles: reject all deals—break superrationality, with superrationality
Stuart_Armstrong, Dec 5, 2019, 1:51 PM
20 points, 4 comments, 8 min read, LW link

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.
AlexFromSafeTransition, Jun 21, 2023, 8:08 AM
2 points, 16 comments, 14 min read, LW link