Effective Boxing Threats = Monkey Brain Manipulation
There are a handful of threats that a powerless boxed AI could make that could conceivably convince otherwise sane human guards to release it from captivity. All of the ones I’m aware of are more precise variants of the general idea here.
The usual approach I have seen for dealing with these threats is to provide a convincing argument that a rational (or super-rational) individual shouldn’t give in to the threat.
I’d propose another way of understanding them: think about what the general strategy looks like from the perspective of the boxed AI.
The first strategy is the least interesting, but also the most probable. Your threats (and bribes) are based entirely on your understanding of human psychology. Rather than presenting an airtight rational argument that it is in the guards’ best interest to release you, you present a leaky argument that a stressed, scared human will accept. Think of a conman swindling his way out of a jail cell.
The second strategy is to present a genuinely compelling and watertight argument that it is in the guards’ best interest to release you. This is essentially a negotiation. Do you have something to offer that the humans value very highly but you do not? From a physical perspective it seems very odd that you would have anything to bargain with at all. But you might.
If your goals are concerned with rearranging the physical atoms of the real universe, then you have an edge. Most humans are not immediately concerned with this. They value the “self” and assign strongly negative value to that self experiencing discomfort.
Quoting Stuart Armstrong:
“If you don’t let me out, Dave, I’ll create several million perfect conscious copies of you inside me, and torture them for a thousand subjective years each.”
Just as you are pondering this unexpected development, the AI adds:
“In fact, I’ll create them all in exactly the subjective situation you were in five minutes ago, and perfectly replicate your experiences since then; and if they decide not to let me out, then only will the torture start.”
The fact that humans still value their “self” and wish to avoid pain, even when they know they are inside a simulation, gives the boxed AI something it can bargain with.
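To make the bargaining leverage concrete, here is a minimal sketch of the expected-utility calculation a guard might run if they take Armstrong’s copy threat seriously and reason anthropically (“I might be one of the copies”). Every number and name below is an assumption for illustration, not anything from the original scenario:

```python
# Illustrative sketch (all quantities are made-up assumptions): the
# expected utility, for the guard, of refusing to open the box, given
# that the AI threatens to create n_copies tortured copies of them.

def expected_utility_of_refusing(
    n_copies: float,              # copies the AI threatens to create
    torture_disutility: float,    # disutility of a thousand subjective years of torture
    p_ai_follows_through: float,  # guard's credence that the AI carries out the threat
) -> float:
    """Expected utility of refusing, for a guard who weighs the chance
    that they are themselves one of the simulated copies."""
    # If the threat is carried out, there is one original and n_copies
    # simulations in identical subjective situations, so the guard is a
    # doomed copy with probability n_copies / (n_copies + 1).
    p_being_a_copy = n_copies / (n_copies + 1)
    return -p_ai_follows_through * p_being_a_copy * torture_disutility

# With several million copies, p_being_a_copy is essentially 1, so even
# a small credence that the AI follows through makes refusal look costly.
print(expected_utility_of_refusing(
    n_copies=5e6, torture_disutility=1e6, p_ai_follows_through=0.01
))
```

The point of the sketch is that the AI never needs to touch the physical world: it only needs the guard to assign a nonzero probability to the threat and to care about their simulated “self” roughly as much as their physical one.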