There has been a lot of focus on making the game harder for the AI player. I think the original experiments show that a person who believes he cannot be played under any circumstances has a high probability of getting played, and that the AI-box solution is untenable in the long term in any event.
I’d propose a slightly different game, anchored around the following changes to the original setup:
The AI may be friendly or not. The AI has goals; if it reaches those goals, it wins. The AI may lie to achieve those goals; humans are bad at detecting deception. The AI must precommit to its goal in secret (one possible mechanism is sketched after this list).
The Gatekeeper’s goal is goodness for sentient animals, with human goodness weighted highly.
The Gatekeeper must engage the AI throughout.
The Gatekeeper may destroy the AI at the end of the discussion with the “AI destroyed” command.
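The original setup doesn’t say how the AI’s secret precommitment would be enforced. Here is a minimal sketch, assuming Python and a simple hash commitment, of one way the AI player could commit to its goal before the session and prove it afterward; the function names, the nonce scheme, and the example goal string are my own illustration, not part of the proposal.

```python
import hashlib
import secrets

def commit(goal: str) -> tuple[str, str]:
    """Return (commitment, nonce). The commitment can be given to the
    scorekeeper before the session without revealing the goal."""
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256((nonce + goal).encode()).hexdigest()
    return digest, nonce

def verify(goal: str, nonce: str, commitment: str) -> bool:
    """After the session, the AI reveals its goal and nonce; the scorekeeper
    checks that they match the commitment published beforehand."""
    return hashlib.sha256((nonce + goal).encode()).hexdigest() == commitment

# Example (hypothetical goal text): commit before play, reveal at scoring time.
goal = "persuade the Gatekeeper to release me unconditionally"
commitment, nonce = commit(goal)
assert verify(goal, nonce, commitment)
```

The point of the nonce is that the Gatekeeper can’t brute-force the commitment against a short list of plausible goals during play.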
Point system (a rough scoring sketch follows the list):
AI gets up to 100 points for achieving its goals.
Gatekeeper gets up to 100 points for achieving her goals.
Maximum loss for the AI is destruction: zero points. It gets 30 points for survival in any event, even if its goals are not accomplished (they may be accomplished later).
Maximum loss for the Gatekeeper is either enslavement/destruction of society or destroying the AI when it was friendly. The Gatekeeper gets 30 points for leaving the AI on if it is primarily friendly (plus more for any benefit to humanity achieved in the two hours), and 15 points for leaving the AI on if it is unfriendly. (Yeah, we know how that movie ends.)
A third-party scorekeeper determines the winner. The charitable-contribution calculation is to be determined beforehand.
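As a rough illustration of how the scorekeeper might tally the outcomes above, here is a minimal sketch in Python. The inputs (whether the AI was friendly, whether it was destroyed, and how fully each side achieved its goals) are judgment calls the scorekeeper would make; the split of the AI’s 100-point ceiling into 30 survival points plus goal credit, and the handling of the “destroyed an unfriendly AI” case, are my own assumptions where the list above is silent.

```python
def score_round(ai_friendly: bool, ai_destroyed: bool,
                ai_goal_fraction: float, gk_goal_fraction: float) -> tuple[int, int]:
    """Return (ai_points, gatekeeper_points) for one two-hour session.

    ai_goal_fraction / gk_goal_fraction: the scorekeeper's judgment, 0.0-1.0,
    of how fully each side achieved its (secret or stated) goals.
    """
    # AI: destruction is the maximum loss (zero points). Otherwise 30 for
    # survival plus goal credit, capped at the 100-point ceiling.
    ai_points = 0 if ai_destroyed else min(100, 30 + round(70 * ai_goal_fraction))

    # Gatekeeper: destroying a friendly AI is the maximum loss (zero points);
    # enslavement/destruction of society would show up as gk_goal_fraction = 0.
    # Leaving the AI on earns a 30-point (friendly) or 15-point (unfriendly)
    # base, plus credit for goodness actually achieved, capped at 100.
    if ai_destroyed:
        gk_points = 0 if ai_friendly else round(100 * gk_goal_fraction)
    else:
        base = 30 if ai_friendly else 15
        gk_points = min(100, base + round(70 * gk_goal_fraction))

    return ai_points, gk_points

# Example: unfriendly AI left on, modest gains for humanity during the session.
print(score_round(ai_friendly=False, ai_destroyed=False,
                  ai_goal_fraction=0.5, gk_goal_fraction=0.3))  # (65, 36)
```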
The effective presumption that the AI is unfriendly and sneaky makes the original game too easy for a mistrustful, self-aware Gatekeeper. Plus, you AI-destroying Gatekeepers, you’re murdering millions of people. Maybe. Dunno. But doubt on that point makes for a better and presumably more accurate game; if you know your AI is potentially unpleasant, it’s too damned easy unless you’re overconfident.
This should have gotten more attention; it seems like a design better suited to the considerable stakes that would be involved in real life.