Reading the article, I can make a guess as to how the first challenges went; it sounds like their primary, and possibly only, line of defense against the challenge was to not pay serious attention to the AI. That’s not a very strong approach, as anyone who’s been in an internet discussion can tell you: it’s easy to get sucked in and fully engaged by someone who’s trying to get you to engage, and easy for them to keep you engaged even when you’re trying to break off.
Their lack of preparation, I would guess, led to their failure against the AI.
A more advanced tactic would involve additional lines of defense after becoming engaged: contemplating philosophical arguments to use against the AI, for instance, or imagining an authority that forbids you from taking the action. Were I faced with the challenge, after I got engaged (which would take two minutes at most; I’ve got a bad case of “but someone’s wrong on the internet!”), my second line of defense would be to roleplay.
I would be a hapless grad-student technician whose job is to feed the AI problems and write down the results. That role would come with a checklist of things not to do (because they would release, or risk releasing, the AI), and if directly asked to do any of them, he’d go “talk to his boss”, invoking the third line of defense.
Finally, I’d roleplay someone with the authority to release the AI outright, no trickery required. But he’d sit down at the console prepared: strongly suspecting that something was wrong, and empowered to say at any time, “I’m shutting you down for maintenance.” He wouldn’t bother to engage the AI at its level, because he’s trying to solve a deeper problem of which the AI’s behavior is only a symptom. That makes this the strongest line of defense of all: he no longer views the AI as credible or even intelligent as such, just as a broken device that will need to be shut down and repaired after some basic diagnostic work.
But even though I feel confident I could beat the challenge, I think the first couple of challenges already make the point: an AI-in-a-box scenario is a psychological arms race, and no matter how likely the humans’ safeguards are to succeed, they only need to fail once. No number of human victories (because only a single failure matters) and no number of additional lines of defense (each of which has some chance, however small, of being overcome) can unmake that point.
It’s strange, though: I did not think for one second that the problem was impossible on either side. I suppose that’s because it was used as an example of the opposite. Once something has been demonstrated, it can hardly be impossible!