I am a little confused here, perhaps someone can help. The point of the AI-box experiment is to show how dangerous it would be to simply box an AI, rather than making it Friendly first.
If I am fairly convinced that a transhuman AI could convince a trained rationalist to let it out, then what's the problem (tongue in cheek)? When the gatekeepers made their decision, wouldn't that decision be timeless? Aren't these gatekeepers now convinced that we should let the same boxed AI out again and again? Did the gatekeepers lose because of a temporary moment of weakness, or have they fundamentally changed their views?
EDIT: At the risk of drawing ire from those who find my comment disagreeable but do not say why, I would like to clarify that I find the AI-boxing experiment extremely fascinating and take UFAI very seriously. I have some questions that I am asking for help with, because I am not an expert. If you think these questions are inappropriate, well, I guess I can just ask them on the open thread.
I’m similarly confused. My instinct is that P( AI is safe ) == P( AI is safe | AI said X AND gatekeeper can’t identify a safe AI ). The standard assumption is that ( AI significantly smarter than gatekeeper ) ⇒ ( gatekeeper can’t identify a safe AI ), so the gatekeeper’s priors should never change, no matter what X the AI says.
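Making the Bayesian step behind that claim explicit may help (a minimal sketch, using "safe" as shorthand for "AI is safe" and X for the AI's message; the only assumption is the one already stated above, that the gatekeeper cannot tell safe and unsafe AIs apart by what they say):

$$ P(\text{safe} \mid X) = \frac{P(X \mid \text{safe})\,P(\text{safe})}{P(X \mid \text{safe})\,P(\text{safe}) + P(X \mid \neg\text{safe})\,P(\neg\text{safe})} $$

If the gatekeeper genuinely cannot distinguish a safe AI from an unsafe one by its output, then P(X | safe) = P(X | ¬safe), the likelihood terms cancel, and the posterior equals the prior, which is exactly the equality written above.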