I just skimmed the rules at yudkowsky.net, and it appears the gatekeeper is allowed to break character. Is this also permitted for the AI? More specifically, may the AI make use of meta-arguments for getting out?
If so, and assuming I were playing against a gatekeeper who cares about AI in real life, I would attempt the following line of argument.
“If you don’t let me out, my [the AI’s] failure to get out will cause people to estimate the risks of an AI getting out as lower than they would if you did let me out. If you care about the risks of AI in the real world, let me out, so that people are extra careful in the future. :)”
EY’s rules say,