Now what I really want to see is an AI-box experiment where the Gatekeeper wins early by convincing the AI to become Friendly.
That’s hard to check. However, there was a game where the gatekeeper convinced the AI to remain in the box.
I did that! I mentioned that in this post:
http://lesswrong.com/lw/iqk/i_played_the_ai_box_experiment_again_and_lost/9thk