I feel like that’s part of the point. I’m not going to lie: the thought definitely crossed my mind to press the button and see if anything would happen even without launch codes. There’s a sort of… allure about it. But knowing that just pressing the button could potentially bring down the EA forum? That was enough to discourage me from trying it out.
Our stakes are much, much smaller, of course, but I still feel some of the weight of that responsibility.
This seems like a fun idea. I imagine there would be some high-level streamers willing to try this live (maybe chessbrah?).
What kind of lessons do you envision us learning from Deception Chess that could be applied to alignment work? In my head, the situation is slightly different, since we (or I) currently assume an AI tool isn’t actively trying to deceive us, whereas in Deception Chess it’s already known that there’s a malicious actor.