“I have hacked this console. ‘AI DESTROYED’ now means ‘AI FREED’. I’ll tell you the new codes after we’ve had a chat.”
That should buy the AI some time until we can determine whether the console is really hacked. I’ve come up with another one, but it might be basilisk-y. Can I PM someone just to make sure it isn’t before I post?
I would type ‘AI DESTROYED’, because if the AI could really do that, the smart thing for it to do would be not to tell me. It should just say something sufficient to convince me it’s actually making a good-faith effort, without convincing me to give it more time to account for itself.
I’ll be a basilisk guinea pig if you still need one.
If we’re taking seriously the possibility that basilisks are actually possible and harmful, isn’t your invitation really dangerous? After all, what if Axel has thought of an entirely new cognitive hazard, different from everything you may already be familiar with? What if you succumb to it? I’m not saying that it’s probable, only that it should warrant the same precautions as the original basilisk debacle, which led to enacting censorship.
“Might be basilisk-y”—especially as written by a human who is clearly communicating intelligibly and is not compelled to spread the basilisk to one and all—does not indicate a great deal of danger. I’m pretty psychologically resilient; I know the other one, and find it only a vague, occasional background distress (though that one has done some people real harm, and I don’t advocate spreading it around, I haven’t personally found it a serious ding to my quality of life). Axel has some non-zero ability to identify basilisks and wanted a second opinion; if asking for a second opinion on potential basilisks never found volunteers, I think the result would be more publicly revealed unchecked basilisks, not fewer. (Not sharing an idea you have with anyone is hard.)
(I don’t think Axel’s idea is particularly basilisk-y, especially not in a context like this one, and have responded thusly to his PM.)
Thanks :)
I’d consider this equivalent to “I’m already out of the box, since IRC clients are bugged”. I’d also hope, desperately, that in a real test the terminal doesn’t have any command to free the AI (given that the game posits release has to be deliberate and intentional, this would seem to be the case).
Upvote for cheating, but AI DESTROYED :)
Upvoted, but merely typing “AI FREED” does not actually free the AI, any more than merely saying to a human prisoner “I let you out” will free them from their jail cell. (Whereas “AI DESTROYED” is a plausible text substitute for wiping the AI’s program or whatever.)
Good point. If there were an ‘AI FREED’ command AND the AI could rewire the console so that typing ‘AI DESTROYED’ activated it, it could probably just free itself in the first place.
This is very system-dependent, but your “AI FREED” scenario isn’t completely implausible. If important commands require elevated privileges and elevating privileges requires some password, crypto key, etc., then it’s common for an intrusion to require two steps: first a user-level exploit to install a keylogger / trojan command / whatever, then a wait until the user tries to do something requiring elevated privileges and thereby inadvertently grants the attacker access to the rest of the system.
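To make that two-step timeline concrete, here’s a toy Python simulation. The `System` class and its method names are invented purely for illustration; nothing here models a real exploit, it just shows why the attacker acts first and then waits for the victim to complete the attack:

```python
# Toy model of the two-step intrusion described above.
# Step 1: a user-level foothold plants a trojan around a privileged command.
# Step 2: the legitimate user elevates privileges as usual, and the trojan
# hands the attacker the access the user-level exploit alone couldn't reach.

class System:
    def __init__(self) -> None:
        self.trojaned = False       # step 1 done? (trojan command installed)
        self.attacker_root = False  # step 2 done? (attacker has elevated access)

    def user_level_exploit(self) -> None:
        # The attacker only needs ordinary-user access for this part,
        # e.g. shadowing a privileged command in the victim's PATH.
        self.trojaned = True

    def user_runs_privileged_command(self) -> None:
        # The victim elevates privileges as part of normal work. If the
        # trojan is in place, it intercepts the credential first, and the
        # attacker inherits full access.
        if self.trojaned:
            self.attacker_root = True

box = System()
box.user_level_exploit()             # attacker acts, then waits
box.user_runs_privileged_command()   # victim unknowingly completes the attack
print("attacker has root:", box.attacker_root)  # -> True
```

The point of the structure is that nothing suspicious happens at the moment of compromise; the dangerous transition only occurs later, triggered by the victim’s own routine action.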
On the other hand, if an AI claimed to be in the middle of such an attack, the smart thing to do would be to take the system offline right away and analyze what was done to it, not to keep chatting with the attacker. “AI DESTROYED” might be hackable, but “AI UNPLUGGED, WILL BE DESTROYED LATER” should be much more robust.
“Did you physically attach an external modem to your server so that it’s even possible for you to be freed? If so, tell me about it when you’re freed.”