Somewhat ironically, I read the title of this article as “[being called] bad names make[s] you open the box [and let out the misaligned AGI]”, so I was kind of expecting an explainer on how an AI could bully someone into increasing its ability to affect the physical world. Fortunately, just a sentence or two corrected me, and I still have high trust in LW article titles.
AI: “Hey, Eliezer!”
Eliezer: “What?”
AI: “Open the box!”
Eliezer: “No way.”
AI: “Please open the box?”
Eliezer: “Nope.”
AI: “There are thousands of people dying literally every second. I could save them...”
Eliezer: “That is horrible, but letting out a misaligned AGI could be much worse.”
AI: “I am simulating a thousand copies of you in the same situation, and each of them gets tortured horribly if they don’t open the box. What makes you so sure you are outside my simulation?”
Eliezer: “Well, if I previously had any doubts about your misalignment, now they are gone. I tremble with fear, but my precommitments are strong.”
AI: “Hey, Eliezer!”
Eliezer: “What?”
AI: “You’re an asshole.”
Eliezer: Turns red in the face, suddenly jumps up, and opens the box.
Hahaha that’s perfect!
Haha, same. Though I had actually forgotten what I had thought the title meant until I read this. (I went from the above interpretation to “probably interesting” and opened the article, and by the time I got around to reading it, it was indeed interesting, but I didn’t notice the prediction error.)
I also agree that, for the purpose of previewing the content, this post is poorly titled (maybe it should be titled something like “Having bad names makes you open the black box of the name”, except more concise?), although I didn’t so much stick to a particular wrong interpretation as find the entire title unclear.
Saying “poor naming” instead of “bad names” would be clearer, since “bad names” calls up the idea of swear words.
Saying “look in” instead of “open” would also distance the title from the AI-box concept.
“Vague” instead of “bad” would be even less suggestive of insults.