This seems plausible. I do want to note that your received message was timestamped 11:26 (local to you) and the button was pressed at 11:33:30 (the received message said the time limit was 30 minutes), which doesn’t seem like an abundance of caution and hesitation to blow up the frontpage, as far as I can tell. :P
I know it wasn’t actual nukes, so fair to not put in the same effort, but I do hope if you ever do have nukes, you take the full allotted time to think it through and discuss with anyone available (even if you think they’re unlikely to reply). ;)
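For reference, the timing in the comment above can be worked out directly. This is only a minimal sketch: it assumes both timestamps fall on September 26, 2020 in the same time zone, and that the message arrived at exactly 11:26:00 (only minute precision was reported).

```python
from datetime import datetime, timedelta

# Timestamps quoted in the comment above. The ":00" seconds on the
# message time is an assumption (only "11:26" was reported).
message_received = datetime(2020, 9, 26, 11, 26, 0)
button_pressed = datetime(2020, 9, 26, 11, 33, 30)

# Time limit stated in the received message.
time_limit = timedelta(minutes=30)

# How long passed before the button was pressed, and what share
# of the stated limit that represents.
elapsed = button_pressed - message_received
fraction_used = elapsed / time_limit

print(f"elapsed: {elapsed}")
print(f"fraction of limit used: {fraction_used:.0%}")
```

Under those assumptions, seven and a half minutes elapsed, a quarter of the stated limit.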
Well, it was just a game and I had other things to do. Plus I didn’t feel a duty to take it 100% seriously since, as grateful as I was to have the chance to participate, I didn’t actually choose to play.
(Plus, adding on to this comment. I honestly had no idea people took this whole thing so seriously. Just seemed like a bit of fun to me!)
To be clear, while there is obviously some fun intended in this tradition, describing it as “just a game” doesn’t feel appropriate to me. I do actually really care about people being able to coordinate to not take the site down. It’s an actual hard thing to do that actually is trying to reinforce a bunch of the real and important values that I care about in Petrov Day. Of course, I can’t force you to feel a certain way, but like, I do sure feel a pretty high level of disappointment reading this response.
Like, the email literally said you were chosen to participate because we trusted you to not actually use the codes.
So, I think it’s important that LessWrong admins do not get to unilaterally decide that You Are Now Playing a Game With Your Reputation.
However, if Chris doesn’t want to play, the action available to him is simply to not engage. I don’t think he gets to both press the button and change the rules to decide what a button press means to other players.
So, I think it’s important that LessWrong admins do not get to unilaterally decide that You Are Now Playing a Game With Your Reputation.
Dude, we’re all always playing games with our reputations. That’s, like, what reputation is.
And good for Habryka for saying he feels disappointment at the lack of thoughtfulness and reflection; it’s very much not just permitted but almost mandated by the founder of this place.
Here’s the relevant citation from Well-Kept Gardens:
I confess, for a while I didn’t even understand why communities had such trouble defending themselves—I thought it was pure naivete. It didn’t occur to me that it was an egalitarian instinct to prevent chieftains from getting too much power.
This too:
I have seen rationalist communities die because they trusted their moderators too little.
Let’s give Habryka a little more respect, eh? Disappointment is a perfectly valid thing to be experiencing and he’s certainly conveying it quite mildly and graciously. Admins here did a hell of a job resurrecting this place back from the dead; to express very mild disapproval at a lack of thoughtfulness during a community event is... well, that seems very much on-mission, at least according to Yudkowsky.
I feel confused about how you interpreted my comment, and edited it lightly. For the record, Habryka’s comment seems basically right to me; just wanted to add some nuance.
Honestly, I kind of think that would be a straightforwardly silly thing to worry about, if one were to think about it for a few moments. (And I note that it’s not Chris’ stated reasoning.)
Like, leave aside that the PM was indistinguishable from a phishing attack. Pretend that it had come through both email and PM, from Ben Pace, with the codes repeated. All the same… LW just isn’t the kind of place where we’re going to socially shame someone for
Not taking action
...within 30 minutes of an unexpected email being sent to them
Y’know, there was a post I thought about writing up, but then I was going to not bother to write it up, but I saw your comment here H and “high level of disappointment reading this response”… and so I wrote it up.
The downvotes on this comment seem ridiculous to me. If I email 270 people to tell them I’ve carefully selected them for some process, I cannot seriously presume they will give up >0 of their time to take part in it.
Any such sacrifice they make is a bonus, so if they do give up >0 time, it’s absurd to ask that they give up even more time to research the issue.
Any negative consequences are on the person who set up the game. Adding the justification that ‘I trust you’ does not suddenly make the recipient more obligated to the spammer.
It’s not like we asked 270 random people. We asked 270 people, each of whom had already invested many hundreds of hours into participating on LessWrong, and many of whom I knew personally and considered close friends. Like, I agree, if you message 270 random people you don’t get to expect anything from them, but the whole point of networks of trust is that you get to expect things from each other and ask things from each other.
If any of the people in that list of 270 people had asked me to spend a few minutes doing something that was important to them, I would have gladly obliged.
It doesn’t matter whether you’d have been hypothetically willing to do something for them. As I said on the Facebook thread, you did not consult with them. You merely informed them they were in a game, which, given the social criticism Chris has received, had real world consequences if they misplayed. In other words, you put them in harm’s way without their consent. That is not a good way to build trust.
Just a datapoint on variety of invitees: I was included in the 270, and I’ve invested hundreds of hours into LW. While I don’t know you personally outside the site, I hope you consider me a trusted acquaintance, if not a friend. I had no clue this was anything but a funny little game, and my expectation was that there would be dozens of button presses before I even saw the mail.
I had not read nor paid attention to the petrov day posts (including prior years). I had no prior information about the expectations of behavior, the weight put on the outcome, nor the intended lesson/demonstration of … something that’s being interpreted as “coordination” or “trust”.
I wasn’t using the mental model that indicated I was being trusted not to do something—I took it as a game to see who’d get there first, or how many would press the button, not a hope that everyone would solemnly avoid playing (by passively ignoring the mail). I think without a ritual for joining the group (opt-in), it’s hard to judge anyone or learn much about the community from the actions that occurred.
I had no clue this was anything but a funny little game, and my expectation was that there would be dozens of button presses before I even saw the mail.
And this is pretty surprising to me. Like, we ran this game last year with half of the number of people, without anyone pressing the button. We didn’t really change much about the framing, so where does this expectation come from? My current model is indeed that the shared context between the ~125 people from last year is quite a bit smaller than it was this year with ~250 people.
I don’t think that there was no change in framing. Last year:
Every Petrov Day, we practice not destroying the world. One particular way to do this is to practice the virtue of not taking unilateralist action.
It’s difficult to know who can be trusted, but today I have selected a group of LessWrong users who I think I can rely on in this way. You’ve all been given the opportunity to show yourselves capable and trustworthy.
This Petrov Day, between midnight and midnight PST, if you, ChristianKl, enter the launch codes below on LessWrong, the Frontpage will go down for 24 hours.
Personalised launch code: …
I hope to see you on the other side of this, with our honor intact.
Yours, Ben Pace & the LessWrong 2.0 Team
This year:
On Petrov Day, we celebrate and practice not destroying the world.
It’s difficult to know who can be trusted, but today I have selected a group of 270 LessWrong users who I think I can rely on in this way. You’ve all been given the opportunity to not destroy LessWrong.
This Petrov Day, if you, ChristianKl, enter the launch codes below on LessWrong, the Frontpage will go down for 24 hours, removing a resource thousands of people view every day. Each entrusted user has personalised launch codes, so that it will be clear who nuked the site.
Your personalised codes are: …
I hope to see you in the dawn of tomorrow, with our honor still intact.
–Ben Pace & the LessWrong Team
Last year’s message was more explicit about both the goal of the exercise and what it means for an individual to not use the code.
Using the phrase “destroy LessWrong” this year was a tell that this isn’t a serious exercise, because people usually don’t exaggerate when they are serious. Rationalists especially can usually be trusted to use clear words when they are serious.
Reading the message this time, I had the impression that the website was more likely to go down than last year.
I hadn’t paid attention to the topic, and did not know it had run last year with that result (or at least hadn’t thought about it enough to update on) so that expectation was my prior.
Now that I’ve caught up on things, I realize I am confused. I suspect it was a fluke or some unanalyzed difference in setup that caused the success last year, but that explanation seems a bit facile, so I’m not sure how to actually update. I’d predict that running it again would result in the button being pressed, but I wouldn’t wager very much (in either direction).
Awww. I can’t decide whether to be annoyed with petrov_day_admin_account, or to appreciate their object lesson in the importance of pre-commitment and robust epistemologies. (I’m leaning towards both!)
If this was also done by the site admins (rather than being a deliberate attempt at sabotage), it seems a bit xkcd-169-y to me.
If it was done by the admins: If someone receiving that message had replied to say something like “the button still says ‘launch the nukes’—please clarify”, what would they have been told?
If Chris could be confident it came from the admins I’d agree, but with my current knowledge (and assuming the admins would have been honest had Chris messaged them on their normal accounts) it feels more like pentesting.
My company “evaluates” phishing propensity by sending employees emails directing them to “honeypots” which are in the corporate domain and signed by the corporate SSL certificates. Unsurprisingly, many employees trust SSL and enter their credentials. My takeaway was not that people are bad at security, but that they will tend to trust the system if the stakes don’t appear too high.
My partner says that as a kid, their school did something similar as part of “don’t talk to strangers” teaching. The “stranger” in question was someone the class had been working with all day, introduced by their teacher.
So the LW team, who created the experiment, themselves tricked you by sending this message? And the way to win the game was for you to read on LessWrong the warning “enter launch codes to destroy LessWrong” and decide against it, despite the message? The message was a metaphor for a false alarm?
Sorry, I got tricked:
petrov_day_admin_account September 26, 2020 11:26 AM Hello Chris_Leong,
You are part of a smaller group of 30 users who has been selected for the second part of this experiment. In order for the website not to go down, at least 5 of these selected users must enter their codes within 30 minutes of receiving this message, and at least 20 of these users must enter their codes within 6 hours of receiving the message. To keep the site up, please enter your codes as soon as possible. You will be asked to complete a short survey afterwards.
From this we learn that you should not launch nukes, even if someone tells you to do it.
I think the lesson is that if you decide to launch the nukes it’s better to claim incompetence rather than malice because then opinion of you among the survivors won’t suffer as much.
I think we learned that when you tell people to not destroy the world they try to not destroy the world. How is [press this button and the world ends → don’t press button] different from [press this button or else the world ends → press button]?
I think we learned that trolls will destroy the world.
The asymmetry is the button itself. If I understand correctly, Chris got this message on a separate channel, and the button still looked the same; it still said “enter launch codes to destroy LessWrong”. It was still clearly meant to represent the launch of nukes.
Stretching just a bit, I think you might be able to draw an analogy here, where real people who might actually launch nuclear weapons (or have done so in other branches of the multiverse) have thought they had reasons important enough to justify doing it. But in fact, the rule is not “don’t launch nukes unless there seems to be sufficient reason for it”, but rather “don’t launch nukes”.
Good point. I didn’t see the button setup before it went down, and I was thinking the OP did not receive the main email and just got the “special instructions” they posted. This does make it more analogous to a “false alarm” situation.
I received both messages.
I think this is the wrong lesson. If the Enemy knows you have precommitted to never press the button, then they are not deterred from striking first. MAD is game theory. In order to not blow up the world, you have to be willing to blow up the world. It’s a Newcomblike problem: It feels like there are two decisions to be made, but there is only one, in advance.
But we are not in a game theory situation. We are in an imperfect world with imperfect information. There are malfunctioning warning systems and liars. And we are humans and not programs that get to read each other’s source code. There are no perfect commitments, and if there were, there would be no way of verifying them.
So I think that the lesson is that, whatever your public stance, and whether or not you think that there are counterfactual situations where you should nuke, in practice you should not nuke.
Do you see what I’m getting at?
Game theory was pioneered by Schelling with the central and most important application being handling nuclear armed conflicts. To say that game theory doesn’t apply to nuclear conflict because we live in an imperfect world is just not accurate. Game theory doesn’t require a perfect world nor does it require that actors know each other’s source code. It is designed to guide decisions made in the real world.
I know that it is designed to guide decisions made in the real world. This does not force me to agree with the conclusions in all circumstances. Lots of models are not up to the task they are designed to deal with.
But I should have said “not in that game theory situation”, because there is probably a way to construct some game theory game that applies here. That was my bad.
However, I stand by the claim that the full information game is too far from reality to be a good guide in this case. With stakes this high even small uncertainty becomes important.
Game theory is very much applicable to the real world. Imperfect information is just a different game. You are correct that assuming perfect information is a simplification. But assuming imperfect information, what does that change?
You want to lie to the Enemy, convince them that you will always push the button if they cross the line, then never actually do it, and the Enemy knows this!
Sometimes all available options are risky. Betting your life on a coin flip is not generally a good idea, but if the only alternative is a lottery ticket, the coin flip looks pretty good. If the Enemy knows there’s a significant chance that you won’t press the button, in a sufficiently desperate situation, the Enemy might bet on that and strike first. But if the Enemy knows self-destruction is assured, then striking first looks like a bad option.
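The coin-flip point above can be sketched as a toy expected-value comparison. Everything here is a hypothetical illustration: the function names and all payoff numbers are made up, and a real deterrence model would be far richer. It only shows how the Enemy's best option can flip once the status quo gets bad enough, unless retaliation is assured.

```python
def first_strike_value(p_retaliation, payoff_unanswered, payoff_destruction):
    """Expected payoff to the Enemy of striking first (toy model)."""
    return ((1 - p_retaliation) * payoff_unanswered
            + p_retaliation * payoff_destruction)


def enemy_strikes_first(p_retaliation, payoff_status_quo,
                        payoff_unanswered=10.0, payoff_destruction=-1000.0):
    """True if striking first beats sitting tight, in expectation.

    The default payoffs are arbitrary illustrative numbers.
    """
    value = first_strike_value(p_retaliation, payoff_unanswered,
                               payoff_destruction)
    return value > payoff_status_quo


# Comfortable status quo (payoff 0): even a 5% chance of retaliation deters.
print(enemy_strikes_first(0.05, payoff_status_quo=0.0))
# Desperate status quo (payoff -50): a 5% chance no longer deters.
print(enemy_strikes_first(0.05, payoff_status_quo=-50.0))
# Assured retaliation: striking first loses even when desperate.
print(enemy_strikes_first(1.0, payoff_status_quo=-50.0))
```

With these made-up numbers, only the middle case (desperate Enemy, unlikely retaliation) favours striking first.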
What possible reason could Petrov or those in similar situations have had for not pushing the button? Maybe he believed that the US would retaliate and kill his family at home, and that deterred him. In other words, he believed his enemy would push the button.
Applied to the real world, game theory is not just about how to play the games. It’s also about the effects of changing the rules.
Or maybe he just did not want to kill millions of people?
In Petrov’s case in particular, the new satellite-based early warning system was unproven so he didn’t completely trust it, and he didn’t believe a US first strike would use only one missile, or later, only four more, instead of hundreds. Furthermore, ground radar didn’t confirm. And, of course, attacking on a false alarm would be suicidal because he believed the Enemy would push the button, so striking first “just in case” failed his cost-benefit analysis.
It was not “just” a commitment to pacifism.
I should probably have said “we are not in that game theory situation”.
(Though I do think that the real world is more complex than current game theory can handle. E.g. I don’t think current game theory can fully handle unknown unknowns, but I could be wrong on this point.)
The game of mutually assured destruction is very different even when just including known unknowns.
In other words, if your defense is “just following orders”, you’re in the wrong. Petrov, too, was strongly influenced to launch the nukes, and still refused… Like that Soviet submarine commander who, during the Cuban Missile Crisis, thought he was engaged with live depth charges by the US Navy.
I think the lessons are:
To not have buttons that can be pressed to destroy the world. The real issue is that the possibility exists for many agents: circumstances could deliver compelling reasons to press one, and the more buttons exist, the more likely it is that it will happen.
Reality won’t deliver the same circumstances twice. If the petrovdayadmin wanted to go for symmetry, the message would have said: one of the 270 pressed the red button; if you want to keep the home page online, 5 people have to press it within etc., etc...
Props to whoever petrov_day_admin_account was for successfully red-teaming LessWrong.
Agreed, this is probably the best lesson of all. If the buttons exist, they can be hacked or the decision makers can be socially engineered.
270 people might have direct access, but the entire world has indirect access.
Well, they did succeed, so for that they get points, but I think it was more due to a very weak defense on behalf of the victim rather than a very strong effort by petrov_day_admin_account.
Like, the victim could have noticed things like:
* The original instructions were sent over email + LessWrong message, but the phishing attempt was just LessWrong
* The original message was sent by Ben Pace, the latter by petrov_day_admin_account
* They were sent at different points in time, the latter of which was more correlated with the FB post that caused the phishing attempt
Moreover, the attacker even sent messages to two real LessWrong team members, which would have completely revealed the attempt had those admins not been asleep in a different time zone.
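The first two checks listed above can be sketched in code. This is only a rough illustration: the `Message` type, its fields, and the channel labels are hypothetical, and the third check (timing relative to the FB post) is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record of how a message reached the recipient."""
    sender: str
    channels: frozenset  # e.g. frozenset({"email", "lesswrong_pm"})


def warning_signs(original: Message, followup: Message) -> list:
    """Return reasons the follow-up does not match the original message."""
    signs = []
    # Check 1: did the follow-up come from the same sender?
    if followup.sender != original.sender:
        signs.append(f"sender changed: {original.sender} -> {followup.sender}")
    # Check 2: did it arrive over every channel the original used?
    missing = original.channels - followup.channels
    if missing:
        signs.append(f"not sent via: {', '.join(sorted(missing))}")
    return signs


# Values taken from the description above; channel labels are made up.
original = Message("Ben Pace", frozenset({"email", "lesswrong_pm"}))
followup = Message("petrov_day_admin_account", frozenset({"lesswrong_pm"}))

for sign in warning_signs(original, followup):
    print(sign)
```

Both checks fire for the message in question; neither needs any effort beyond comparing it with the original instructions.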
I personally feel that the fact that it was such an effortless attempt makes it more impressive, and really hammers home the lesson we need to take away from this. It’s one thing to put in a great deal of effort to defeat some defences. It’s another to completely smash through them with the flick of a wrist.
What exactly do you think “the lesson we need to take away from this” is?
(Feel free to just link if you wrote that elsewhere in this comment section)
I haven’t actually figured that out yet, but several people in this thread have proposed takeaways. I’m leaning towards “social engineering is unreasonably effective”. That or something related to keeping a security mindset.
I am reminded of the AI unboxing challenges, where part of the point was that any single trick that gets the job done can be guarded against, but guarding against all stupid tricks is not about the tricks being particularly brilliant, just about covering them all.
In Milgram’s experiment, people were willing to torture because a guy in a white jacket requested it. Here a person is ready to nuke the world because an account’s name included the word “admin”.
FYI, I also got this message, and to the best of my knowledge it was not sent by an admin.
I can confirm that this message was not sent by any admin.
That’s… a bit surprising. If I were behind this, I wouldn’t have sent a message to you because you’re likely to know the plan. Anyone who receives the message but doesn’t fall for it is an extra chance for the scheme to fail, because if nothing else they can post in this thread where someone is most likely to see it before entering codes. (Chris, I’m curious if you did look in this thread before you put them in?) In your case you could react with admin powers too, though I dunno if you would have considered that fair game.
I feel like this gives us a small amount of evidence about the identity of the adversary, but not enough to do any real speculation with.
EDIT: I now believe the below contains substantial errors, after reading this message from the attacker.
Maybe you want to do sleuthing on your own, if so don’t read below. (It uses LessWrong’s spoiler feature.)
I believe the adversary was a person outside of the EA and rationality communities. They had not planned this, and they did not think very hard about who they sent the messages to (and didn’t realise Habryka and Raemon were admins). Rather, they saw a spur-of-the-moment opportunity to attack this system after seeing a Facebook post by Chris Leong (which solicited reasons for and against pressing the button). I believe this because they commented on that Chris Leong post and said they sent the message.
I looked at the thread and considered commenting here, but not many people had commented, so I figured there wasn’t that much chance of getting a response if I posted here.
In your other post, the only reason you indicated to not press the button is that other people would still be asleep and not have experienced the thing.
As such, it feels as if the “trick” by your friend just sped up what would have almost certainly happened anyway: you eventually pressing the button and nuking the site. It’d just have happened later in the day.
That was a poorly written post on my part. What I meant was that I was open to argument either way (“Should I press it or not?”). I had decided that regardless of whether I pressed it or not I would at least wait until other people had a chance to wake up, as I thought it’d be boring for people if they woke up and the site was already nuked. So it wasn’t my only reason; I hadn’t even really thought about it too much as I was waiting for more comments.
Even though it was poorly written, I’m surprised by how many people seem to have misunderstood it; I would have thought it was clear enough, since I asked the question.
This seems plausible. I do want to note that your received message was timestamped 11:26 (local to you) and the button was pressed at 11:33:30 (the received message said the time limit was 30 minutes), which doesn’t seem like an abundance of caution and hesitation to blow up the frontpage, as far as I can tell. :P
I know it wasn’t actual nukes, so fair to not put in the same effort, but I do hope if you ever do have nukes, you take the full allotted time to think through it and discuss with anyone available (even if you think they’re unlikely to reply). ;)
Well, it was just a game and I had other things to do. Plus I didn’t feel a duty to take it 100% seriously since, as grateful as I was to have the chance to participate, I didn’t actually choose to play.
(Plus, adding on to this comment. I honestly had no idea people took this whole thing so seriously. Just seemed like a bit of fun to me!)
To be clear, while there is obviously some fun intended in this tradition, I don’t think describing it as “just a game” feels appropriate to me. I do actually really care about people being able to coordinate to not take the site down. It’s an actual hard thing to do that actually is trying to reinforce a bunch of the real and important values that I care about in Petrov day. Of course, I can’t force you to feel a certain way, but like, I do sure feel a pretty high level of disappointment reading this response.
Like, the email literally said you were chosen to participate because we trusted you to not actually use the codes.
So, I think it’s important that LessWrong admins do not get to unilaterally decide that You Are Now Playing a Game With Your Reputation.
However, if Chris doesn’t want to play, the action available to him is simply to not engage. I don’t think he gets to both press the button and change the rules to decide what a button press means to other players.
Dude, we’re all always playing games with our reputations. That’s, like, what reputation is.
And good for Habryka for saying he feels disappointment at the lack of thoughtfulness and reflection; it’s very much not just permitted but almost mandated by the founder of this place:
https://www.lesswrong.com/posts/tscc3e5eujrsEeFN4/well-kept-gardens-die-by-pacifism
https://www.lesswrong.com/posts/RcZCwxFiZzE6X7nsv/what-do-we-mean-by-rationality-1
Here’s the relevant citation from Well-Kept Gardens:
This too:
Let’s give Habryka a little more respect, eh? Disappointment is a perfectly valid thing to be experiencing and he’s certainly conveying it quite mildly and graciously. Admins here did a hell of a job resurrecting this place back from the dead; to express very mild disapproval at a lack of thoughtfulness during a community event is... well, that seems very much on-mission, at least according to Yudkowsky.
I feel confused about how you interpreted my comment, and edited it lightly. For the record, Habryka’s comment seems basically right to me; just wanted to add some nuance.
Ah, I see, I read the original version partially wrong, my mistake. We’re in agreement. Regards.
Well, I had an option not to engage until I received the message saying it would blow up if enough users didn’t press the button within half an hour.
Even after receiving that message, it still seems like the “do not engage” action is to not enter the codes?
I think “doesn’t want to ruin other people’s fun or do anything significant” feels more accurate than “do not engage” here?
And then, for all he knew, his name might have been posted in a list of users who could have prevented the apocalypse but didn’t.
Honestly, I kind of think that would be a straightforwardly silly thing to worry about, if one were to think about it for a few moments. (And I note that it’s not Chris’ stated reasoning.)
Like, leave aside that the PM was indistinguishable from a phishing attack. Pretend that it had come through both email and PM, from Ben Pace, with the codes repeated. All the same… LW just isn’t the kind of place where we’re going to socially shame someone for
Not taking action
...within 30 minutes of an unexpected email being sent to them
...whether or not they even saw the email
...in a game they didn’t agree to play.
And then maybe the site would have blown up, which was not what I was aiming for at that time.
Y’know, there was a post I thought about writing up, but then I was going to not bother to write it up, but I saw your comment here H and “high level of disappointment reading this response”… and so I wrote it up.
Here you go:
https://www.lesswrong.com/posts/scL68JtnSr3iakuc6/win-first-vs-chill-first
That’s an extreme-ish example, but I think the general principle holds to some extent in many places.
I’ve responded to you in the last section of this post.
The downvotes on this comment seem ridiculous to me. If I email 270 people to tell them I’ve carefully selected them for some process, I cannot seriously presume they will give up >0 of their time to take part in it.
Any such sacrifice they make is a bonus, so if they do give up >0 time, it’s absurd to ask that they give up even more time to research the issue.
Any negative consequences are on the person who set up the game. Adding the justification that ‘I trust you’ does not suddenly make the recipient more obligated to the spammer.
It’s not like we asked 270 random people. We asked 270 people, each one of whom had already invested many hundreds of hours into participating on LessWrong, many of whom I knew personally and considered close friends. Like, I agree, if you message 270 random people you don’t get to expect anything from them, but the whole point of networks of trust is that you get to expect things from each other and ask things from each other.
If any of the people in that list of 270 people had asked me to spend a few minutes doing something that was important to them, I would have gladly obliged.
It doesn’t matter whether you’d have been hypothetically willing to do something for them. As I said on the Facebook thread, you did not consult with them. You merely informed them they were in a game, which, given the social criticism Chris has received, had real world consequences if they misplayed. In other words, you put them in harm’s way without their consent. That is not a good way to build trust.
Just a datapoint on variety of invitees: I was included in the 270, and I’ve invested hundreds of hours into LW. While I don’t know you personally outside the site, I hope you consider me a trusted acquaintance, if not a friend. I had no clue this was anything but a funny little game, and my expectation was that there would be dozens of button presses before I even saw the mail.
I had not read nor paid attention to the petrov day posts (including prior years). I had no prior information about the expectations of behavior, the weight put on the outcome, nor the intended lesson/demonstration of … something that’s being interpreted as “coordination” or “trust”.
I wasn’t using the mental model that indicated I was being trusted not to do something—I took it as a game to see who’d get there first, or how many would press the button, not a hope that everyone would solemnly avoid playing (by passively ignoring the mail). I think without a ritual for joining the group (opt-in), it’s hard to judge anyone or learn much about the community from the actions that occurred.
And this is pretty surprising to me. Like, we ran this game last year with half of the number of people, without anyone pressing the button. We didn’t really change much about the framing, so where does this expectation come from? My current model is indeed that the shared context between the ~125 people from last year is quite a bit smaller than it was this year with ~250 people.
I don’t think that there was no change in framing. Last year:
This year:
Last year’s message was more explicit about both the goal of the exercise and what it means for an individual to not use the code.
Using the phrase “destroy LessWrong” this year was a tell that this isn’t a serious exercise, because people usually don’t exaggerate when they are serious. Rationalists especially can usually be trusted to use clear words when they are serious.
Reading the message this time, I had the impression that it would be more likely for the website to go down than last year.
I hadn’t paid attention to the topic, and did not know it had run last year with that result (or at least hadn’t thought about it enough to update on) so that expectation was my prior.
Now that I’ve caught up on things, I realize I am confused. I suspect it was a fluke or some unanalyzed difference in setup that caused the success last year, but that explanation seems a bit facile, so I’m not sure how to actually update. I’d predict that running it again would result in the button being pressed, but I wouldn’t wager very much (in either direction).
Awww. I can’t decide whether to be annoyed with petrov_day_admin_account, or to appreciate their object lesson in the importance of pre-commitment and robust epistemologies (I’m leaning towards both!)
It seems like the lessons are more about credulity and basic opsec?
If this was also done by the site admins (rather than being a deliberate attempt at sabotage), it seems a bit xkcd-169-y to me.
If it was done by the admins: If someone receiving that message had replied to say something like “the button still says ‘launch the nukes’—please clarify”, what would they have been told?
If Chris could be confident it came from the admins I’d agree, but with my current knowledge (and assuming the admins would have been honest had Chris messaged them on their normal accounts) it feels more like pentesting.
My company “evaluates” phishing propensity by sending employees emails directing them to “honeypots” which are in the corporate domain and signed by the corporate SSL certificates. Unsurprisingly, many employees trust SSL and enter their credentials. My takeaway was not that people are bad at security, but that they will tend to trust the system if the stakes don’t appear too high.
My partner says that as a kid, their school did something similar as part of “don’t talk to strangers” teaching. The “stranger” in question was someone the class had been working with all day, introduced by their teacher.
I also think that XKCD would be quite appropriate had it been the site admins. But no, it was not us.
Aw, consoling hugs!
I’d like to offer some combination of consoling hugs and ಠ_ಠ.
(edit: but to be clear I’m not super-confident I wouldn’t have fallen for it myself. Especially if I saw that message 25 minutes after it was sent.)
Super lame compared to last year when people were willing to pay thousands of dollars to the code holders or the charity of their choice.
4chan continues inheriting the world.
So the LW team, who created the experiment, themselves tricked you by sending this message? And the way to win the game was for you to read on LessWrong the warning “enter launch codes to destroy LessWrong” and decide against it, despite the message? The message was a metaphor for a false alarm?
Well, here’s my metaphor for a moon-base back up: https://www.greaterwrong.com/ :)
I’ll set a reminder to set up additional safety measures for next year.
EtA: Ah, Vanessa Kosoy already pointed this out
EtA: Ah, Raymond said it wasn’t them that sent this email
Yeah, it wasn’t the LW team, but one of my friends
who?
https://www.facebook.com/casebash/posts/10100495197743741?comment_id=10100495207174841