So in some ways this has been good for making the challenge level clear: this is a real coordination problem, one that we can fail at, and one that isn’t overdetermined in either direction.
I think every single part of this exercise, and all of the responses to it, have been extremely informative.
The priors of hundreds of thoughtful people have been updated substantially.
Others learned that they cared about something (either the site or the Petrov Day button) more than they thought they did, and, being thoughtful, learned something about themselves by thinking about why they cared so much.
At least one person (chris) received an object lesson in security that I doubt ze will forget.
The rest of us had the opportunity to learn that lesson as well, although lessons not learned in blood are soon forgotten, and for a given individual, “losing the front page for one day” is far less blood than what it cost chris.
I updated my priors about myself. If you’d asked me in 2019, after the successful no-press, whether I would have pressed it with codes, I probably would have said no. However, seeing the amount of thoughtfulness generated by what went down in 2020, if I get codes in 2021, I will give serious thought to pressing the button because doing so:
will, no matter the reason, instill even further caution and security-paranoia in AI researchers, which is stressful for them but beneficial for humanity.
causes the community to learn something about itself which can’t be predicted in advance but is almost certainly accurate.
shows that if it’s something I would consider, it’s something certain types of AI would consider, too, if we’re not careful. AI risk researchers are aware of that problem, but pressing the button gives one the opportunity to make them feel it in their guts.
I wonder if there is instrumental value in pre-committing to being a Petrov Day ruiner unless certain conditions are met. “I am coming to the San Francisco Petrov Day 2022 celebration and will press the button five minutes before the end of the ceremony unless I first come to understand how we can solve the problem of whether an autonomous vehicle should swerve, and thereby kill the 18-year-old pregnant valedictorian, or not swerve, and thereby kill the 84-year-old semi-retired Médecins Sans Frontières doctor who hand-carves toys for kids at the orphanage.”
However, seeing the amount of thoughtfulness generated by what went down in 2020, if I get codes in 2021, I will give serious thought to pressing the button because doing so will, no matter the reason, instill even further caution and security-paranoia in AI researchers, which is stressful for them but beneficial for humanity.
The reason that it induces stress to see the website go down is that it tells me something about the external world. Please do not hack my means of understanding the external world to induce mental states in me that you assume will be beneficial to humanity.
Very much this. Indeed, creating common knowledge that the people around me will be able to wield potentially destructive power without trying to leverage that power into taking resources away from me, or trying to force me to change my mind on something, is one of the things I want out of Petrov Day, since it’s definitely a thing I am worried about.
I understand and sympathize with the desire to know that people around you can hold that power without abusing it. I would also like to know that.
But it’s only ever a belief about the average behavior of the people in the community. It should update when new information becomes available. The button is a test of your belief. Each decision made by each person to press or not to press the button is information that should feed your model of how probable it is that people can hold the power without abusing it. A bunch of people staring at a red button with candles lit nearby can be an iterated Prisoner’s Dilemma depending on the participants’ utility functions.
If I decide not to Petrov-Ruin out of a desire to protect your belief that people can hold the power without abusing it, and I make that change because I care about you and your suffering as a fellow human and think your life will be much worse if my actions demolish that belief, then a successful Petrov Day is at risk of becoming another example of Goodhart’s Law.
I think, anyway. Sometimes my prose comes off as aggressive when I’m just trying to engage with thoughtful people. I swear, on SlateStarCodex’s review of Surfing Uncertainty, that I’m typing in good faith and could have my mind changed on these issues.
If I decide not to Petrov-Ruin out of a desire to protect your belief that people can hold the power without abusing it, and I make that change because I care about you and your suffering as a fellow human and think your life will be much worse if my actions demolish that belief, then a successful Petrov Day is at risk of becoming another example of Goodhart’s Law.
TBC, I think you’re supposed to not Petrov-ruin so as to not be destructive (or to leverage your destructive power to modify habryka to be more like you’d like them to be). My interpretation of habryka is that it would be nice if (a) it were actually true that this community could wield destructive power without being destructive etc and (b) everybody knew that. The problem with wielding destructive power is that it makes (a) false, not just that it makes (b) false.
It sounds like we draw the line differently on the continuum between persuasion and brain hacking. I’d like to hear more about why you think some or all parts of this are hacking so I can calibrate my “I’m probably not a sociopath” prior.
Or maybe we are diverging on what things one can legitimately claim are the purposes of Petrov Day. If the purpose of an in-person celebration is “Provide a space for community members to get together, break bread, and contemplate, where the button raises the tension high enough that it feels like something is at stake but there’s no serious risk that the button will be pressed,” then I’m wrong, I’m being cruel, and some other forum is more appropriate for my Ruiner to raise zir concern. But I don’t get the sense that there’s a community consensus on this.
On the contrary, the fact that all attendees are supposed to leave in silence if the button is pressed suggests that the risk is meant to be real and to have consequences.
I’d like to hear more about why you think some or all parts of this are hacking so I can calibrate my “I’m probably not a sociopath” prior.
I don’t think that you doing this would be “brain hacking”. But your plan to press the button in order to make me be more cautious and paranoid works roughly like this: you would decide to press the button, so as to cause me to believe that the world is full of untrustworthy people, so that I make different decisions. Here’s my attempt to break down my complaint about this:
You are trying to manipulate my beliefs to change my actions, without checking if this will make my beliefs more accurate.
Your manipulation of my beliefs will cause them to be inaccurate: I will probably believe that the world contains careless people, or people with a reckless disregard for the website staying up. But actually what’s going on is that the world contains people who want me to be paranoid.
To the extent that I do figure out what’s going on and do have true beliefs, then you’re just choosing whether I can have accurate beliefs in the world where things are bad, vs having accurate beliefs in the world where things are good. But it’s better for things to be good than bad.
If I have wrong beliefs about the distribution of people’s trustworthiness (or the ways in which people are untrustworthy), I will actually make worse decisions about which things to prioritize in AI security. You seem to believe the converse, but I doubt you have good reasons to think that.
On the contrary, the fact that all attendees are supposed to leave in silence if the button is pressed suggests that the risk is meant to be real and to have consequences.
Yes. Pressing the button makes life worse for your companions, which is the basic reason that you shouldn’t do it.
I genuinely, sincerely appreciate that you took the time to make this all explicit, and I think you assumed a more-than-reasonable amount of good faith on my part given how lathered up I was and how hard it is to read tone on the Internet.
I think the space we are talking across is “without checking if this will make my beliefs more accurate.” Accuracy entails “what do I think is true” but also “how confident am I that it’s true”. Persuasion entails that plus “will this persuasion strategy actually make their beliefs more accurate”. In hindsight, I should have communicated why I thought what I proposed would make people’s beliefs about humanity more accurate.
However, the response to my comments made me less confident that the intervention would be effective at making those beliefs more accurate. Plus, given the context, you had little reason to assume that my truth+confidence calculation was well-calibrated.
There’s also the question of whether the expected value of button-pressing exceeds the expected life-worsening, and how confident a potential button-presser is in that answer and in the margin by which it does. I do think that’s a fair challenge to your final thought.
Thanks again.
Please don’t threaten to actively destroy things I care about in order to get what you want from me.
I’m missing something fundamental here given that this is your view as organizer, so I’m going to go back to lurking for a while.