However, given the amount of thoughtfulness generated by what went down in 2020, if I get codes in 2021 I will give serious thought to pressing the button, because doing so would, no matter the reason, instill even further caution and security-paranoia in AI researchers. That is stressful for them but beneficial for humanity.
The reason that it induces stress to see the website go down is that it tells me something about the external world. Please do not hack my means of understanding the external world to induce mental states in me that you assume will be beneficial to humanity.
Very much this. Indeed, creating common knowledge that the people around me will be able to wield potentially destructive power without trying to leverage that power into taking resources away from me, or trying to force me to change my mind on something, is one of the things I want out of Petrov Day, since it’s definitely a thing I am worried about.
I understand and sympathize with the desire to know that people around you can hold that power without abusing it. I would also like to know that.
But it’s only ever a belief about the average behavior of the people in the community. It should update when new information becomes available. The button is a test of your belief. Each decision made by each person to press or not to press the button is information that should feed your model of how probable it is that people can hold the power without abusing it. A bunch of people staring at a red button with candles lit nearby can be an iterated Prisoner’s Dilemma depending on the participants’ utility functions.
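To make “information that should feed your model” concrete, here is a minimal sketch of the update I have in mind, with a toy Beta prior and toy observation counts that I am inventing purely for illustration:

```python
# A minimal sketch of the belief update described above; the prior and the
# counts are made-up illustrations, not claims about the actual community.

def update(alpha: float, beta: float, pressed: bool) -> tuple[float, float]:
    """Conjugate Beta-Bernoulli update on one observed press/no-press decision."""
    return (alpha + 1.0, beta) if pressed else (alpha, beta + 1.0)

# Hypothetical weakly optimistic prior: mean P(press) = 1 / (1 + 9) = 0.1.
alpha, beta = 1.0, 9.0

# Observe 100 code-holders who declined to press, then one who pressed.
for _ in range(100):
    alpha, beta = update(alpha, beta, pressed=False)
alpha, beta = update(alpha, beta, pressed=True)

# Posterior mean 2 / 111, roughly 0.018: a real but modest update.
print(f"P(a given code-holder presses) ~= {alpha / (alpha + beta):.3f}")
```

The point is only that each press or non-press moves the posterior by an amount that depends on the prior, which is why the same press can be a small update for one observer and a demolished belief for another.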
If I decide not to Petrov-Ruin out of a desire to protect your belief that people can hold the power without abusing it, and I make that choice because I care about you and your suffering as a fellow human and think your life will be much worse if my actions demolish that belief, then a successful Petrov Day is at risk of becoming another example of Goodhart’s Law.
I think, anyway. Sometimes my prose comes off as aggressive when I’m just trying to engage with thoughtful people. I swear, on SlateStarCodex’s review of Surfing Uncertainty, that I’m typing in good faith and could have my mind changed on these issues.
If I decide not to Petrov-Ruin out of a desire to protect your belief that people can hold the power without abusing it, and I make that choice because I care about you and your suffering as a fellow human and think your life will be much worse if my actions demolish that belief, then a successful Petrov Day is at risk of becoming another example of Goodhart’s Law.
TBC, I think you’re supposed to not Petrov-ruin so as to not be destructive (or to leverage your destructive power to modify habryka to be more like you’d like them to be). My interpretation of habryka is that it would be nice if (a) it were actually true that this community could wield destructive power without being destructive etc and (b) everybody knew that. The problem with wielding destructive power is that it makes (a) false, not just that it makes (b) false.
It sounds like we draw the line differently on the continuum between persuasion and brain hacking. I’d like to hear more about why you think some or all parts of this are hacking so I can calibrate my “I’m probably not a sociopath” prior.
Or maybe we are diverging on what things one can legitimately claim are the purposes of Petrov Day. If the purpose of an in-person celebration is “Provide a space for community members to get together, break bread, and contemplate, where the button raises the tension high enough that it feels like something is at stake, but there’s no serious risk that it will actually be pressed,” then I’m wrong, I’m being cruel, and some other forum is more appropriate for my Ruiner to raise zir concern. But I don’t get the sense that there’s a community consensus on this.
On the contrary, the fact that all attendees are supposed to leave in silence if the button is pressed suggests that the risk is meant to be real and to have consequences.
I’d like to hear more about why you think some or all parts of this are hacking so I can calibrate my “I’m probably not a sociopath” prior.
I don’t think that you doing this would be “brain hacking”. But your plan to press the button in order to make me more cautious and paranoid works roughly like this: you would decide to press the button, so as to cause me to believe that the world is full of untrustworthy people, so that I make different decisions. Here’s my attempt to break down my complaint about this:
1. You are trying to manipulate my beliefs to change my actions, without checking if this will make my beliefs more accurate.
2. Your manipulation of my beliefs will cause them to be inaccurate: I will probably believe that the world contains careless people, or people with a reckless disregard for the website staying up. But what’s actually going on is that the world contains people who want me to be paranoid.
3. To the extent that I do figure out what’s going on and do have true beliefs, you’re just choosing whether I have accurate beliefs in a world where things are bad versus accurate beliefs in a world where things are good. And it’s better for things to be good than bad.
4. If I have wrong beliefs about the distribution of people’s trustworthiness (or about the ways in which people are untrustworthy), I will actually make worse decisions about which things to prioritize in AI security. You seem to believe the converse, but I doubt you have good reasons to think that.
On the contrary, the fact that all attendees are supposed to leave in silence if the button is pressed suggests that the risk is meant to be real and to have consequences.
Yes. Pressing the button makes life worse for your companions, which is the basic reason that you shouldn’t do it.
I genuinely, sincerely appreciate that you took the time to make this all explicit, and I think you assumed a more-than-reasonable amount of good faith on my part given how lathered up I was and how hard it is to read tone on the Internet.
I think the space we are talking across is “without checking if this will make my beliefs more accurate.” Accuracy entails “what do I think is true” but also “how confident am I that it’s true”. Persuasion entails that plus “will this persuasion strategy actually make their beliefs more accurate”. In hindsight, I should have communicated why I thought what I proposed would make people’s beliefs about humanity more accurate.
However, the response to my comments made me less confident that the intervention would be effective at making those beliefs more accurate. Plus, given the context, you had little reason to assume that my truth+confidence calculation was well-calibrated.
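For what it’s worth, by truth-plus-confidence I mean something like a proper scoring rule. Here’s a minimal sketch using the Brier score, with forecasts that are entirely made up:

```python
# A minimal sketch of "accuracy = truth plus confidence": the Brier score
# penalizes a forecast by (confidence - outcome)^2, so being directionally
# right but overconfident still costs you. All forecasts below are made up.

def brier(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared error between stated confidence and what actually happened."""
    return sum((p - float(happened)) ** 2 for p, happened in forecasts) / len(forecasts)

# Two hypothetical forecasters judging the same three events (True = it happened).
events = [True, True, False]
overconfident = [(0.95, e) for e in events]  # near-certain "yes" every time
hedged = [(0.70, e) for e in events]         # calibrated "probably yes"

print(f"overconfident: {brier(overconfident):.3f}")  # ~0.302 (worse)
print(f"hedged:        {brier(hedged):.3f}")         # ~0.223 (better)
```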
There’s also the question of whether the expected value of button-pressing exceeds the expected life-worsening, how confident a potential button-presser is in that answer, and by what margin the one exceeds the other. I do think that’s a fair challenge to your final thought.
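To spell out that comparison: the rule I have in mind is a one-line expected-value inequality. Every quantity below is a placeholder I am making up, since none of them are things I can actually measure:

```python
# Toy expected-value comparison for a would-be button-presser; all numbers
# are hypothetical placeholders, not estimates of the real stakes.

p_caution_increases = 0.3  # chance the press actually makes AI researchers more cautious
benefit_if_it_does = 10.0  # hypothetical value of that extra caution
expected_harm = 5.0        # expected life-worsening for those watching the site go down

expected_benefit = p_caution_increases * benefit_if_it_does  # 3.0
should_press = expected_benefit > expected_harm              # False with these numbers

print(f"benefit {expected_benefit:.1f} vs harm {expected_harm:.1f} -> press: {should_press}")
```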
Thanks again.