[EDIT: two people with codes below have objected, so I’m not up for this trade anymore, unless we figure out a way to make a broader poll]
I have launch codes. Would anyone be interested in offering counterfactual donations to https://www.givewell.org/charities/amf? I could also be interested in counterfactual donations to nuclear war-prevention organizations.
Since the day is drawing to a close and at this point I won’t get to do the thing I wanted to do, here are some scattered thoughts about this thing.
First, my plan upon obtaining the code was to immediately repeat Jeff’s offer. I was curious how many times we could iterate this; I had in fact found another person who was potentially interested in being another link in this chain (and who was also more interested in repeating the offer than nuking the site). I told Jeff this privately but didn’t want to post it publicly (reasons: thought it would be more fun if this was a surprise; didn’t think people should put that much weight on my claimed intentions anyway; thought it was valuable for the conversation to proceed as though nuking were the likely outcome).
(In the event that nobody took me up on the offer, I still wasn’t going to nuke the site.)
Other various thoughts:
Having talked to some people who take this exercise very seriously indeed and some who don’t understand why anyone takes it seriously at all, both perspectives make a lot of sense to me and yet I’m having trouble explaining either one to the other. Probably I should practice passing some ITTs.
Of the arguments raised against the trade the one that I am the most sympathetic to is TurnTrout’s argument that it’s actually very important to hold to the important principles even when there’s a naive utilitarian argument in favor of abandoning them. I agree very strongly with this idea.
But it also seems to me there’s a kind of… mixing levels here? The tradeoff here is between something symbolic and something very real. I think there’s a limit to the extent this is analogous to, like, “maintain a bright line against torture even when torture seems like the least bad choice”, which I think of as the canonical example of this idea.
(I realize some people made arguments that this symbolic thing is actually reflective or possibly determinative of probabilistic real consequences (in which case the “mixing levels” point above is wrong). (Possibly even the arguments that didn’t state this explicitly relied on the implication of this?) I guess I just…. don’t find that very persuasive, because, again, the extent to which this exercise is analogous to anything of real-world importance is pretty limited; the vast majority of people who would nuke LW for shits and giggles wouldn’t also nuke the world for shits and giggles. Rituals and intentional exercises like these have some power but I think I put less stock in them than some.)
Relatedly, I guess I feel like if the LW devs wanted me to take this more seriously they should’ve made it have actual stakes; having just the front page go down for just 24 hours is just not actually destroying something of real value. (I don’t mean to insult the devs or even the button project—I think this has been pretty great actually—it’s just great in more of a “this is a fun stunt/valuable discussion starter” way than a “oh shit this is a situation where trustworthiness and reliability matter” way. (I realize that doing this in a way that had stakes would have possibly been unacceptably risky; I don’t really know how to calibrate the stakes such that they both matter and are an acceptable risk.))
Nevertheless I am actually pleased that we’ve made it through (most of) the day without the site going down (even when someone posted (what they claim is) their code on Facebook).
I am more pleased than that about the discussions that have happened here. I think the discussions would have been less active and less good without a specific actual possible deal on the table, so I’m glad to have spurred a concrete proposal which I think helped pin down some discussion points that would have remained nebulous or just gone unsaid otherwise.
If in fact the probability of someone nuking the site is entangled with the probability of someone nuking the world (or similar), I think it’s much more likely that both share common causes than that one causes the other. If this is so, then gaining more information about where we stand is valuable even if it involves someone nuking the site (perhaps especially then?).
In general I think a more eventful Petrov Day is probably more valuable and informative than a less eventful one.
I would like to add that I think this is bad (and have the codes). We are trying to build social norms around not destroying the world; you are blithely defecting against that.
I thought you were threatening extortion. As it is, given that people are being challenged to uphold morality, this response is still an offer to throw that away in exchange for money, under the claim that it’s moral because of some distant effect. I’d encourage you to follow Jai’s example and simply delete your launch codes.
I would give someone my launch codes in exchange for a sufficiently large counterfactual donation.
I haven’t thought seriously about how large it would need to be, because I don’t expect someone to take me up on this, but if you’re interested we can talk.
I think the better version of this strategy would involve getting competing donations from both sides, using some weighting of total donations for/against pushing the button to set a probability of pressing the button, and tweaking the weighting of the donations such that you expect the probability of pressing the button will be low (because pressing the button threatens to lower the probability of future games of this kind, this is an iterated game rather than a one-shot).
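A minimal sketch of the weighting idea above, assuming a simple ratio rule. The function name `press_probability` and the `against_weight` knob are hypothetical, chosen for illustration; the comment does not specify any particular formula.

```python
def press_probability(donations_for: float, donations_against: float,
                      against_weight: float = 9.0) -> float:
    """Probability of pressing the button given competing donation totals.

    against_weight is a hypothetical tuning knob: values above 1 make each
    dollar donated against pressing count for more, which keeps the
    probability of pressing low.
    """
    weighted_total = donations_for + against_weight * donations_against
    if weighted_total == 0:
        return 0.0  # no donations at all: default to not pressing
    return donations_for / weighted_total


# Equal donation totals on both sides, with and without the extra weighting.
print(press_probability(1000, 1000))
print(press_probability(1000, 1000, against_weight=1.0))
```

Raising `against_weight` is one way to express the iterated-game concern: the more you value future games of this kind, the more heavily you would weight the donations against pressing.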
Nooooo you’re a good person but you’re promoting negotiating with terrorists literally boo negative valence emotivism to highlight third-order effects, boo, noooooo................
Participants were selected based on whether they seem unlikely to press the button, so whoever would have cared about future extortions being possible CDT-doesn’t need to, because they won’t be a part of it.
Maybe a fair value would be GiveWell’s best guess cost per life saved equivalent? [1] There’s some harm in releasing the codes entrusted to me, but not so much that it’s better for someone to die.
I would want your assurance that it really was a counterfactually valid donation, though: money you would otherwise spend selfishly, and that you would not consider part of your altruistic impact on the world.
If two other people with launch codes tell me they don’t think this is a good trade then I’ll retract the offer.
Did you consider the unilateralist curse before making this comment?
Do you consider it to be a bad idea if you condition on the assumption that only one other person with launch access who sees this post in the time window chooses to say it was a bad idea?
Is the objection over the amount (there’s a higher number where it would be a good trade), being skeptical of the counterfactuality of the donation (would the money really be spent fully selfishly?), or something else?
(others have said part of what I wanted to say, but didn’t quite cover the thing I was worried about)
I see two potential objections:
how valuable is trust among LW users? (this is hard to quantify, but I think it is potentially quite high)
how persuasive should “it’s better than for someone to die” type arguments be?
My immediate thoughts are mostly about the second argument.
I think it’s quite dangerous to leave oneself vulnerable to the second argument (for reasons Julia discusses on givinggladly.com in various posts). Yes, you can reflect upon whether every given cup of coffee is worth the dead-child-currency it took to buy it. But taken naively this is emotionally and cognitively exhausting. (It also pushes people towards a kind of frugality that isn’t actually that beneficial.) The strategy of “set aside a budget for charity, based on your values, and don’t feel pressure to give more after that” seems really important for living sanely while altruistic.
(I don’t have a robustly satisfying answer on how to deal with that exactly, but see this comment of mine for some more expanded thoughts of mine on this)
Now, additional counterfactual donations still seem fine to be willing to make on the fly – I’ve derived fuzzy-pleasure-joy from donating based on weird schemes on the Dank EA Memes FB group. But I think it is quite dangerous to feel pressure to donate to weird Dank EA Meme schemes based on “a life is at stake.”
A life is always at stake. I don’t think most humans can or should live this way.
The strategy of “set aside a budget for charity, based on your values, and don’t feel pressure to give more after that” seems really important for living sanely while altruistic.
But this situation isn’t like that.
I agree you don’t want to always be vulnerable to the second argument, for the reasons you give. I don’t think the appropriate response is to be so hard-set in your ways that you can’t take advantage of new opportunities that arise. You can in fact compare whether or not a particular trade is worth it if the situation calls for it, and a one-time situation that has an upside of $1672 for ~no work seems like such a situation.
As a meta point directed more at the general conversation than this comment in particular, I would really like it if people stated monetary values at which they would think this was a good idea. At $10, I’m at “obviously not”, and at $1 million, I’m at “obviously yes”. I think the range of uncertainty is something like $500 - $20,000. Currently it feels like the building of trust is being treated as a sacred value; this seems bad.
My sense is that it’s very unlikely to be worth it at anything below $10k, and I might be a bit tempted at around $50k, though still quite hesitant. I agree that at $1M it’s very likely worth it.
Firm disagree. Second-order and third-order effects go limit->infinity here.
Also btw, I’m running a startup that’s now looking at — best case scenario — handling significant amounts of money over multiple years.
It makes me realize that “a lot of money” on the individual level is a terrible heuristic. Seriously, it’s hard to get one’s mind around it, but a million dollars is decidedly not a lot of money on the global scale.
For further elaboration, this is relevant and incredibly timely:
LW frontpage going down is also not particularly bad, so you don’t need much money to compensate for it.
If you wanted to convince me, you could make a case that destroying trust is really bad, and that in this particular case pressing the button would destroy a lot of trust, but that case hasn’t really been made.
LW frontpage going down is also not particularly bad [...] If you wanted to convince me, you could make a case that destroying trust is really bad
Umm, respectfully, I think this is extremely arrogant. Dangerously so.
Anyways, I’m being blunt here, but I think respectful and hopefully useful. Think about this. Reasoning follows —
The instructions if you got launch codes (also in the above post) were as such (emphasis added with underline) —
“Every Petrov Day, we practice not destroying the world. One particular way to do this is to practice the virtue of not taking unilateralist action.
It’s difficult to know who can be trusted, but today I have selected a group of LessWrong users who I think I can rely on in this way. You’ve all been given the opportunity to show yourselves capable and trustworthy.
[...]
This Petrov Day, between midnight and midnight PST, if you, {{username}}, enter the launch codes below on LessWrong, the Frontpage will go down for 24 hours.
I hope to see you on the other side of this, with our honor intact.”
So, to Ben Pace at least (the developer who put in a tremendous amount of hours and thought into putting this together), it represents...
* “practicing not destroying the world”
* “practicing the virtue of not taking unilateralist action”
* implications around his own uncertainty of who to trust
* de facto for Ben that he can’t rely on you personally, by his standards, if you do it
* showing yourself not “capable and trustworthy” by his standards
* having the total group’s “honor” “not be intact”, under Ben’s conception
And you want me to make a case for you on a single variable while ignoring the rather clear and straightforward written instructions for your own simple reductive understanding?
For Ben at least, the button thing was a symbolic exercise analogous to not nuking another country and he specifically asked you not to and said he’s trusting you.
So, no, I don’t want to “convince you” nor “make a case that destroying trust is really bad.” You’re literally stating you should set the burden of proof and others should “make a case.”
In an earlier comment you wrote,
You can in fact compare whether or not a particular trade is worth it if the situation calls for it, and a one-time situation that has an upside of $1672 for ~no work seems like such a situation.
“No work”? You mean aside from the work that Ben and the team did (a lot) and demonstrating to the world at large that the rationality community can’t press a “don’t destroy our own website” button to celebrate a Soviet soldier who chose restraint?
I mean, I don’t even want to put numbers on it, but if we gotta go to “least common denominator”, then $1672 is less than a week’s salary of the median developer in San Francisco. You’d be doing a hell of a lot more damage than that to morale and goodwill, I reckon, among the dev team here.
To be frank, I think the second-order and third-order effects of this project going well on Ben Pace alone is worth more than $1672 in “generative goodness” or whatever, and the potential disappointment and loss of faith in people he “thinks but is uncertain he can rely upon and trust” is… I mean, you know that one highly motivated person leading a community can make an immense difference right?
Just so you can get $1672 for charity (“upside”) with “~no work”?
And that’s just productivity, ignoring any potential negative affect or psychological distress, and being forced to reevaluate who he can trust. I mean, to pick a more taboo example, how many really nasty personal insults would you shout at a random software developer for $1672 to charity? That’s almost “no work” — it’s just you shouting some words, and whatever trivial psychological distress they feel, and I wager getting random insults from a stranger is much lower than having people you “are relying on and trusting” press a “don’t nuke the world simulator button.”
Like, if you just read what Ben wrote, you’d realize that risking destroying goodwill and faith in a single motivated innovative person alone should be priced well over $20k. I wouldn’t have done it for $100M going to charity. Seriously.
If you think that’s insane, stop and think why our numbers are four orders of magnitude apart — our priors must be obviously very different. And based on the comments, I’m taking into account more things than you, so you might be missing something really important.
(I could go on forever about this, but here’s one more: what’s the difference in your expected number of people discovering and getting into basic rationality, cognitive biases, and statistics with pressing the “failed at ‘not destroying the world day’ commemoration” vs not? Mine: high. What’s the value of more people thinking and acting rationally? Mine: high. So multiply the delta by the value. That’s just one more thing. There’s a lot you’re missing. I don’t mean this disrespectfully, but maybe think more instead of “doing you” on a quick timetable?)
(Here’s another one you didn’t think about: we’re celebrating a Soviet engineer. Run this headline in a Russian newspaper: “Americans try to celebrate Stanislav Petrov by not pressing ‘nuke their own website’ button, arrogant American pushes button because money isn’t donated to charity.”)
(Here’s another one you didn’t think about: I’ll give anyone 10:1 odds this is cited in a mainstream political science journal within 15 years, which are read by people who both set and advise on policy, and that “group of mostly American and European rationalists couldn’t not nuke their own site” absolutely is the type of thing to shape policy discussions ever-so-slightly.)
(Here’s another one you didn’t think about: some fraction of the people here are active-duty or reserve military in various countries. How does this going one way or another shape their kill/no-kill decisions in ambiguous warzones? Have you ever read any military memoirs about people who had to make those calls quickly, e.g. overwatch snipers in Mogadishu? No?)
(Not meant to be snarky — Please think more and trust your own intuition less.)
Thanks for writing this up. It’s pretty clear to me that you aren’t modeling me particularly well, and that it would take a very long time to resolve this, which I’m not particularly willing to do right now.
I’ll give anyone 10:1 odds this is cited in a mainstream political science journal within 15 years, which are read by people who both set and advise on policy
I’ll take that bet. Here’s a proposal: I send you $100 today, and in 15 years if you can’t show me an article in a reputable mainstream political science journal that mentions this event, then you send me an inflation-adjusted $1000. This is conditional on finding an arbiter I trust (perhaps Ben) who will:
Adjudicate whether it is an “article in a reputable mainstream political science journal that mentions this event”
Compute the inflation-adjusted amount, should that be necessary
Vouch that you are trustworthy and will in fact pay in 15 years if I win the bet.
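For readers who want the arithmetic behind the proposed bet, here is a rough sketch. The stake and payout come from the proposal above; the function names and the inflation rate passed in are assumptions for illustration, and the arbiter would choose the actual inflation measure.

```python
def expected_real_payoff(p_cited: float, stake: float = 100.0,
                         payout: float = 1000.0) -> float:
    """Expected net gain, in today's dollars, for the person sending the stake.

    p_cited is that person's probability that the event gets mentioned in a
    qualifying journal within the window (in which case the stake is lost).
    """
    return (1.0 - p_cited) * payout - stake


def break_even_probability(stake: float = 100.0, payout: float = 1000.0) -> float:
    """Citation probability at which the bet has zero expected value."""
    return 1.0 - stake / payout


def inflation_adjusted(amount: float, annual_inflation: float,
                       years: int = 15) -> float:
    """Nominal amount owed after compounding a constant (assumed) inflation rate."""
    return amount * (1.0 + annual_inflation) ** years


print(break_even_probability())
print(expected_real_payoff(p_cited=0.5))
print(inflation_adjusted(1000.0, annual_inflation=0.02))  # 2% is an assumed rate
```

Under these terms the person sending the $100 comes out ahead in expectation whenever they put the chance of a citation below the break-even probability.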
If you wanted to convince me, you could make a case that destroying trust is really bad, and that in this particular case pressing the button would destroy a lot of trust, but that case hasn’t really been made.
That this particular case would destroy a lot of trust.
This seemed to me like a fun game with stakes of social disapproval on one side, and basically no stakes on the other. This doesn’t seem like it has much bearing on the trustworthiness of members of the rationality community in situations with real stakes, where there is a stronger temptation to defect, or it would have more of a cost on the community.
I guess implicit to what I’m saying is that the front page being down for 24 hours doesn’t seem that bad to me. I don’t come to Less Wrong most days anyway.
But this is not a one-time situation. If you’re a professional musician, would you agree to mess up at every dress rehearsal, because it isn’t the real show?
More indirectly… the whole point of “celebrating and practicing our ability to not push buttons” is that we need to be able to not push buttons, even when it seems like a good idea (or necessary, or urgent that we defect while we can still salvage the perceived situation). The vast majority of people aren’t tempted by pushing a button when pushing it seems like an obviously bad idea. I think we need to take trust building seriously, and practice the art of actually cooperating. Real life doesn’t grade you on how well you understand TDT considerations and how many blog posts you’ve read on it, it grades you on whether you actually can make the cooperation equilibrium happen.
Rohin argues elsewhere for taking a vote (at least in principle). If 50% vote in favor, then he has successfully avoided “falling into the unilateralist’s curse” and has gotten $1.6k for AMF. He even has some bonus for “solved the unilateralist’s curse in a way that’s not just ‘sit on his hands’”. Now, it’s probably worth subtracting points for “the LW team asked them not to blow up the site and the community decided to anyway.” But I’d consider it fair play.
If you’re a professional musician, would you agree to mess up at every dress rehearsal, because it isn’t the real show?
Depends on the upside.
I think we need to take trust building seriously, and practice the art of actually cooperating.
This comment of mine was meant to address the claim “people shouldn’t be too easily persuaded by arguments about people dying” (the second claim in Raemon’s comment above). I agree that intuitions like this should push up the size of the donation you require.
More indirectly… the whole point of “celebrating and practicing our ability to not push buttons” is that we need to be able to not push buttons, even when it seems like a good idea (or necessary, or urgent that we defect while we can still salvage the perceived situation). The vast majority of people aren’t tempted by pushing a button when pushing it seems like an obviously bad idea.
As jp mentioned, I think the ideal thing to do is: first, each person figures out whether they personally think the plan is positive / negative, and then go with the majority opinion. I’m talking about the first step here. The second step is the part where you deal with the unilateralist curse.
Real life doesn’t grade you on how well you understand TDT considerations and how many blog posts you’ve read on it, it grades you on whether you actually can make the cooperation equilibrium happen.
It seems to me like the algorithm people are following is: if an action would be unilateralist, and there could be disagreement about its benefit, don’t take the action. This will systematically bias the group towards inaction. While this is fine for low-stakes situations, in higher-stakes situations where the group can invest effort, you should actually figure out whether it is good to take the action (via the two-step method above). We need to be able to take irreversible actions; the skill we should be practicing is not “don’t take unilateralist actions”, it’s “take unilateralist actions only if they have an expected positive effect after taking the unilateralist curse into account”.
We never have certainty, not for anything in this world. We must act anyway, and deciding not to act is also a choice. (Source)
It seems to me like the algorithm people are following is: if an action would be unilateralist, and there could be disagreement about its benefit, don’t take the action. This will systematically bias the group towards inaction. While this is fine for low-stakes situations, in higher-stakes situations where the group can invest effort, you should actually figure out whether it is good to take the action (via the two-step method above). We need to be able to take irreversible actions; the skill we should be practicing is not “don’t take unilateralist actions”, it’s “take unilateralist actions only if they have an expected positive effect after taking the unilateralist curse into account”.
I don’t disagree with this, and am glad to see reminders to actually evaluate different courses of action besides the one expected of us. My comment was more debating your own valuation as being too low, it not being a one-off event once you consider scenarios either logically or causally downstream of this one, and just a general sense that you view the consequences of this event as quite isolated.
my comment was more debating your own valuation as being too low, it not being a one-off event once you consider scenarios either logically or causally downstream of this one
That makes sense. I don’t think I’m treating it as a one-off event; it’s more that it doesn’t really seem like there’s much damage to the norm. If a majority of people thought it was better to take the counterfactual donation, it seems like the lesson is “wow, we in fact can coordinate to make good decisions”, as opposed to “whoops, it turns out rationalists can’t even coordinate on not nuking their own site”.
jkaufman’s initial offer was unclear. I read it (incorrectly) as “I will push the button (/release the codes) unless someone gives AMF $1672 counterfactually”, not as “if someone is willing to pay me $1672, I will give them the codes”. Read in the first way, Raemon’s concerns about “pressure” as opposed to additional donations made on the fly may be clearer; it’s not about jkaufman’s opportunity to get $1672 in donations for no work, it’s about everyone else being extorted for an extra $1672 to preserve their values.
Perhaps a nitpick, but I feel like the building of trust is being treated less as a sacred value, and more as a quantity of unknown magnitude, with some probability that that magnitude could be really high (at least >$1672, possibly orders of magnitude higher). Doing a Fermi is a trivial inconvenience that I for one cannot handle right now; since it is a weekday, maybe others feel much the same.
I agree that your comment takes this (very reasonable) perspective. It didn’t seem to me like any other comment was taking this perspective, but perhaps that was their underlying model.
Why doesn’t the United States threaten to nuke everyone if they don’t give a very reasonable 20% of their GDP per year to fund X-Risk — or whatever your favorite worthwhile projects are?
Screw it, why don’t we set the bar at 1%?
Imagine you’re advising the U.S. President (it’s Donald Trump right now, incidentally). Who should President Trump threaten with nuking if they don’t pay up to fund X-Risk? How much?
Now, let’s say 193 countries do it, and $X trillion is coming in and doing massive good.
Only Switzerland and North Korea defect. What do you do? Or rather, what do you advise Donald Trump to do?
Note to self: Does lighthearted dark humor highlighting risk increase or decrease chances of bad things happening?
Initial speculation: it might have an inverted response curve. One or two people making the joke might increase gravity, everyone joking about it might change norms and salience.
I noticed after playing a bunch of games of a mafia-type game with some rationalists that when people made edgy jokes about being in the mob or whatever, they were more likely to end up actually being in the mob.
(I have launch codes and am happy to prove it to you if you want.)
Hmmm, I feel like the argument “There’s some harm in releasing the codes entrusted to me, but not so much that it’s better for someone to die” might prove too much? Like, death is really bad, I definitely grant that. But despite the dollar amount you gave, I feel like we’re sort of running up against a sacred value thing. I mean, you could just as easily say, “There’s some harm in releasing the codes entrusted to me, but not so much that it’s better for someone to have a 10% chance of dying”—which would naïvely bring your price down to $167.20.
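The scaling in that last sentence can be written out explicitly. This is only a sketch of the naive linear reasoning being described, assuming the $1,672 cost-per-life figure used in this thread; `implied_price` is a hypothetical helper name.

```python
COST_PER_LIFE_USD = 1672.0  # the cost-per-life figure used in this thread


def implied_price(probability_of_saving_a_life: float,
                  cost_per_life: float = COST_PER_LIFE_USD) -> float:
    """Naive linear price: the chance of saving a life times the cost per life."""
    return round(probability_of_saving_a_life * cost_per_life, 2)


for p in (1.0, 0.5, 0.1):
    print(p, implied_price(p))
```

The point of writing it this way is that nothing in the formula singles out a whole life as a threshold, which is why the argument seems to prove too much.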
If you accept as true that that argument should be equally ‘morally convincing’, then you end up in a position where the only reasonable thing to do is to calculate exactly how much harm you actually expect to be done by you pressing the button. I’m not going to do this because I’m at work and it seems complicated (what is the disvalue of harm to the social fabric of an online community that’s trying to save the world, and operates largely on trust? perhaps it’s actually a harmless game, but perhaps it’s not, hard to know—seems like the majority of effects would happen down the line).
Additionally, I could just counter-offer a $1,672 counterfactual donation to GiveWell for you to not press the button. I’m not committing to do this, but I might do so if it came down to it.
I’m leaning towards this not being a good trade, even though it’s taxing to type that.
In the future, some people will find themselves in situations not too unlike this, where there are compelling utilitarian reasons for pressing the button.
Look, the system should be corrigible. It really, really should; the safety team’s internal prediction market had some pretty lopsided results. There are untrustworthy actors with capabilities similar to or exceeding ours. If we press the button, it probably goes better than if they press it. And they can press it. Twenty people died since I started talking, more will die if we don’t start pushing the world in a better direction, and do you feel the crushing astronomical weight of the entire future’s eyes upon us? Even a small probability increase in a good outcome makes pressing the button worth it.
And I think your policy should still be to not press the button to launch a singleton from this epistemic state, because we have to be able to cooperate! You don’t press buttons at will, under pressure, when the entire future hangs in the balance! If we can’t even cooperate, right here, right now, under much weaker pressures, what do we expect of the “untrustworthy actors”?
So how about people instead donate to charity in celebration of not pressing the button?
Oh: and to give those potential other people time to object, I won’t accept an offer before 2hr from when I posted the parent comment (4:30 Boston time)
The normal way to resolve unilateralist curse effects is to see how many people agree / disagree, and go with the majority. (Even if the action is irreversible, as long as everyone knows that and has taken that into account, going with the majority seems fine.)
Pro: it saves an expected life. Con: LW frontpage probably goes down for a day. Con: It causes some harm to trust. Pro: It reinforces the norm of actually considering consequences, and not holding any value too sacred.
Overall I lean towards the benefits outweighing the costs, so I support this offer.
Pro: It reinforces the norm of actually considering consequences, and not holding any value too sacred.
Not an expert here, but my impression was sometimes it can be useful to have “sacred values” in certain decision-theoretic contexts (like “I will one-box in Newcomb’s Problem even if consequentialist reasoning says otherwise”?) If I had to choose a sacred value to adopt, cooperating in epistemic prisoners’ dilemmas actually seems like a relatively good choice?
I will one-box in Newcomb’s Problem even if consequentialist reasoning says otherwise
I don’t think of Newcomb’s problem as being a disagreement about consequentialism; it’s about causality. I’d mostly agree with the statement “I will one-box in Newcomb’s Problem even if causal reasoning says otherwise” (though really I would want to add more nuance).
I feel relatively confident that most decision theorists at MIRI would agree with me on this.
If I had to choose a sacred value to adopt, cooperating in epistemic prisoners’ dilemmas actually seems like a relatively good choice?
In a real prisoner’s dilemma, you get defected against if you do that. You also need to take into account how the other player reasons. (I don’t know what you mean by epistemic prisoner’s dilemmas, perhaps that distinction is important.)
I also want to note that “take the majority vote of the relevant stakeholders” seems to be very much in line with “cooperating in epistemic prisoner’s dilemmas”, so if the offer did go through, I would expect this to strengthen that particular norm. See also this comment.
my impression was sometimes it can be useful to have “sacred values” in certain decision-theoretic contexts
I would not put it this way. It depends on what future situations you expect to be in. You might want to keep honesty as a sacred value, and tell an ax-murderer where your friend is, if you think that one day you will have to convince aliens that we do not intend them harm in order to avert a huge war. Most of us don’t expect that, so we don’t keep honesty as a sacred value. Ultimately it does all boil down to consequences.
The policy of “if two people object then the plan doesn’t go through” sets up a unilateralist-curse scenario for the people against the plan—after the first person says no, every future person is now able to unilaterally stop the plan, regardless of how many people are in favor of it. (See also Scott’s comment.) Ideally we’d avoid that; majority vote of comments does so (and seems like the principled solution).
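To make the contrast concrete, here is a small sketch comparing the two decision rules. The function names and the example tally are made up for illustration; they are not the actual votes in this thread.

```python
from typing import Sequence


def two_objection_rule(votes: Sequence[bool], max_objections: int = 2) -> bool:
    """Plan goes through unless at least `max_objections` people object.

    Each vote is True (in favor) or False (objects).
    """
    objections = sum(1 for v in votes if not v)
    return objections < max_objections


def majority_rule(votes: Sequence[bool]) -> bool:
    """Plan goes through only if strictly more than half the votes are in favor."""
    in_favor = sum(1 for v in votes if v)
    return in_favor > len(votes) / 2


# Hypothetical tally: eight code-holders in favor, two opposed.
votes = [True] * 8 + [False] * 2
print(two_objection_rule(votes))
print(majority_rule(votes))
```

Under the two-objection rule, once one person has objected, any single further objector decides the outcome regardless of how many are in favor; under the majority rule no individual has that power.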
(Though at this point it’s probably moot given the existing number of nays.)
Let’s, for the hell of it, assume real money got involved. Like, it was $50M or something.
Now — who would you want to be able to vote on whether destruction happens if their values aren’t met with that amount of money at stake?
If it’s the whole internet, most people will treat it as entertainment or competition as opposed to considering what we actually care about.
But if we’re going to limit it only to people that are thoughtful, that invalidates the point of majority vote doesn’t it?
Think about it. I’m not going to write out all the implications, but I think your faith in crowdsourced voting mechanisms for things with known short-term payoffs, set against long-term unknown costs that destroy long-term unknown gains, is perhaps misplaced...?
Most people are, factually speaking, not educated on all relevant topics and not fully numerate on statistics and payoff calculations; they go with their feelings instead of analysis, and are short-term thinkers...
I dunno, one life seems like a pretty expensive trade for the homepage staying up for a day. I bet a potential buyer could shop around and obtain launch codes for half a life.
Not saying I’d personally give up my launch code at the very reasonable cost of $836. But someone could probably be found. Especially if the buyer somehow found a way to frame someone else for the launch.
(Of course, now this comment is sitting around in plain view of everyone, the launch codes would have to come from someone other than me, even accounting for the framing.)
I’m pretty sure it is? I had already decided on & committed to a donation amount for 2019, and this would be in addition to that. The lifesaving part is relevant insofar as I am happier about the prospect of this trade than I would be about paying the same amount to an individual.
The only way in which I could imagine this not being perfectly counterfactual is that given that discretionary spending choices depend some on my finances at any given point, and given that large purchases have some impact on my finances, it may be that if some other similar opportunity presented itself later on, my decision re: that opportunity could have some indirect causal connection to my current decision (not in the direct sense of “oh I already donated last month so I won’t now” but just in the sense of “hmm how much discretionary-spending money do I currently have and, given that, do I want to spend $X on Y”). I’m not sure it’s really ever possible to get rid of that though?
It seems extremely unfortunate that the terminology apparently shifted from “counterfactually valid” (which means the right thing) to “counterfactual” (which means almost the opposite of the right thing).
Do you have a suggestion for terminology that properly truncates? (i.e. I think it’s basically impossible to expect a long phrase to end up being the one people regularly use, so if you want to fix that issue you need a single word that does the job)
“Additional donation” seems like the obvious choice in place of “counterfactual donation”, since we just mean “additional to what you would have donated anyway”, right? (The very obviousness makes me think maybe there’s a downside to the term that I’m not seeing, or I’m confused in some other way.)
[EDIT: two people with codes below have objected, so I’m not up for this trade anymore, unless we figure out a way to make a broader poll]
I have launch codes. Would anyone be interested in offering counterfactual donations to https://www.givewell.org/charities/amf? I could also be interested in counterfactual donations to nuclear war-prevention organizations.
oh geez
“Rae, this is a friendly reminder from the universe that you can only at best control the first-order effects of systems you create...”
Since the day is drawing to a close and at this point I won’t get to do the thing I wanted to do, here are some scattered thoughts about this thing.
First, my plan upon obtaining the code was to immediately repeat Jeff’s offer. I was curious how many times we could iterate this; I had in fact found another person who was potentially interested in being another link in this chain (and who was also more interested in repeating the offer than nuking the site). I told Jeff this privately but didn’t want to post it publicly (reasons: thought it would be more fun if this was a surprise; didn’t think people should put that much weight on my claimed intentions anyway; thought it was valuable for the conversation to proceed as though nuking were the likely outcome).
(In the event that nobody took me up on the offer, I still wasn’t going to nuke the site.)
Other various thoughts:
Having talked to some people who take this exercise very seriously indeed and some who don’t understand why anyone takes it seriously at all, both perspectives make a lot of sense to me and yet I’m having trouble explaining either one to the other. Probably I should practice passing some ITTs.
Of the arguments raised against the trade the one that I am the most sympathetic to is TurnTrout’s argument that it’s actually very important to hold to the important principles even when there’s a naive utilitarian argument in favor of abandoning them. I agree very strongly with this idea.
But it also seems to me there’s a kind of… mixing levels here? The tradeoff here is between something symbolic and something very real. I think there’s a limit to the extent this is analogous to, like, “maintain a bright line against torture even when torture seems like the least bad choice”, which I think of as the canonical example of this idea.
(I realize some people made arguments that this symbolic thing is actually reflective or possibly determinative of probabilistic real consequences (in which case the “mixing levels” point above is wrong). (Possibly even the arguments that didn’t state this explicitly relied on the implication of this?) I guess I just… don’t find that very persuasive, because, again, the extent to which this exercise is analogous to anything of real-world importance is pretty limited; the vast majority of people who would nuke LW for shits and giggles wouldn’t also nuke the world for shits and giggles. Rituals and intentional exercises like these have some power, but I think I put less stock in them than some.)
Relatedly, I guess I feel like if the LW devs wanted me to take this more seriously they should’ve made it have actual stakes; having just the front page go down for just 24 hours is just not actually destroying something of real value. (I don’t mean to insult the devs or even the button project—I think this has been pretty great actually—it’s just great in more of a “this is a fun stunt/valuable discussion starter” way than a “oh shit this is a situation where trustworthiness and reliability matter” way. (I realize that doing this in a way that had stakes would have possibly been unacceptably risky; I don’t really know how to calibrate the stakes such that they both matter and are an acceptable risk.))
Nevertheless I am actually pleased that we’ve made it through (most of) the day without the site going down (even when someone posted (what they claim is) their code on Facebook).
I am more pleased than that about the discussions that have happened here. I think the discussions would have been less active and less good without a specific actual possible deal on the table, so I’m glad to have spurred a concrete proposal which I think helped pin down some discussion points that would have remained nebulous or just gone unsaid otherwise.
If in fact the probability of someone nuking the site is entangled with the probability of someone nuking the world (or similar), I think it’s much more likely that both share common causes than that one causes the other. If this is so, then gaining more information about where we stand is valuable even if it involves someone nuking the site (perhaps especially then?).
In general I think a more eventful Petrov Day is probably more valuable and informative than a less eventful one.
I would like to add that I think this is bad (and have the codes). We are trying to build social norms around not destroying the world; you are blithely defecting against that.
I’m not doing anything unilaterally. If I do anything at this point it will be after some sort of fair polling.
This seems extremely unprincipled of you :/
Clarify?
I thought you were threatening extortion. As it is, given that people are being challenged to uphold morality, this response is still an offer to throw that away in exchange for money, under the claim that it’s moral because of some distant effect. I’d encourage you to follow Jai’s example and simply delete your launch codes.
yesssss shenanigans
Are you offering to take donations in exchange for pressing the button or not pressing the button?
I would give someone my launch codes in exchange for a sufficiently large counterfactual donation.
I haven’t thought seriously about how large it would need to be, because I don’t expect someone to take me up on this, but if you’re interested we can talk.
I thought he was being ambiguous on purpose, so as to maximize donations.
I think the better version of this strategy would involve getting competing donations from both sides, using some weighting of total donations for/against pushing the button to set a probability of pressing the button, and tweaking the weighting of the donations such that you expect the probability of pressing the button will be low (pressing the button threatens to lower the probability of future games of this kind, so this is an iterated game rather than a one-shot).
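As a rough sketch of the weighting idea in the comment above: the function name `press_probability`, the `against_weight` parameter, and the dollar totals are all hypothetical and chosen only for illustration. Nobody in the thread proposed a specific formula. Here each dollar donated against pressing counts several times over, which is one simple way to keep the probability of pressing low.

```python
def press_probability(donations_for, donations_against, against_weight=3.0):
    """Probability of pressing, given total donations on each side.

    against_weight > 1 makes each dollar donated *against* pressing count
    more heavily, which pushes the probability down. The value 3.0 is an
    arbitrary assumption, not something stated in the thread.
    """
    if donations_for < 0 or donations_against < 0:
        raise ValueError("donations must be non-negative")
    weighted_against = against_weight * donations_against
    total = donations_for + weighted_against
    if total == 0:
        return 0.0  # nobody donated: default to not pressing
    return donations_for / total


# Hypothetical totals: $1,000 donated for pressing, $1,000 against.
p = press_probability(1000, 1000)
print(f"probability of pressing: {p:.2f}")
```

With equal totals and a weight of 3, the button would be pressed a quarter of the time; raising `against_weight` lowers that further.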
Agreed. I have launch codes and will donate up to $100 without writing it in my EA budget if that prevents the nuke from being launched.
Nooooo you’re a good person but you’re promoting negotiating with terrorists literally boo negative valence emotivism to highlight third-order effects, boo, noooooo................
As they say in the KGB, one man’s nuclear terrorism is another man’s charity game show.
Participants were selected based on whether they seem unlikely to press the button, so whoever would have cared about future extortions being possible CDT-doesn’t need to, because they won’t be a part of it.
hey actually I’m potentially interested depending on what size of donation you would consider sufficient, can you give an estimate?
Maybe a fair value would be GiveWell’s best guess cost per life saved equivalent? [1] There’s some harm in releasing the codes entrusted to me, but not so much that it’s better for someone to die.
I would want your assurance that it really was a counterfactually valid donation, though: money you would otherwise spend selfishly, and that you would not consider part of your altruistic impact on the world.
If two other people with launch codes tell me they don’t think this is a good trade then I’ll retract the offer.
[1] https://www.givewell.org/how-we-work/our-criteria/cost-effectiveness/cost-effectiveness-models gives $1,672.
I have launch codes and don’t think this is good. Specifically, I think it’s bad.
Did you consider the unilateralist curse before making this comment?
Do you consider it to be a bad idea if you condition on the assumption that only one other person with launch access who sees this post in the time window chooses to say it was a bad idea?
Is the objection over the amount (there’s a higher number where it would be a good trade), being skeptical of the counterfactuality of the donation (would the money really be spent fully selfishly?), or something else?
(others have said part of what I wanted to say, but didn’t quite cover the thing I was worried about)
I see two potential objections:
how valuable is trust among LW users? (this is hard to quantify, but I think it is potentially quite high)
how persuasive should “it’s better than for someone to die”-type arguments be?
My immediate thoughts are mostly about the second argument.
I think it’s quite dangerous to leave oneself vulnerable to the second argument (for reasons Julia discusses on givinggladly.com in various posts). Yes, you can reflect upon whether every given cup of coffee is worth the dead-child-currency it took to buy it. But taken naively this is emotionally and cognitively exhausting. (It also pushes people towards a kind of frugality that isn’t actually that beneficial.) The strategy of “set aside a budget for charity, based on your values, and don’t feel pressure to give more after that” seems really important for living sanely while altruistic.
(I don’t have a robustly satisfying answer on how to deal with that exactly, but see this comment of mine for some more expanded thoughts on this.)
Now, additional counterfactual donations still seem fine to be willing to make on the fly – I’ve derived fuzzy-pleasure-joy from donating based on weird schemes on the Dank EA Memes FB group. But I think it is quite dangerous to feel pressure to donate to weird Dank EA Meme schemes based on “a life is at stake.”
A life is always at stake. I don’t think most humans can or should live this way.
But this situation isn’t like that.
I agree you don’t want to always be vulnerable to the second argument, for the reasons you give. I don’t think the appropriate response is to be so hard-set in your ways that you can’t take advantage of new opportunities that arise. You can in fact compare whether or not a particular trade is worth it if the situation calls for it, and a one-time situation that has an upside of $1672 for ~no work seems like such a situation.
As a meta point directed more at the general conversation than this comment in particular, I would really like it if people stated monetary values at which they would think this was a good idea. At $10, I’m at “obviously not”, and at $1 million, I’m at “obviously yes”. I think the range of uncertainty is something like $500 - $20,000. Currently it feels like the building of trust is being treated as a sacred value; this seems bad.
My sense is that it’s very unlikely to be worth it at anything below $10k, and I might be a bit tempted at around $50k, though still quite hesitant. I agree that at $1M it’s very likely worth it.
Firm disagree. Second-order and third-order effects go limit->infinity here.
Also btw, I’m running a startup that’s now looking at — best case scenario — handling significant amounts of money over multiple years.
It makes me realize that “a lot of money” on the individual level is a terrible heuristic. Seriously, it’s hard to get one’s mind around it, but a million dollars is decidedly not a lot of money on the global scale.
For further elaboration, this is relevant and incredibly timely:
https://slatestarcodex.com/2019/09/18/too-much-dark-money-in-almonds/
LW frontpage going down is also not particularly bad, so you don’t need much money to compensate for it.
If you wanted to convince me, you could make a case that destroying trust is really bad, and that in this particular case pressing the button would destroy a lot of trust, but that case hasn’t really been made.
Umm, respectfully, I think this is extremely arrogant. Dangerously so.
Anyways, I’m being blunt here, but I think respectful and hopefully useful. Think about this. Reasoning follows —
The instructions if you got launch codes (also in the above post) were as such (emphasis added with underline) —
“Every Petrov Day, we practice not destroying the world. One particular way to do this is to practice the virtue of not taking unilateralist action.
It’s difficult to know who can be trusted, but today I have selected a group of LessWrong users who I think I can rely on in this way. You’ve all been given the opportunity to show yourselves capable and trustworthy.
[...]
This Petrov Day, between midnight and midnight PST, if you, {{username}}, enter the launch codes below on LessWrong, the Frontpage will go down for 24 hours.
I hope to see you on the other side of this, with our honor intact.”
So, to Ben Pace at least (the developer who put a tremendous number of hours and a great deal of thought into putting this together), it represents...
* “practicing not destroying the world”
* “practicing the virtue of not taking unilateralist action”
* implications around his own uncertainty of who to trust
* de facto for Ben that he can’t rely on you personally, by his standards, if you do it
* showing yourself not “capable and trustworthy” by his standards
* having the total group’s “honor” “not be intact”, under Ben’s conception
And you want me to make a case for you on a single variable while ignoring the rather clear and straightforward written instructions for your own simple reductive understanding?
For Ben at least, the button thing was a symbolic exercise analogous to not nuking another country and he specifically asked you not to and said he’s trusting you.
So, no, I don’t want to “convince you” nor “make a case that destroying trust is really bad.” You’re literally stating you should set the burden of proof and others should “make a case.”
In an earlier comment you wrote,
“No work”? You mean aside from the work that Ben and the team did (a lot) and demonstrating to the world at large that the rationality community can’t press a “don’t destroy our own website” button to celebrate a Soviet soldier who chose restraint?
I mean, I don’t even want to put numbers on it, but if we gotta go to “least common denominator”, then $1672 is less than a week’s salary of the median developer in San Francisco. You’d be doing a hell of a lot more damage than that to morale and goodwill, I reckon, among the dev team here.
To be frank, I think the second-order and third-order effects of this project going well on Ben Pace alone is worth more than $1672 in “generative goodness” or whatever, and the potential disappointment and loss of faith in people he “thinks but is uncertain he can rely upon and trust” is… I mean, you know that one highly motivated person leading a community can make an immense difference right?
Just so you can get $1672 for charity (“upside”) with “~no work”?
And that’s just productivity, ignoring any potential negative affect or psychological distress, and being forced to reevaluate who he can trust. I mean, to pick a more taboo example, how many really nasty personal insults would you shout at a random software developer for $1672 to charity? That’s almost “no work”: it’s just you shouting some words, plus whatever trivial psychological distress they feel. And I wager the distress of getting random insults from a stranger is much lower than that of having people you “are relying on and trusting” press a “don’t nuke the world” simulator button.
Like, if you just read what Ben wrote, you’d realize that risking destroying goodwill and faith in a single motivated innovative person alone should be priced well over $20k. I wouldn’t have done it for $100M going to charity. Seriously.
If you think that’s insane, stop and think why our numbers are four orders of magnitude apart — our priors must be obviously very different. And based on the comments, I’m taking into account more things than you, so you might be missing something really important.
(I could go on forever about this, but here’s one more: what’s the difference in your expected number of people discovering and getting into basic rationality, cognitive biases, and statistics with pressing the “failed at ‘not destroying the world day’ commemoration” vs not? Mine: high. What’s the value of more people thinking and acting rationally? Mine: high. So multiply the delta by the value. That’s just one more thing. There’s a lot you’re missing. I don’t mean this disrespectfully, but maybe think more instead of “doing you” on a quick timetable?)
(Here’s another one you didn’t think about: we’re celebrating a Soviet engineer. Run this headline in a Russian newspaper: “Americans try to celebrate Stanislav Petrov by not pressing ‘nuke their own website’ button, arrogant American pushes button because money isn’t donated to charity.”)
(Here’s another one you didn’t think about: I’ll give anyone 10:1 odds this is cited in a mainstream political science journal within 15 years, which are read by people who both set and advise on policy, and that “group of mostly American and European rationalists couldn’t not nuke their own site” absolutely is the type of thing to shape policy discussions ever-so-slightly.)
(Here’s another one you didn’t think about: some fraction of the people here are active-duty or reserve military in various countries. How does this going one way or another shape their kill/no-kill decisions in ambiguous warzones? Have you ever read any military memoirs about people who had to make those calls quickly, e.g. overwatch snipers in Mogadishu? No?)
(Not meant to be snarky — Please think more and trust your own intuition less.)
Thanks for writing this up. It’s pretty clear to me that you aren’t modeling me particularly well, and that it would take a very long time to resolve this, which I’m not particularly willing to do right now.
I’ll take that bet. Here’s a proposal: I send you $100 today, and in 15 years if you can’t show me an article in a reputable mainstream political science journal that mentions this event, then you send me an inflation-adjusted $1000. This is conditional on finding an arbiter I trust (perhaps Ben) who will:
Adjudicate whether it is an “article in a reputable mainstream political science journal that mentions this event”
Compute the inflation-adjusted amount, should that be necessary
Vouch that you are trustworthy and will in fact pay in 15 years if I win the bet.
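To make the terms of the proposed bet concrete, here is a minimal sketch of its expected value in inflation-adjusted dollars, from the point of view of the person sending $100 today. It assumes the payout is exactly offset by inflation and ignores counterparty risk and any investment return on the stake; the function names are hypothetical.

```python
def expected_real_gain(p_cited, stake=100.0, payout=1000.0):
    """Expected inflation-adjusted gain for the person who pays `stake` today
    and receives `payout` in 15 years if the event is NOT cited.

    Simplifying assumptions: no counterparty risk, no return on the stake.
    """
    return (1.0 - p_cited) * payout - stake


def break_even_probability(stake=100.0, payout=1000.0):
    """Citation probability at which the bet has zero expected gain."""
    return 1.0 - stake / payout


print(f"break-even citation probability: {break_even_probability():.0%}")
print(f"expected gain at a 50% citation chance: ${expected_real_gain(0.5):.2f}")
```

Under these assumptions the bet only favors the person offering the odds if they put the chance of a citation above 90%.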
This basically seems right to me.
Which part of the two statements? That destroying trust is really bad, or that the case hasn’t been made?
That this particular case would destroy a lot of trust.
This seemed to me like a fun game with stakes of social disapproval on one side, and basically no stakes on the other. This doesn’t seem like it has much bearing on the trustworthiness of members of the rationality community in situations with real stakes, where there is a stronger temptation to defect, or it would have more of a cost on the community.
I guess implicit to what I’m saying is that the front page being down for 24 hours doesn’t seem that bad to me. I don’t come to Less Wrong most days anyway.
But this is not a one-time situation. If you’re a professional musician, would you agree to mess up at every dress rehearsal, because it isn’t the real show?
More indirectly… the whole point of “celebrating and practicing our ability to not push buttons” is that we need to be able to not push buttons, even when it seems like a good idea (or necessary, or urgent that we defect while we can still salvage the perceived situation). The vast majority of people aren’t tempted by pushing a button when pushing it seems like an obviously bad idea. I think we need to take trust building seriously, and practice the art of actually cooperating. Real life doesn’t grade you on how well you understand TDT considerations and how many blog posts you’ve read on it; it grades you on whether you actually can make the cooperation equilibrium happen.
Rohin argues elsewhere for taking a vote (at least in principle). If 50% vote in favor, then he has successfully avoided “falling into the unilateralist’s curse” and has gotten $1.6k for AMF. He even has some bonus for “solved the unilateralist’s curse in a way that’s not just ‘sit on his hands’”. Now, it’s probably worth subtracting points for “the LW team asked them not to blow up the site and the community decided to anyway.” But I’d consider it fair play.
Depends on the upside.
This comment of mine was meant to address the claim “people shouldn’t be too easily persuaded by arguments about people dying” (the second claim in Raemon’s comment above). I agree that intuitions like this should push up the size of the donation you require.
As jp mentioned, I think the ideal thing to do is: first, each person figures out whether they personally think the plan is positive / negative, and then go with the majority opinion. I’m talking about the first step here. The second step is the part where you deal with the unilateralist curse.
It seems to me like the algorithm people are following is: if an action would be unilateralist, and there could be disagreement about its benefit, don’t take the action. This will systematically bias the group towards inaction. While this is fine for low-stakes situations, in higher-stakes situations where the group can invest effort, you should actually figure out whether it is good to take the action (via the two-step method above). We need to be able to take irreversible actions; the skill we should be practicing is not “don’t take unilateralist actions”, it’s “take unilateralist actions only if they have an expected positive effect after taking the unilateralist curse into account”.
We never have certainty, not for anything in this world. We must act anyway, and deciding not to act is also a choice. (Source)
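The contrast between decision rules discussed in this thread can be illustrated with a small simulation. This is only a sketch under made-up assumptions: each code holder sees the plan's true value plus independent Gaussian noise, and the number of holders, noise level, and trial count are all hypothetical. It compares acting unilaterally, the "blocked once two people object" rule, and a majority vote.

```python
import random


def simulate(true_value, n_holders=9, noise=2.0, trials=2000, seed=0):
    """Return how often each decision rule ends with the action being taken.

    Each code holder sees the plan's true value plus independent Gaussian
    noise and personally favors the plan when their estimate is positive.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    acted = {"unilateral": 0, "two_vetoes": 0, "majority": 0}
    for _ in range(trials):
        estimates = [true_value + rng.gauss(0.0, noise) for _ in range(n_holders)]
        in_favor = sum(e > 0 for e in estimates)
        opposed = n_holders - in_favor
        if in_favor >= 1:              # anyone who likes the plan just acts
            acted["unilateral"] += 1
        if opposed < 2:                # plan is blocked once two people object
            acted["two_vetoes"] += 1
        if in_favor > n_holders / 2:   # go with the majority opinion
            acted["majority"] += 1
    return {rule: count / trials for rule, count in acted.items()}


for label, value in [("harmful plan", -1.0), ("beneficial plan", +1.0)]:
    print(label, simulate(value))
```

In this toy model the unilateral rule takes harmful actions most of the time, while the two-objection rule blocks most beneficial ones; majority voting sits in between on both counts.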
I don’t disagree with this, and am glad to see reminders to actually evaluate different courses of action besides the one expected of us. My comment was more about your own valuation being too low, this not being a one-off event once you consider scenarios either logically or causally downstream of this one, and a general sense that you view the consequences of this event as quite isolated.
That makes sense. I don’t think I’m treating it as a one-off event; it’s more that it doesn’t really seem like there’s much damage to the norm. If a majority of people thought it was better to take the counterfactual donation, it seems like the lesson is “wow, we in fact can coordinate to make good decisions”, as opposed to “whoops, it turns out rationalists can’t even coordinate on not nuking their own site”.
jkaufman’s initial offer was unclear. I read it (incorrectly) as “I will push the button (/release the codes) unless someone gives AMF $1672 counterfactually”, not as “if someone is willing to pay me $1672, I will give them the codes”. Read in the first way, Raemon’s concerns about “pressure” as opposed to additional donations made on the fly may be clearer; it’s not about jkaufman’s opportunity to get $1672 in donations for no work, it’s about everyone else being extorted for an extra $1672 to preserve their values.
Perhaps a nitpick, but I feel like the building of trust is being treated less as a sacred value, and more as a quantity of unknown magnitude, with some probability that that magnitude could be really high (at least >$1672, possibly orders of magnitude higher). Doing a Fermi is a trivial inconvenience that I for one cannot handle right now; since it is a weekday, maybe others feel much the same.
I agree that your comment takes this (very reasonable) perspective. It didn’t seem to me like any other comment was taking this perspective, but perhaps that was their underlying model.
I wouldn’t do it for $100M.
Seriously.
Because it increases the marginal chance that humanity goes extinct ever-so-slightly.
If you have launch codes, wait until tomorrow to read the last part eh? —
(V zrna, hayrff lbh guvax gur rkcrevzrag snvyvat frpergyl cebzbgrf pnhgvba naq qrfgeblf bcgvzvfz, juvpu zvtug or gehr.)
Why couldn’t you use the $100M to fund x-risk prevention efforts?
Well, why stop there?
World GDP is $80.6 trillion.
Why doesn’t the United States threaten to nuke everyone if they don’t give a very reasonable 20% of their GDP per year to fund X-Risk — or whatever your favorite worthwhile projects are?
Screw it, why don’t we set the bar at 1%?
Imagine you’re advising the U.S. President (it’s Donald Trump right now, incidentally). Who should President Trump threaten with nuking if they don’t pay up to fund X-Risk? How much?
Now, let’s say 193 countries do it, and $X trillion is coming in and doing massive good.
Only Switzerland and North Korea defect. What do you do? Or rather, what do you advise Donald Trump to do?
I never suggested threats, and in fact I don’t think you should threaten to press the button unless someone makes a counterfactual donation of $1,672.
Jeff’s original comment was also not supposed to be a threat, though it was ambiguous. All of my comments are talking about the non-threat version.
Dank EA Memes? What? Really? How do I get in on this?
(Serious.)
(I shouldn’t joke “I have launch codes” — that’s grossly irresponsible for a cheap laugh — but umm, I just meta made the joke.)
Note to self: Does lighthearted dark humor highlighting risk increase or decrease chances of bad things happening?
Initial speculation: it might have an inverted response curve. One or two people making the joke might increase gravity, everyone joking about it might change norms and salience.
I noticed after playing a bunch of games of a mafia-type game with some rationalists that when people made edgy jokes about being in the mob or whatever, they were more likely to end up actually being in the mob.
There’s rationalists who are in the mafia?
Whoa.
No insightful comment, just, like — this Petrov thread is the gift that keeps on giving.
Can’t tell if joking, but they probably mean that they were “actually in the mafia” in the game, so not in the real-world mafia.
Yes, lol :)
Dank EA Memes is a Facebook group. It’s pretty good.
(I have launch codes and am happy to prove it to you if you want.)
Hmmm, I feel like the argument “There’s some harm in releasing the codes entrusted to me, but not so much that it’s better for someone to die” might prove too much? Like, death is really bad, I definitely grant that. But despite the dollar amount you gave, I feel like we’re sort of running up against a sacred value thing. I mean, you could just as easily say, “There’s some harm in releasing the codes entrusted to me, but not so much that it’s better for someone to have a 10% chance of dying”—which would naïvely bring your price down to $167.20.
If you accept as true that that argument should be equally ‘morally convincing’, then you end up in a position where the only reasonable thing to do is to calculate exactly how much harm you actually expect to be done by you pressing the button. I’m not going to do this because I’m at work and it seems complicated (what is the disvalue of harm to the social fabric of an online community that’s trying to save the world, and operates largely on trust? perhaps it’s actually a harmless game, but perhaps it’s not, hard to know—seems like the majority of effects would happen down the line).
Additionally, I could just counter-offer a $1,672 counterfactual donation to GiveWell for you to not press the button. I’m not committing to do this, but I might do so if it came down to it.
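The "proves too much" arithmetic in the comment above (a 10% chance of dying naively bringing the price down to $167.20) is just linear scaling of the GiveWell figure cited in the thread. A minimal sketch, where the function name `price_for_risk` is hypothetical and linear scaling is exactly the naive assumption being questioned:

```python
COST_PER_LIFE_USD = 1672.0  # GiveWell estimate cited in the thread


def price_for_risk(probability_of_death, cost_per_life=COST_PER_LIFE_USD):
    """Donation matching a given probability of one death, scaled linearly."""
    if not 0.0 <= probability_of_death <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability_of_death * cost_per_life


for p in (1.0, 0.5, 0.1):
    print(f"{p:.0%} chance of a death -> ${price_for_risk(p):,.2f}")
```

The same scaling gives the $836 "half a life" figure mentioned elsewhere in the thread.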
Are you telling me you don’t think this is a good trade?
Wasn’t totally sure when I wrote it, but now firmly yes.
This whole thread is awesome. This is maybe the best thing that’s happened on LessWrong since Eliezer more-or-less went on hiatus.
Huge respect to everyone. This is really great. Hard but great. Actually it’s great because it’s hard.
I’m leaning towards this not being a good trade, even though it’s taxing to type that.
In the future, some people will find themselves in situations not too unlike this, where there are compelling utilitarian reasons for pressing the button.
And I think your policy should still be to not press the button to launch a singleton from this epistemic state, because we have to be able to cooperate! You don’t press buttons at will, under pressure, when the entire future hangs in the balance! If we can’t even cooperate, right here, right now, under much weaker pressures, what do we expect of the “untrustworthy actors”?
So how about people instead donate to charity in celebration of not pressing the button?
ETA I have launch codes btw.
Oh: and to give those potential other people time to object, I won’t accept an offer before 2hr from when I posted the parent comment (4:30 Boston time)
The normal way to resolve unilateralist curse effects is to see how many people agree / disagree, and go with the majority. (Even if the action is irreversible, as long as everyone knows that and has taken that into account, going with the majority seems fine.)
Pro: it saves an expected life. Con: LW frontpage probably goes down for a day. Con: It causes some harm to trust. Pro: It reinforces the norm of actually considering consequences, and not holding any value too sacred.
Overall I lean towards the benefits outweighing the costs, so I support this offer.
ETA: I also have codes.
Not an expert here, but my impression was sometimes it can be useful to have “sacred values” in certain decision-theoretic contexts (like “I will one-box in Newcomb’s Problem even if consequentialist reasoning says otherwise”?) If I had to choose a sacred value to adopt, cooperating in epistemic prisoners’ dilemmas actually seems like a relatively good choice?
I don’t think of Newcomb’s problem as being a disagreement about consequentialism; it’s about causality. I’d mostly agree with the statement “I will one-box in Newcomb’s Problem even if causal reasoning says otherwise” (though really I would want to add more nuance).
I feel relatively confident that most decision theorists at MIRI would agree with me on this.
In a real prisoner’s dilemma, you get defected against if you do that. You also need to take into account how the other player reasons. (I don’t know what you mean by epistemic prisoner’s dilemmas, perhaps that distinction is important.)
I also want to note that “take the majority vote of the relevant stakeholders” seems to be very much in line with “cooperating in epistemic prisoner’s dilemmas”, so if the offer did go through, I would expect this to strengthen that particular norm. See also this comment.
I would not put it this way. It depends on what future situations you expect to be in. You might want to keep honesty as a sacred value, and tell an ax-murderer where your friend is, if you think that one day you will have to convince aliens that we do not intend them harm in order to avert a huge war. Most of us don’t expect that, so we don’t keep honesty as a sacred value. Ultimately it does all boil down to consequences.
If we could figure out some reasonable way to poll people I agree, but I don’t see a good way to do that, especially not on this timescale?
Presumably you could take the majority vote of comments left in a 2 hour span?
^ Yeah, that.
The policy of “if two people object then the plan doesn’t go through” sets up a unilateralist-curse scenario for the people against the plan—after the first person says no, every future person is now able to unilaterally stop the plan, regardless of how many people are in favor of it. (See also Scott’s comment.) Ideally we’d avoid that; majority vote of comments does so (and seems like the principled solution).
(Though at this point it’s probably moot given the existing number of nays.)
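To make the veto-vs-majority point concrete, here is a rough sketch. It assumes each of n code-holders independently objects with probability p; both numbers and the function names are made up for illustration and are not estimates of anything in this thread.

```python
from math import comb

def p_at_least_k_object(n: int, p: float, k: int) -> float:
    """Probability that at least k of n voters object, assuming each
    objects independently with probability p (an illustrative assumption)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def p_blocked_two_objections(n: int, p: float) -> float:
    """Plan is blocked as soon as two people object."""
    return p_at_least_k_object(n, p, k=2)

def p_blocked_majority(n: int, p: float) -> float:
    """Plan is blocked only if strictly more than half object."""
    return p_at_least_k_object(n, p, k=n // 2 + 1)

if __name__ == "__main__":
    n, p = 10, 0.2  # hypothetical: 10 voters, each 20% likely to object
    print(f"two-objection rule blocks: {p_blocked_two_objections(n, p):.3f}")
    print(f"majority rule blocks:      {p_blocked_majority(n, p):.3f}")
```

Under these made-up numbers the two-objection rule blocks the plan most of the time even though the typical voter is in favor, while majority vote almost never does. Real objections are of course not independent, so treat this as a toy model only.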
Let’s, for the hell of it, assume real money got involved. Like, it was $50M or something.
Now — who would you want to be able to vote on whether destruction happens if their values aren’t met with that amount of money at stake?
If it’s the whole internet, most people will treat it as entertainment or competition as opposed to considering what we actually care about.
But if we’re going to limit it only to people that are thoughtful, that invalidates the point of majority vote, doesn’t it?
Think about it, I’m not going to write out all the implications, but I think your faith in crowdsourced voting mechanisms for things that trade a known short-term payoff against unknown long-term costs that destroy unknown long-term gains is perhaps misplaced...?
Most people are, factually speaking, not educated on all relevant topics, not fully numerate on statistics and payoff calculations, go with their feelings instead of analysis, and are short-term thinkers...
I agree that in general this is a problem, but I think in this particular case we have the obvious choice of the set of all people with launch codes.
(Btw, your counterargument also applies to the unilateralist curse itself.)
I’m surprised that LW being down for a day isn’t on your list of cons. [ETA: or rather the LW home page]
It could also be on the list of pros, depending on how one uses LW.
I feel obligated to note that it will in fact only destroy the frontpage of LW, not the rest of the site.
Ah. I thought it was the entire site. (Though it did say “Frontpage” in the post.)
Good point, added, doesn’t change the conclusion.
I’ll note that giving someone the launch codes merely increases the chance of the homepage going down.
I dunno, one life seems like a pretty expensive trade for the homepage staying up for a day. I bet a potential buyer could shop around and obtain launch codes for half a life.
Not saying I’d personally give up my launch code at the very reasonable cost of $836. But someone could probably be found. Especially if the buyer somehow found a way to frame someone else for the launch.
(Of course, now this comment is sitting around in plain view of everyone, the launch codes would have to come from someone other than me, even accounting for the framing.)
This makes sense. I shall consider whether it makes sense for me to impulse-spend this amount of money on shenanigans (and lifesaving).
If you’re considering it as spending on lifesaving then it doesn’t sound counterfactual?
I’m pretty sure it is? I had already decided on & committed to a donation amount for 2019, and this would be in addition to that. The lifesaving part is relevant insofar as I am happier about the prospect of this trade than I would be about paying the same amount to an individual.
The only way in which I could imagine this not being perfectly counterfactual is that given that discretionary spending choices depend some on my finances at any given point, and given that large purchases have some impact on my finances, it may be that if some other similar opportunity presented itself later on, my decision re: that opportunity could have some indirect causal connection to my current decision (not in the direct sense of “oh I already donated last month so I won’t now” but just in the sense of “hmm how much discretionary-spending money do I currently have and, given that, do I want to spend $X on Y”). I’m not sure it’s really ever possible to get rid of that though?
It could be partially motivated by lifesaving, but they wouldn’t have donated otherwise. Like, not if they’re a perfectly rational agent, but hey.
If someone else with codes wants to make this offer now that Jeff has withdrawn his, I’m now confident I am up for this.
I preemptively counter-offer whatever amount of money tcheasdfjkl would pay in order for this hypothetical person not to press the button.
To be clear I am NOT looking for people to press the button, I am looking for people to give me launch codes.
Oh wow, I did not realize how ambiguous the original wording was.
Forgive me if I’m being dense, but just what in the world is a “counterfactual donation”?
Jeff does conveniently have a blogpost on this: https://www.jefftk.com/p/what-should-counterfactual-donation-mean
It seems extremely unfortunate that the terminology apparently shifted from “counterfactually valid” (which means the right thing) to “counterfactual” (which means almost the opposite of the right thing).
Do you have a suggestion for terminology that properly truncates? (i.e. I think it’s basically impossible to expect a long phrase to end up being the one people regularly use, so if you want to fix that issue you need a single word that does the job)
“Additional donation” seems like the obvious choice in place of “counterfactual donation”, since we just mean “additional to what you would have donated anyway”, right? (The very obviousness makes me think maybe there’s a downside to the term that I’m not seeing, or I’m confused in some other way.)
Sounds pragmatically weird in the case where the person isn’t known to already be donating.