I think that’s a legit concern. One mitigating factor is that people who seem inclined to rash, destructive plans tend to be pretty bad at execution, e.g. Aum Shinrikyo.
Recently Eliezer has used the dying with dignity frame a lot outside his April Fools’ Day post. So while some parts of that post may have been a joke, the dying with dignity part was not. For example: https://docs.google.com/document/d/11AY2jUu7X2wJj8cqdA_Ri78y2MU5LS0dT5QrhO2jhzQ/edit?usp=drivesdk
If you have specific examples where you think I took something too seriously that was meant to be a joke, I’d be curious to see those.
-
I think you’re right that dying with dignity is a better frame specifically for recommending against doing unethical stuff. I agree with everything he said about not doing unethical stuff, and tried to point to that (maybe if I have time I will add some more emphasis here).
But that being said, I feel a little frustrated that people think caveats about not doing unethical stuff are expected in a post like this. It feels similar to if I were writing a post about standing up for yourself and had to add “stand up to bullies, but remember not to murder anyone”. Yes, you should not murder bullies. But I wish to live in a world where we don’t have to add that caveat every time. I recognize that we might not live in such a world. Maybe when someone proposes “play to your outs”, people jump to violent plans without realizing how likely that is to be counterproductive to the goal. And this does seem to be somewhat true, though I’m not sure of the extent of it. And I find this frustrating. That which is already true… of course, but I wish people would be a little better here.
-
Just a note on confidence, which seems especially important since I’m making a kind of normative claim:
I’m very confident “dying with dignity” is a counterproductive frame for me. I’m somewhat confident that “playing to your outs” is a really useful frame for me and people like me. I’m not very confident “playing to your outs” is a good replacement for “dying with dignity” in general, because I don’t know whether other people will respond to it the way I do. Seeing people’s comments here is helpful.
“It also seems to encourage #3 (and again the vague admonishment to “not do that” doesn’t seem that reassuring to me.)”
I just pointed to Eliezer’s warning, which I thought was sufficient. I could write more about why I think it’s not a good idea, but I currently think a bigger portion of the problem is people not trying to come up with good plans rather than people coming up with dangerous plans, which is why my emphasis is where it is.
Eliezer is great at red teaming people’s plans. This is great for finding ways plans don’t work, and I think it’s very important he keep doing this. It’s not great for motivating people to come up with good plans, though. And I think that shortage of motivation is a real threat to our chances to mitigate AI existential risk. I was talking to a leading alignment researcher yesterday who said their motivation had taken a hit from Eliezer’s constant “all your plans will fail” talk, so I’m pretty sure this is a real thing, even though I’m unsure of the magnitude.
I currently don’t know of any outs. But I think I know some things that outs might require and am working on those, while hoping someone comes up with some good outs—and occasionally taking a stab at them myself.
I think the main problem is the first point and not the second point:
Do NOT assume that what you think is an out is certainly an out.
Do NOT assume that the potential outs you’re aware of are a significant proportion of all outs.
The current problem, if Eliezer is right, is basically that we have 0 outs. Not that the ones we have might be less promising than other ones. And he’s criticising people for thinking their plans are outs when they’re actually not.
Well, I think that’s a real problem, but I worry Eliezer’s frame will generally discourage people from even trying to come up with good plans at all. That’s why I emphasize outs.
I agree finding your outs is very hard, but I don’t think this is actually a different challenge than increasing “dignity”. If you don’t have a map to victory, then you probably lose. I expect that in most worlds where we win, some people figured out some outs and played to them.
I donated:
$100 to Zvi Mowshowitz for his post “Covid-19: My Current Model”, but really for all his posts. I appreciated how Zvi kept posting Covid updates long after I had the energy to do my own research on this topic. I also appreciate how well he called the Omicron wave.
$100 to Duncan Sabien for his post “CFAR Participant Handbook now available to all”. I’m glad CFAR decided to make it public, both because I have been curious for a while what was in it and because in general I think it’s pretty good practice for orgs like CFAR to publish more of what they do. So thanks for doing that!
I’ve edited the original to add “some” so it reads “I’m confident that some nuclear war planners have...”
It wouldn’t surprise me if some nuclear war planners had dismissed these risks while others had thought them important.
I’m fairly confident that at least some nuclear war planners have thought deeply about the risks of climate change from nuclear war because I’ve talked to a researcher at RAND who basically told me as much, plus the group at Los Alamos who published papers about it, both of which seem like strong evidence that some nuclear war planners have taken it seriously. Reisner et al., “Climate Impact of a Regional Nuclear Weapons Exchange: An Improved Assessment Based On Detailed Source Calculations” is mostly by Los Alamos scientists, I believe.
Just because some of these researchers & nuclear war planners have thought deeply about it doesn’t mean nuclear policy will end up being sane and factoring in the risks. But I think it provides some evidence in that direction.
Thanks, this is helpful! I’d be very curious to see where Paul agreed / disagreed with the summary / implications of his view here.
After reading these two Eliezer <> Paul discussions, I realize I’m confused about what the importance of their disagreement is.
It’s very clear to me why Richard & Eliezer’s disagreement is important. Alignment being extremely hard suggests AI companies should work a lot harder to avoid accidentally destroying the world, and suggests alignment researchers should be wary of easy-seeming alignment approaches.
But it seems like Paul & Eliezer basically agree about all of that. They disagree about… what the world looks like shortly before the end? Which, sure, does have some strategic implications. You might be able to make a ton of money by betting on AI companies and thus have a lot of power in the few years before the world drastically changes. That does seem important, but it doesn’t seem nearly as important as the difficulty of alignment.
I wonder if there are other things Paul & Eliezer disagree about that are more important. Or if I’m underrating the importance of the ways they disagree here. Paul wants Eliezer to bet on things so Paul can have a chance to update toward Eliezer’s view in the future if things end up being really different than he thinks. Okay, but what will he do differently in those worlds? Imo he’d just be doing the same things he’s trying now if Eliezer were right. And maybe there is something implicit in Paul’s “smooth line” forecasting beliefs that makes his prosaic alignment strategy more likely to work in worlds where he’s right, but I currently don’t see it.
Another way to run this would be to have a period of time before launches are possible for people to negotiate, and then to not allow retracting nukes after that point. And I think next time I would make it so that the total payoff from no nukes would be greater than the total if only one side nuked, though I did like that this time people had the option of a creative solution that “nuked” a side but led to higher EV for both parties than not nuking.
I think the fungibility is a good point, but it seems like the randomizer solution is strictly better than this. Otherwise one side clearly gets less value, even if they are better off than they would have been had the game not happened. It’s still a mixed-motive conflict!
I’m not sure that anyone exercised restraint in not responding to the last attack, as I don’t have any evidence that anyone saw the last attack. It’s quite possible people did see it and didn’t respond, but I have no way to know that.
Oh, I should have specified that I would consider the coin flip to be a cooperative solution! Seems obviously better to me than any other solution.
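To make the expected-value point concrete, here is a minimal sketch with made-up payoff numbers (not the game’s actual payoffs); the only structure assumed is the one described above: a one-sided “nuke” yields more total value than mutual restraint, and even the nuked side does better than under no nukes.

```python
# Hypothetical payoffs only -- not the real numbers from the game.
NO_NUKE = (5, 5)     # (side A, side B) if nobody launches
ONE_SIDED = (10, 6)  # (nuking side, nuked side); note 6 > 5 and 10 + 6 > 5 + 5

def expected_payoffs(solution):
    """Expected (A, B) payoffs for a few of the solutions discussed."""
    if solution == "no_nukes":
        return NO_NUKE
    if solution == "a_nukes_b":  # fixed asymmetric deal
        return ONE_SIDED
    if solution == "coin_flip":  # randomize who gets "nuked"
        ev = (ONE_SIDED[0] + ONE_SIDED[1]) / 2
        return (ev, ev)
    raise ValueError(solution)

for s in ("no_nukes", "a_nukes_b", "coin_flip"):
    print(s, expected_payoffs(s))
# no_nukes   (5, 5)
# a_nukes_b  (10, 6)    -> both sides beat no-nukes, but one clearly gets less
# coin_flip  (8.0, 8.0) -> same total surplus, split symmetrically ex ante
```

Under those made-up numbers, any fixed assignment leaves one side visibly behind, while the coin flip delivers the same total surplus with equal expected value to each side, which is why it reads to me as the cooperative solution.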
I think there are a lot of dynamics present here that aren’t present in the classic prisoner’s dilemma, and some dynamics that are (and some that are present in various iterated prisoner’s dilemmas). The prize might be different for different actors, since actors place different value on “cooperative” outcomes. If you can trust people’s precommitments, I think there is a race to commit OR precommit to an action.
E.g. if I wanted the game to settle with no nukes launched, then I could pre-commit to launching a retaliatory strike against either side if an attack was launched.
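As a toy illustration of that kind of precommitment (again with made-up payoffs, and assuming the precommitment is credible and visible to both sides):

```python
# Hypothetical payoffs only -- not the real numbers from the game.
NO_NUKE = (5, 5)     # nobody launches
ONE_SIDED = (10, 6)  # (attacker, attacked) if the attack goes unanswered
MUTUAL = (1, 1)      # attack plus retaliatory strike

def attacker_payoff(defender_precommitted_to_retaliate: bool) -> int:
    """What an attacking side expects, given the defender's commitment."""
    return MUTUAL[0] if defender_precommitted_to_retaliate else ONE_SIDED[0]

# Without the precommitment, attacking pays 10 > 5, so restraint is fragile.
# With a credible precommitment to retaliate, attacking pays 1 < 5,
# so "no nukes launched" becomes the best response for both sides.
print(attacker_payoff(False), attacker_payoff(True))  # 10 1
```

Which is also part of why, if precommitments are trusted, there’s a race: whoever credibly commits first shapes which outcome the game settles on.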
This could mitigate financial risk to the company, but I don’t think anyone will sell existential risk insurance, or that it would be effective if they did.