I’m confused about A6, from which I get “Yudkowsky is aiming for a pivotal act to prevent the formation of unaligned AGI that’s outside the Overton Window and on the order of burning all GPUs”. This seems to run counter to the notion in Q4 of Death with Dignity, where Yudkowsky says:
It’s relatively safe to be around an Eliezer Yudkowsky while the world is ending, because he’s not going to do anything extreme and unethical unless it would really actually save the world in real life, and there are no extreme unethical actions that would really actually save the world the way these things play out in real life, and he knows that. He knows that the next stupid sacrifice-of-ethics proposed won’t work to save the world either, actually in real life.
I would estimate that burning all AGI-capable compute would disrupt every factor of the global economy for years and cause tens of millions of deaths[1], and that’s what Yudkowsky considers the more mentionable example. Do the other options outside the Overton Window somehow not qualify as unsafe/extreme unethical actions (by the standards of the audience of Death with Dignity)? Has Yudkowsky changed his mind on what options would actually save the world? Does Yudkowsky think that the chances of finding a pivotal act that would significantly delay unsafe AGI are so slim that he’s safe to be around despite him being unsafe in the hypothetical that such a pivotal act is achievable? I’m confused.
Also, I’m not sure how much overlap there is between people who do Bayesian updates and people for whom whatever Yudkowsky is thinking of is outside the Overton Window, but in general, if someone says that what they actually want is outside your Overton Window, I see only two directions to update in: either shift your Overton Window to include their intent, or shift your opinion of them to outside your Overton Window. If the first option isn’t going to happen, as Yudkowsky says (for public discussion on LessWrong at least), that leaves the second.
[1] Compare modern estimates of the damage that would be caused by a solar flare equivalent to the Carrington Event. Factories, food supply, long-distance communication, digital currency: many critical services nowadays are dependent on compute, and that portion will only increase by the time you would actually pull the trigger.
Interventions on the order of burning all GPUs in clusters larger than 4 and preventing any new clusters from being made, including the reaction of existing political entities to that event and the many interest groups who would try to shut you down and build new GPU factories or clusters hidden from the means you’d used to burn them, would in fact really actually save the world for an extended period of time and imply a drastically different gameboard offering new hopes and options.
What makes me safe to be around is that I know that various forms of angrily acting out violently would not, in fact, accomplish anything like this. I would only do something hugely awful that would actually save the world. No such option will be on the table, and I, the original person who wasn’t an idiot optimist, will not overestimate and pretend that something will save the world when it obviously-to-me won’t. So I’m a relatively safe person to be around, because I am not the cartoon supervillain talking about necessary sacrifices to achieve greater goods when everybody in the audience knows that the greater good won’t be achieved; I am the person in the audience rolling their eyes at the cartoon supervillain.
Interventions on the order of burning all GPUs in clusters larger than 4 and preventing any new clusters from being made, including the reaction of existing political entities to that event and the many interest groups who would try to shut you down and build new GPU factories or clusters hidden from the means you’d used to burn them, would in fact really actually save the world for an extended period of time and imply a drastically different gameboard offering new hopes and options.
I suppose ‘on the order of’ is the operative phrase here, but that specific scenario seems both extremely difficult to specify an AGI for without disastrous side-effects and still not enough. Other, less efficient or less well developed forms of compute exist, and preventing humans from organizing to find a way around the GPU-burner’s blacklist for unaligned AGI research while differentially allowing them to find a way to build friendly AGI seems like it would require a lot of psychological/political finesse on the GPU-burner’s part. It’s on the level of Ozymandias from Watchmen, but it’s cartoonish supervillainy nonetheless.
I guess my main issue is a matter of trust. You can say the right words, as all the best supervillains do, promising that the appropriate cautions are taken above our clearance level. You’ve pointed out plenty of mistakes you could be making, and the ease with which one can make mistakes in situations such as yours, but acknowledging potential errors doesn’t prevent you from making them. I don’t expect you to have many people you would trust with AGI, and I expect that circle would shrink further if those people said they would use the AGI to do awful things iff it would actually save the world [in their best judgment]. I currently have no-one in the second circle.
If you’ve got a better procedure for people to learn to trust you, go ahead, but is there something like an audit you’ve participated in/would be willing to participate in? Any references regarding your upstanding moral reasoning in high-stakes situations that have been resolved? Checks and balances in case of your hardware being corrupted?
You may be the audience member rolling their eyes at the cartoon supervillain, but I want to be the audience member rolling their eyes at HJPEV when he has a conversation with Quirrell and doesn’t realise that Quirrell is evil.
It definitely is the case that a pivotal act that isn’t “disruptive” isn’t a pivotal act. But I think not all disruptive acts have a significant cost in human lives.
To continue with the ‘burn all GPUs’ example, note that while some industries are heavily dependent on GPUs, most industries are instead heavily dependent on CPUs. The hospital’s power will still be on if all GPUs melt, and probably their monitors will still work (if the nanobots can somehow distinguish between standalone GPUs and ones embedded into motherboards). Transportation networks will probably still function, and so on. Cryptocurrencies, entertainment industries, and lots of AI applications will be significantly impacted, but this seems recoverable.
But I do think Eliezer’s main claim is: some people will lash out in desperation when cornered (“Well, maybe starting WWIII will help with AI risk!”), and Eliezer is not one of those people. So if he makes a call of the form “disruption that causes 10M deaths”, it’s because the other option looked actually worse, and so this is ‘safer’. [If you’re one of the people tied up on the trolley tracks, you want the person at the lever to switch it!]
AI can run on CPUs (with a certain inefficiency factor), so burning only the GPUs doesn’t seem like it would be sufficient. As for disruptive acts that are less deadly, it would be nice to have some examples, but Eliezer says they’re too far out of the Overton Window to mention.
If what you’re saying about Eliezer’s claim is accurate, it does seem disingenuous to frame “The only worlds where humanity survives are ones where people like me do something extreme and unethical” as “I won’t do anything extreme and unethical [because humanity is doomed anyway]”. It makes Eliezer dangerous to be around if he’s mistaken, and if you’re significantly less pessimistic than he is (if you assign >10^-6 probability to humanity surviving), he’s mistaken in most of the worlds where humanity survives, and those are the worlds that matter most.
And yeah, it’s nice that Eliezer claims that Eliezer can violate ethical injunctions because he’s smart enough, after repeatedly stating that people who violate ethical injunctions because they think they’re smart enough are almost always wrong. I don’t doubt he’ll pick the option that looks actually better to him. It’s just that he’s only human—he’s running on corrupted hardware like the rest of us.