There hasn’t been a large-scale nuclear war; should we be unafraid of proliferation? More specifically to this argument, I don’t know any professional software developers who haven’t had the experience of being VERY surprised by a supposedly well-known algorithm.
I’m not sure what argument you think I’m making.
In a perfect world, I think people would not need any concrete demonstration to be very concerned about AGI x-risk. Alan Turing and John von Neumann were very concerned about AGI x-risk, and they obviously didn’t need any concrete demonstration for that. And I think their reasons for concern were sound at the time, and remain sound today.
But many people today are skeptical that AGI poses any x-risk. (That’s unfortunate from my perspective, because I think they’re wrong.) The point of this post is to suggest that we AGI-concerned people might not be able to win over those skeptics via concrete demonstrations of AI doing scary (or scary-adjacent) things, either now or in the future—or at least, not all of the skeptics. It’s probably worth trying anyway—it might help for some of the skeptics. Regardless, understanding the exact failure modes is helpful.
I expect the second-order effects of allowing people to get political power by crisis-mongering about risk when there is no demonstration/empirical evidence to ruin the initially perfect world pretty immediately, even assuming that AI risk is high and real, because this would allow anyone to make claims about some arbitrary risk and get rewarded for them even if they aren’t true, and there’s no force in this incentive structure that systematically favors true claims about risk over false ones.
Indeed, I think it would be a worse world than the one we have now, since it would supercharge the news industry’s and political groups’ already-existing incentives to crisis-monger.
Also, while Alan Turing and John von Neumann were great computer scientists, I don’t have that much reason to elevate their opinions on AI risk over anyone else’s on this topic, and their connection to AI is at best very indirect.
In a perfect world, everyone would be concerned about the risks for which there are good reasons to be concerned, and everyone would be unconcerned about the risks for which there are good reasons to be unconcerned, because everyone would be doing object-level checks of everyone else’s object-level claims and arguments, and coming to the correct conclusion about whether those claims and arguments are valid.
And those valid claims and arguments might involve demonstrations and empirical evidence, but also might be more indirect.
See also: It is conceivable for something to be an x-risk without there being any nice clean quantitative empirically-validated mathematical model proving that it is.
I do think Turing and von Neumann reached correct object-level conclusions via sound reasoning, but obviously I’m stating that belief without justifying it.
It’s true that in a perfect world, everyone would be concerned about the risks for which there are good reasons to be concerned, and unconcerned about the risks for which there are good reasons to be unconcerned, because everyone would be doing object-level checks of everyone else’s object-level claims and arguments, and coming to the correct conclusion about whether those claims and arguments are valid. So I shouldn’t have stated that the perfect world would be ruined by that. But I consider this a fabricated option, for reasons relating to how hard it is for average people to validate complex arguments, combined with the enormous economic benefits of specializing in a field. So I’m focused a lot more on what incentives this gives a real society, given our limitations.
To address this part:
I actually agree with this, and I agree with the claim that, as a matter of bare possibility, an existential risk can happen without leaving empirical evidence.
I have 2 things to say here:
1. I am more optimistic that we can get such empirical evidence for at least the most important parts of the AI risk case, like deceptive alignment, and here’s one comment on offer as a reason:
https://www.lesswrong.com/posts/YTZAmJKydD5hdRSeG/?commentId=T57EvmkcDmksAc4P4
2. From an expected value perspective, a problem can be both very important to work on and also have 0 tractability, and I think a lot of worlds where we get outright 0 evidence (or close to 0 evidence) on AI risk are also worlds where the problem is so intractable as to be effectively unsolvable, so the expected value of trying to solve the problem is also close to 0 (see the illustrative sketch after this list).
This also applies to the alien scenario: while from an epistemic perspective it is worth considering the hypothesis that the aliens are unfriendly, from a decision/expected-value perspective almost all of the value lies in the hypothesis that the aliens are friendly, since we cannot survive an alien attack except in very specific scenarios.
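To make the expected-value point in item 2 concrete, here is a minimal illustrative sketch. The world names, probabilities, and tractability numbers below are made-up assumptions, not claims about the actual values; the only structure assumed is that the expected value of working on the problem factors as P(world) × tractability in that world × value if solved.

```python
# Illustrative sketch only: all numbers are made up. The only assumption is that
# the expected value of working on the problem factors as
#   P(we are in that world) * tractability in that world * value if solved.

VALUE_IF_SOLVED = 1.0  # normalize the value of actually solving the problem

# Hypothetical worlds: (probability we are in this world, tractability there)
worlds = {
    "evidence obtainable, problem tractable": (0.7, 0.3),
    "no/near-zero evidence, problem intractable": (0.3, 0.001),
}

total_ev = 0.0
for name, (p_world, tractability) in worlds.items():
    ev = p_world * tractability * VALUE_IF_SOLVED
    total_ev += ev
    print(f"{name}: contribution to expected value ~ {ev:.4f}")

print(f"total expected value of working on the problem ~ {total_ev:.4f}")
# The near-zero-evidence world contributes almost nothing to the total, no matter
# how important the problem is there: importance without tractability adds ~0 EV.
```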
Can you elaborate on what you were pointing to in the linked example? I’ve seen a few people mention that specific thread recently, but I seem to be missing the conclusion they’re drawing from it.
I was pointing to this quote:

It sounds as though you’re imagining that we can proliferate the one case in which we caught the AI into many cases which can be well understood as independent (rather than basically just being small variations).
and this comment, which talks about taking a case where one AI schemes and proliferating it into multiple instances to get more evidence:
https://www.lesswrong.com/posts/YTZAmJKydD5hdRSeG/#BkdBD5psSFyMaeesS
crisis-mongering about risk when there is no demonstration/empirical evidence to ruin the initially perfect world pretty immediately

I think the key point of this post is precisely the question of “is there any such demonstration, short of the actual, real, very bad thing happening in a real setting, that people who discount these as serious risks would accept as empirical evidence worth updating on?”