> this is empirically false. genocide and slavery have been the norm across human history.
Don’t mix up conflicts between different human groups with an inability of humans to thrive. The fact that there has been a human history at all requires people to have the orientation to know what’s going on, the capacity to act on it, and the care to do so. Humanity hasn’t just given up or committed suicide, leaving just a nonhuman world.
Now it’s true that generally, there was a self-centered thriving that favored the well-being of oneself and one’s friends and family over others, and this would lead to various sorts of conflicts, often wrecking a lot of good people. We can only hope society becomes more discriminating over time, to better nurture the goodness and only destroy the badness. But you can only say that genocide was bad because there was something that created good people whom it was wrong to kill.
> We are currently in the process of modifying our atmosphere in a way that is deadly to humans and almost did so recently in the past.
But critically, various historical environmental problems led to the creation of environmentalist groups, which enabled society to notice these atmospheric problems. Contrast this with prior environmental changes that there was no protection against.
> > AI is going to loosen up this default pull.
> this assumes a specific model for AI: humans use the AI to do highly adversarial search and then blindly implement the results.
You are misunderstanding. By “loosen up this default pull”, I mean something like this: say you implement a bot to handle food production, from farm to table. Right now, food production needs to be human-legible because it involves a collaborative effort between lots of people. With the bot, even if it handles food production perfectly fine, you’ve now removed the force that generates human legibility for food production.
As you remove human involvement from more and more places, humans become able to do fewer and fewer things. Maybe humans can still thrive under such circumstances, but surely you can see that strong people have a by-default better chance than weak people do? Notably, this is a separate part of the argument from adversarial search, and it applies even if we limit ourselves to reflex-like methods. The point here is to highlight what currently allows humans to thrive, and how that gets weakened by AI.
> Suppose instead humans only implement the results after verifying them, or require the AI to provide a mathematical proof that “this action won’t kill all humans”.
> none of these are unique to AGI. We have the same problem with nuclear weapons, biological weapons and any number of other technologies. AGI is uniquely friendly in the sense that at first it’s merely software: it has no impact on the real world unless we choose to let it.
If you wait until humans have manually checked them all through, then you incentivize adversaries to develop military techniques that can destroy your country faster than you can wake up your interpretability researchers. (I expect this to be possible with only weak, reflex-based AI, like if you set up a whole bunch of automated bots to wreak havoc in various ways once triggered.)
> How is this an argument for AGI risk?
It’s not; it’s registering my assumption in case you want to object to it. If you think nukes might be used in a more limited way, then maybe you also think adversarial searches might be used in a more limited way.
> Something being unclear is not an argument for doom. At best it’s a restatement of my original weak argument: AGI will be powerful, therefore it might be bad.
Registering something being unclear is helpful for where to take it. Like if we agreed on the overall picture, but you were more optimistic about the areas that were unclear, and I was more pessimistic about them, then I could continue the argument into those areas as well. Like I’m sort of trying to comprehensively enumerate all the relevant dynamics for how this is gonna develop, and explicitly mark off the places that are relevant to consider but which I haven’t properly addressed.
Right now, though, you seem to be assuming that humans by-default thrive, and only exogenous dangers like war or oppression can prevent this. Meanwhile, I’m more using a sort of “inertial” model, where certain neuroses can drive humans to spontaneously self-destruct, sometimes taking a lot of their neighbors with them. As such it seems less relevant to explore these subtrees until the issue of self-destructive neuroses is addressed.
> even if this is a plausible model, it is by no means the only model or the default path.
Looks like the default path to me? Like AI companies are dumping lots of knowledge and skills into LLMs, for instance, and at my job we’ve started integrating them with our product. Are there any other relevant dynamics you are seeing?
> it is equally plausible (in my opinion more so) that there is a limit to how far ahead intelligence can predict and science is fundamentally rate-limited by the speed of physical experimentation.
You need physical experimentation to test how well your methods for unleashing energy/flow into a particular direction work, so building reflex-like/tool AIs is going to be fundamentally rate-limited by the speed of physical experimentation.
However, as you build up a library of tricks to interact with the world, you can use compute to search through ways to combine these tricks to make bigger things happen. This is generally bounded by whatever the biggest “energy source” you can tap into is, because it is really hard to bring multiple different “energy sources” together into some shared direction.
> why are we assuming the adversaries will exploit your weakness? Why not assume we build corrigible AI that tries to help you instead?
We’ll build corrigible AI that tries to help us with ordinary stuff like transporting food from farms to homes.
However, the more low-impact it is, the more exploitable it is. If you want food from a self-driving truck, maybe you could just stand in front of it, and it will stop, and then some of your friends can break into it and steal the food it is carrying.
To prevent this, we need to incapacitate criminals. But criminals don’t want to be incapacitated, so they will exploit whatever weaknesses the system for incapacitating them has. As part of this, the more advanced criminals will presumably build AIs to try to seek out weaknesses in the system. That’s what I’m referring to with adversaries exploiting your weakness.
> A utility-maximizer is a specific design of AGI, and moreover totally different from the next-token-prediction AIs that currently exist. Why should I assume that this particular design will suddenly become popular (despite the clear disadvantages that you have already stated)?
Being robust to exploitation from adversaries massively restricts your options. Whether the exact implementation includes an explicit utility function or not is less relevant than the fact that as it spontaneously adapts to undermine its adversaries, it needs to do so in a way that doesn’t undermine humanity in general. I.e. you need to build some system that can unleash massive destruction towards sufficiently unruly enemies, without unleashing massive destruction towards friends. I think the classic utility maximizer instrumental convergence risk gives a pretty accurate picture for how that will look / how that gives you dangers, but if you think next-token-predictors can unleash destruction in a more controlled way, I’m all ears.
> I realize I should probably add a 3rd category of argument: arguments which assume a specific (unlikely) path for AGI development and then argue this particular path is bad.
Any path for history needs to account for security and resource flow/allocation. These are the most important parts of everything. My position doesn’t really assume that much beyond this.
Making a point-by-point refutation misses the broader fact that any long sequence of argument like this adds up to very little evidence.
Even if you somehow convince me that each of your (10) arguments was like 75% true, they’re still going to add up to nothing because 0.75^10 = 0.05.
Unless you can summarize your argument in at most 2 sentences (with evidence), it’s completely ignorable.
This is not how learning any (even slightly complex) topic works.
Yudkowsky 2017, AronT 2023 and Gwern 2019, if you’re curious why you’re getting downvoted.
(I tried to figure out whether this method of estimation works, and it seemed more accurate than I thought, but then I got distracted).
Cope. Here you’re taking a probabilistic perspective, but that perspective sucks.
The fact is that there are certain robust resources (like sunlight etc.) which exert constant pressure on the world, and which everything is dependent on. Whatever happens, these resources must go somewhere, so any forecast for the future that’s worth its salt must ultimately make predictions about those.
Each part of my argument addresses a different factor involved in these resource flows. Often you can just inspect the world and clearly see that that’s how the resources are flowing. Other times, my argument is disjunctive. Yet other times, sure, maybe I’m wrong, but the way I might be wrong would imply the possibility of a lot of resources rushing out into some other channel, which again is worth exploring.
Plus, let’s remember, Strong Evidence Is Common. If there are particular parts of the argument where you don’t know how to inspect the world to get plenty of evidence, then I can try to guide you. But blinding yourself because of “muh evidence” just makes your opinion worthless.
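To make the arithmetic in this exchange concrete, here is a minimal Python sketch. It assumes the ten steps are independent and reuses the 75% figure from the comment above; the function names `conjunctive` and `disjunctive` are illustrative, not anything from the thread. It shows why it matters whether an argument is a chain in which every step must hold or a set of alternative routes in which one is enough.

```python
import math


def conjunctive(probs):
    """Probability that every step holds, assuming independence."""
    return math.prod(probs)


def disjunctive(probs):
    """Probability that at least one route holds, assuming independence."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# Ten steps at 75% each, as in the comment above (independence is an assumption).
steps = [0.75] * 10

print(f"every step must hold: {conjunctive(steps):.3f}")  # → 0.056
print(f"any one route is enough: {disjunctive(steps):.7f}")
```

Real arguments are rarely purely one or the other, and the steps are rarely independent, so both numbers are better read as intuition pumps than as estimates.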