And we don’t just want to avoid extinction. We want to thrive. We want to expand our civilization and build a better world for our descendants.
And for ourselves. If AGI doesn’t take away civilization’s future, why take away the future of individual people? The technical problem should be relatively trivial, given a few years to get started.
And it can’t just kind of like the idea of human flourishing. Our well-being needs to be the primary thing it cares about.
If it’s not the primary thing it cares about, we lose the cosmic endowment. But we might keep our lives and civilization.
It does need to care at all for this to happen, and a paperclip maximizer won’t. But something trained on human culture might retain at least a tiny bit of compassion on reflection, which is enough to give back a tiny bit of the cosmic wealth it just took.
Humans are also potentially a threat.
Not if you use a superintelligently designed sandbox. It’s a question of spending literally no resources compared to spending at least a tiny little fraction of future resources.
The Uakari Monkey isn’t going extinct because humans are trying to kill it but because wood is useful and their habitat happens to be made of trees.
Saving monkeys[1] is also somewhat expensive, which becomes a lesser concern with more wealth, and a trivial concern with cosmic wealth. I think it’s an actual crux for this example. With enough wealth to trivially save all monkeys, no monkeys would go extinct, provided we care even a little bit more than precisely not at all.
[1] Or, as the case may be, their habitats. Bald uakaris seem only tangentially threatened right now.
With enough wealth to trivially save all monkeys, no monkeys would go extinct, provided we care even a little bit more than precisely not at all.
I’m confused about this point. Perhaps you mean wealth in a broad sense that includes “we don’t need to worry about getting more wood.” But as long as wood is a useful resource that humans could use more of to acquire more wealth and do other things that we value more than saving monkeys, then we will continue to take wood from the monkeys. Likewise, even if an AGI values human welfare somewhat, it will still take our resources as long as it values other things more than human welfare.
I found the monkey example much more compelling than “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.” Taking resources from humans seems more likely than using humans as resources.
For the monkey example, I mean that I expect that in practice there will be activists who will actually save the monkeys if they are wealthy enough to succeed in doing so on a whim. There are already expensive rainforest conservation efforts costing hundreds of millions of dollars. Imagine that they instead cost $10 and anyone could pay that cost without needing to coordinate with others. Then, I claim, someone would.
By analogy, the same should happen with humanity instead of monkeys, if AGIs reason in a sufficiently human-like way. I don’t currently find it likely that most AGIs would normatively accept some decision theory that rules it out. It’s obviously possible in principle to construct AGIs that follow some decision theory (or value paperclips), but that’s not the same thing as such properties of AGI behavior being convergent and likely.
I think the default shape of a misaligned AGI is a sufficiently capable simulacrum, a human-like alien thing that faces the same value extrapolation issues as humanity, in a closely analogous way. (That is, if an AGI alignment project doesn’t make something clever instead that becomes much more alien and dangerous as a result.) And the default aligned AGI is the same, but not that alien, more of a generalized human.
something trained on human culture might retain at least a tiny bit of compassion on reflection
This depends on where “compassion” comes from. It’s not clear that training on data from human culture gets you much in the way of human-like internals. (Compare: contemporary language models know how to say a lot about “happiness”, but it seems very dubious that they feel happiness themselves.)
These are good points. Maybe we’ll align these things enough to where they’ll give us a little hamster tank to run around in.
I think a lot of people would take issue with your basic premise of an AGI “wanting” something:
It seems likely that humanity will build superintelligence – i.e. a system that can be described as ‘wanting’ things to happen and is very good at actually making them happen.
The argument against is that you are anthropomorphizing whatever the “AGI” thing turns out to be. Sure, if you do an atom-by-atom simulation of a living human brain, that simulation will most likely have human “wants”, but that seems radically different from whatever the current direction of AI development is. That the two will converge is a bit of a leap, and there is definite disagreement among ML experts on that point.
I agree that I did not justify this claim and it is controversial in the ML community. I’ll try to explain why I think this.
First of all, when I say an AI ‘wants things’ or is ‘an agent’ I just mean it robustly and autonomously brings about specific outcomes. I think all of the arguments made above apply with this definition and don’t rely on anthropomorphism.
Why are we likely to build superintelligent agents?
1. It will be possible to build superintelligent agents
I’m just going to take it as a given here for the sake of time. Note: I’m not claiming it will be easy, but I don’t think anyone can really say with confidence that it will be impossible to build such agents within the century.
2. There will be strong incentives to build superintelligent agents
I’m generally pretty skeptical of claims that if you just train an AI system for long enough it becomes a scary consequentialist. Instead, I think it is likely someone will build a system like this because humans want things and building a superintelligent agent that wants to do what you want pretty much solves all your problems. For example, they might want to remain technologically relevant/be able to defend themselves from other countries/extraterrestrial civilizations, expand human civilization, etc. Building a thing that ‘robustly and autonomously brings about specific outcomes’ would be helpful for all this stuff.
In order to use current systems (‘tool AIs’ as some people call them) to actually get things done in the world, a human needs to be in the loop, which is pretty uncompetitive with fully autonomous agents.
I wouldn’t be super surprised if humans didn’t build superintelligent agents for a while—even if they could. Like I mentioned in the post, most people probably want to stay in control. But I’d put >50% credence on it happening pretty soon after it is possible because of coordination difficulty and the unilateralist’s curse.
No disagreement on point 1 from me, and I think that part is less controversial. Point 2 is closer to the crux:
building a superintelligent agent that wants to do what you want pretty much solves all your problems
I think what humans really want is not an AI that “wants what you want” but one that “does what you want”, without anything like a want of its own. That is, if what you want changes, the AI will “happily” do it without resisting, once it understands what it is you want, anyway. Whether that is possible without it “wanting” something, I have no idea, and I doubt this question has a clear answer at present.
If you have a complex goal and don’t know the steps that would be required to achieve it, “does what you want” is not enough.
If, however, you have “wants what you want”, the AGI can figure out the necessary steps.
An example I had in my head was something like “Human wants food, I’ll make a bowl of pasta” vs. “I want the human to survive and will feed them, whether they want to eat or not, because they want to survive, too”. I am not sure why the latter is needed, if that is what you are saying.
You are making a number of assumptions here.
(1) The AI will value or want the resources used by humans. Perhaps. Or, perhaps the AI will conclude that being on a relatively hot planet in a high-oxygen atmosphere with lots of water isn’t optimal and leave the planet entirely.
(2) The AI will view humans as a threat. The superhuman AI that those on Less Wrong usually posit, one so powerful that it can cause human extinction with ease, that can’t be turned off or reprogrammed, and that can manipulate humans as easily as I can type, can’t effectively be threatened by human beings.
(3) An AI which just somewhat cares about humans is insufficient for human survival. Why? Marginal utility is a thing.
This alone isn’t enough, and in the past I didn’t believe the conclusion. The additional argument that leads to the conclusion is path-dependence of preferred outcomes. The fact that human civilization currently already exists is a strong argument for it being valuable to let it continue existing in some form, well above the motivation to bring it into existence if it didn’t already exist. Bringing it into existence might fail to make the cut, as there are many other things that a strongly optimized outcome could contain, if its choice wasn’t influenced by the past.
relatively hot planet in a high-oxygen atmosphere with lots of water
But atoms? More seriously, the greatest cost is probably starting expansion a tiny bit later, not making the most effective use of what’s immediately at hand.
“The greatest cost is probably starting expansion a tiny bit later, not making the most effective use of what’s immediately at hand.”
Possible, but not definitely so. We don’t really know all the relevant variables.