I think a lot of people would take issue with your basic premise of an AGI “wanting” something:
It seems likely that humanity will build superintelligence – i.e. a system that can be described as ‘wanting’ things to happen and is very good at actually making them happen.
The argument against is that you are anthropomorphizing whatever the “AGI” thing turns out to be. Sure, if you do an atom-by-atom simulation of a living human brain, that simulation will most likely have human “wants”. But that seems radically different from the current direction of AI development. That the two will converge is a bit of a leap, and there is definite disagreement among ML experts on that point.
I agree that I did not justify this claim and it is controversial in the ML community. I’ll try to explain why I think this.
First of all, when I say an AI ‘wants things’ or is ‘an agent’ I just mean it robustly and autonomously brings about specific outcomes. I think all of the arguments made above apply with this definition and don’t rely on anthropomorphism.
Why are we likely to build superintelligent agents?

1. It will be possible to build superintelligent agents

I’m just going to take it as a given here for the sake of time. Note: I’m not claiming it will be easy, but I don’t think anyone can really say with confidence that it will be impossible to build such agents within the century.

2. There will be strong incentives to build superintelligent agents

I’m generally pretty skeptical of claims that if you just train an AI system for long enough it becomes a scary consequentialist. Instead, I think it is likely someone will build a system like this because humans want things and building a superintelligent agent that wants to do what you want pretty much solves all your problems. For example, they might want to remain technologically relevant/be able to defend themselves from other countries/extraterrestrial civilizations, expand human civilization, etc. Building a thing that ‘robustly and autonomously brings about specific outcomes’ would be helpful for all this stuff.
In order to use current systems (‘tool AIs’ as some people call them) to actually get things done in the world, a human needs to be in the loop, which is pretty uncompetitive with fully autonomous agents.
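To make that tool-vs-agent distinction concrete, here is a minimal toy sketch (Python; the one-variable ‘world’, the target, and all function names are made up for illustration, not any real system). The tool-style workflow waits on a human decision at every step, while the agent-style loop keeps selecting and executing actions until the specified outcome obtains, which is roughly what I mean above by ‘robustly and autonomously brings about specific outcomes’.

```python
# Illustrative sketch only: a toy contrast between a 'tool AI' used with a
# human in the loop and an autonomous agent. The one-variable 'world', the
# target, and every function name here are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class World:
    """A trivial stand-in for the world: a counter we want driven up to a target."""
    value: int = 0


def propose_step(world: World, target: int) -> str:
    """Tool-style AI: suggests one action at a time, executes nothing itself."""
    return "increment" if world.value < target else "stop"


def human_approves(suggestion: str) -> bool:
    """Stand-in for a human reviewing each suggestion (slow, but keeps control)."""
    return suggestion == "increment"


def tool_ai_workflow(world: World, target: int) -> None:
    """Human in the loop: every action waits on a human decision."""
    while world.value < target:
        suggestion = propose_step(world, target)
        if human_approves(suggestion):      # the per-step human bottleneck
            world.value += 1


def autonomous_agent(world: World, target: int) -> None:
    """Agent-style AI: keeps selecting and executing actions on its own
    until the specified outcome actually obtains."""
    while world.value < target:
        world.value += 1


if __name__ == "__main__":
    w_tool, w_agent = World(), World()
    tool_ai_workflow(w_tool, target=3)
    autonomous_agent(w_agent, target=3)
    print(w_tool.value, w_agent.value)  # both reach 3; only one needed a human per step
```

The only point is the shape of the loop: the agent version scales without a human bottleneck, which is where the competitive pressure comes from.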
I wouldn’t be super surprised if humans didn’t build superintelligent agents for a while—even if they could. Like I mentioned in the post, most people probably want to stay in control. But I’d put >50% credence on it happening pretty soon after it is possible, because of coordination difficulty and the unilateralist’s curse.
No disagreement on point 1 from me, and I think that part is less controversial. Point 2 is closer to the crux:
building a superintelligent agent that wants to do what you want pretty much solves all your problems
I think what humans really want is not an AI who “wants what you want” but “does what you want”, without anything like a want of its own. That is, if what you want changes, the AI will “happily” do it without resisting, once it understands what it is you want, anyway. Whether it is possible without it “wanting” something, I have no idea, and I doubt this question has a clear answer at present.
If you have a complex goal and don’t know the steps required to achieve it, “does what you want” is not enough. If, however, you have “wants what you want”, the AGI can figure out the necessary steps itself.
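A toy way to see that difference (illustrative Python only; the +1/×2 number game is a hypothetical stand-in for a goal whose steps you can’t write down yourself): “does what you want” executes a plan the human already specified, while “wants what you want” is handed only the goal and searches for the steps itself.

```python
# Illustrative sketch only: 'does what you want' as executing steps you
# already specified, versus 'wants what you want' as being handed only a
# goal and searching for the steps itself. The toy domain (reach a target
# number using +1 and *2) is a made-up stand-in for a complex goal.
from collections import deque


def does_what_you_want(start: int, steps: list[str]) -> int:
    """Executes exactly the steps the human wrote down; no goal of its own."""
    x = start
    for step in steps:
        x = x + 1 if step == "+1" else x * 2
    return x


def wants_what_you_want(start: int, goal: int) -> list[str]:
    """Given only the outcome, searches (breadth-first) for a sequence of
    steps that brings it about; useful precisely when the human does not
    know the steps themselves."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        x, path = frontier.popleft()
        if x == goal:
            return path
        for label, nxt in (("+1", x + 1), ("*2", x * 2)):
            if nxt not in seen and nxt <= goal * 2:
                seen.add(nxt)
                frontier.append((nxt, path + [label]))
    return []


if __name__ == "__main__":
    # Human already knows the steps: just execute them.
    print(does_what_you_want(3, ["+1", "*2", "+1"]))   # 9
    # Human only knows the goal: the system figures out the steps.
    print(wants_what_you_want(3, 9))                   # e.g. ['+1', '*2', '+1']
```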
An example I had in my head was something like “Human wants food, I’ll make a bowl of pasta” vs “I want human to survive and will feed them, whether they want to eat or not, because they want to survive, too”. I am not sure why the latter is needed, if that is what you are saying.