That’s a choice, though. AGI could, for example, look like a powerful actor in its own right, with its own completely nonhuman drives and priorities, and a total disinterest in being directed in the sort of way you’d normally associate with a “resource”.
If by “intent alignment” you mean AGIs or ASIs taking orders from humans, and presumably specifically the humans who “own” them, or are in charge of the “powerful actors”, or form some human social elite, then it seems as though your concerns very much argue that that’s not the right kind of alignment to be going for.
The killer app for ASI is, and always has been, to have it take over the world and stop humans from screwing things up. That’s incompatible with keeping humans in charge, which is what I think you mean by “intent alignment”. But it’s not necessarily incompatible with behavior that’s good for humans. If you’re going to take on the (very possibly insoluble) problem of “aligning” AI with something, maybe you should choose “value alignment” or “friendliness” or whatever. Pick a goal where your success doesn’t directly cause obvious problems.
I agree with you! Intent alignment must be solved.
That’s a choice, though. AGI could, for example, look like a powerful actor in its own right, with its own completely nonhuman drives and priorities, and a total disinterest in being directed in the sort of way you’d normally associate with a “resource”.
My claim is that the incentives AGI creates are quite similar to the resource curse, not that it would literally behave like a resource. But:
If by “intent alignment” you mean AGIs or ASIs taking orders from humans, and presumably specifically the humans who “own” them, or are in charge of the “powerful actors”, or form some human social elite, then it seems as though your concerns very much argue that that’s not the right kind of alignment to be going for.
My default is that powerful actors will do their best to build systems that do what they ask them to do (i.e., they will not pursue aligning systems with human values).
The field points towards this: alignment efforts are primarily focused on controlling systems. I don’t think this is inherently a bad thing, but it results in the incentives I’m concerned about. I’ve not seen great work on defining human values, creating a value set a system could follow, and forcing the system to follow it in a way that couldn’t be overridden by its creators. Anthropic’s Constitutional AI may be a counter-example.
The incentives point towards this as well. A system that is aligned to refuse efforts that could lead to resource/power/capital concentration would be difficult to sell to corporations that are likely to pursue this.
These definitions (here, here, and here) are roughly what I am describing as intent alignment.
Pick a goal where your success doesn’t directly cause obvious problems
I agree, but I’m afraid value alignment doesn’t meet this criterion. (I’m copy-pasting my response on value alignment from elsewhere below.)
I don’t think value alignment of a super-takeover AI would be a good idea, for the following reasons:
1) It seems irreversible. If we align with the wrong values, there seems little anyone can do about it after the fact.
2) The world is chaotic, and externalities are impossible to predict. Who would have guessed that the industrial revolution would lead to climate change? I think it’s very likely that an ASI will produce major, unforeseeable externalities over time. If we have aligned it in an irreversible way, we can’t correct for externalities happening down the road. (Speed also makes it more likely that we can’t correct in time, so I think we should try to go slow.)
3) There is no agreement on which values are ‘correct’. Personally, I’m a moral relativist, meaning I don’t believe in moral facts. Although perhaps niche among rationalists and EAs, I think a fair number of humans share my belief. In my opinion, a value-aligned AI would not make the world objectively better, but merely change it beyond recognition, regardless of the specific values implemented (although it would still be important which values are implemented). It’s very uncertain whether such change would be considered net positive by any surviving humans.
4) If one thinks that consciousness implies moral relevance, that AIs will be conscious, that creating more happy morally relevant beings is morally good (as MacAskill defends), and that AIs are more efficient than humans and other animals, then the consequence seems to be that we (and all other animals) will be replaced by AIs. I consider that an existentially bad outcome in itself, and value alignment could point straight at it.
I think at a minimum, any alignment plan would need to be reversible by humans, and to my understanding value alignment is not. I’m somewhat more hopeful about intent alignment and e.g. a UN commission providing the AI’s input.
The killer app for ASI is, and always has been, to have it take over the world and stop humans from screwing things up
I strongly disagree with this being a good outcome, I guess mostly because I would expect the majority of humans not to want it. If humans actually elected an AI to be in charge, and it could be voted out as well, I could live with that. But a takeover by force from an AI is as bad for me as a takeover by force from a human, and much worse if it’s irreversible. If an AI is really such a good leader, let it show that by being elected (if humans decide that an AI should be allowed to run at all).
But why would the people who are currently in charge of AI labs want to do that, when they could stay in charge and become god-kings instead?
Well, yeah. But there are reasons why they could. Suppose you’re them...
Maybe you see a “FOOM” coming soon. You’re not God-King yet, so you can’t stop it. If you try to slow it down, others, unaligned with you, will just FOOM first. The present state of research gives you two choices for your FOOM: (a) try for friendly AI, or (b) get paperclipped. You assign very low utility to being paperclipped. So you go for friendly AI. Ceteris paribus, your having this choice becomes more likely if research in general is going toward friendliness, and less likely if research in general is going toward intent alignment.
Maybe you’re afraid of what being God-King would turn you into, or you fear making some embarrassingly stupid decision that switches you to the “paperclip” track, or you think having to be God-King would be a drag, or you’re morally opposed, or all of the above. Most people will go wrong eventually if given unlimited power, but that doesn’t mean they can’t stay non-wrong long enough to voluntarily give up that power for whatever reason. I personally would see myself on this track. Unfortunately, I suspect that the barriers to being in charge of a “lab” select against it. And I think it’s also less likely if the prospective “God-King” is actually a group rather than an individual.
Maybe you’re forced, or not “in charge” any more, because there’s a torches-and-pitchforks-wielding mob or an enlightened democratic government or whatever. It could happen.