That’s a choice, though. AGI could, for example, look like a powerful actor in its own right, with its own completely nonhuman drives and priorities, and a total disinterest in being directed in the sort of way you’d normally associate with a “resource”.
> I agree with you! Intent alignment must be solved.
If by “intent alignment” you mean AGIs or ASIs taking orders from humans, and presumably specifically the humans who “own” them, or are in charge of the “powerful actors”, or form some human social elite, then it seems as though your concerns very much argue that that’s not the right kind of alignment to be going for.
The killer app for ASI is, and always has been, to have it take over the world and stop humans from screwing things up. That’s incompatible with keeping humans in charge, which is what I think you mean by “intent alignment”. But it’s not necessarily incompatible with behavior that’s good for humans. If you’re going to take on the (very possibly insoluble) problem of “aligning” AI with something, maybe you should choose “value alignment” or “friendliness” or whatever. Pick a goal where your success doesn’t directly cause obvious problems.
> That’s a choice, though. AGI could, for example, look like a powerful actor in its own right, with its own completely nonhuman drives and priorities, and a total disinterest in being directed in the sort of way you’d normally associate with a “resource”.
My claim is that the incentives AGI creates are quite similar to the resource curse, not that it would literally behave like a resource. But:
> If by “intent alignment” you mean AGIs or ASIs taking orders from humans, and presumably specifically the humans who “own” them, or are in charge of the “powerful actors”, or form some human social elite, then it seems as though your concerns very much argue that that’s not the right kind of alignment to be going for.
My default is that powerful actors will do their best to build systems that do what they ask them to do (i.e., they will not pursue aligning systems with human values).
The field points towards this: alignment efforts are primarily focused on controlling systems. I don’t think this is inherently a bad thing, but it results in the incentives I’m concerned about. I’ve not seen great work on defining human values, creating a value set a system could follow, and ensuring the system follows it in a way that couldn’t be overridden by its creators. Anthropic’s Constitutional AI may be a counterexample.
The incentives point towards this as well. A system that is aligned to refuse efforts that could lead to resource/power/capital concentration would be difficult to sell to corporations that are likely to pursue it.
These definitions (here, here, and here) are roughly what I am describing as intent alignment.
> But why would the people who are currently in charge of AI labs want to do that, when they could stay in charge and become god-kings instead?
Well, yeah. But there are reasons why they could. Suppose you’re them...
Maybe you see a “FOOM” coming soon. You’re not God-King yet, so you can’t stop it. If you try to slow it down, others, unaligned with you, will just FOOM first. The present state of research gives you two choices for your FOOM: (a) try for friendly AI, or (b) get paperclipped. You assign very low utility to being paperclipped. So you go for friendly AI. Ceteris paribus, your having this choice becomes more likely if research in general is going toward friendliness and less likely if research in general is going toward intent alignment.
Maybe you’re afraid of what being God-King would turn you into, or you fear making some embarrassingly stupid decision that switches you to the “paperclip” track, or you think having to be God-King would be a drag, or you’re morally opposed, or all of the above. Most people will go wrong eventually if given unlimited power, but that doesn’t mean they can’t stay non-wrong long enough to voluntarily give up that power for whatever reason. I personally would see myself on this track. Unfortunately, I suspect that the barriers to being in charge of a “lab” select against it, though. And I think it’s also less likely if the prospective “God-King” is actually a group rather than an individual.
Maybe you’re forced, or not “in charge” any more, because there’s a torches-and-pitchforks-wielding mob or an enlightened democratic government or whatever. It could happen.