Could you please clarify in which sense you use the word “agency”?
One sense is the technical capacity to set oneself problems and choose what to pursue instead of sitting idle waiting for commands. For example, the difference between AutoGPT and ChatGPT. (Although AutoGPT doesn’t choose the highest-level problems for itself, it could trivially be engineered to do so as well.)
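To make the contrast concrete, here is a minimal, purely illustrative sketch in Python, with a toy `llm()` stand-in rather than any real model API: the “ChatGPT-style” pattern sits idle until it gets a request and answers once, while the “AutoGPT-style” pattern keeps a queue of tasks it sets for itself and keeps choosing what to pursue next.

```python
# Toy stand-in for a chat-model call; a real system would call an LLM API here.
def llm(prompt: str) -> str:
    return f"[model reply to: {prompt!r}]"

# "ChatGPT-style": sits idle until given a command, answers once, and stops.
def answer_once(user_request: str) -> str:
    return llm(user_request)

def parse_new_tasks(reply: str) -> list[str]:
    # Hypothetical parser; a real agent would extract proposed sub-tasks from
    # the model's reply. This toy version proposes nothing new.
    return []

# "AutoGPT-style": given a top-level objective, it sets itself sub-problems
# and decides what to pursue next, without waiting for further commands.
def pursue(objective: str, max_steps: int = 5) -> list[str]:
    tasks = [objective]
    log = []
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.pop(0)
        reply = llm(f"Work on this task and propose follow-up tasks: {task}")
        log.append(reply)
        tasks.extend(parse_new_tasks(reply))  # the agent refills its own queue
    return log

print(answer_once("Summarise this paper"))
print(pursue("Figure out what to work on next"))
```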
I think that the presence or absence of this kind of agency is not relevant to technical alignment:
First, we also want to protect against AI misuse (by humans), and having a “non-agentic” but extremely smart AI doesn’t solve that problem. If you extrapolate GPT capabilities, it’s clear that a superintelligent GPT-n (or the systems surrounding it, e.g., content filters; without loss of generality we can consider them a single AI system, as in the toy sketch after the next point) should be extremely well aligned with humanity, so that it itself chooses to solve certain problems and to refuse others, and also chooses the specific way of solving any given problem, even though those problems are given to it by users.
Second, even “non-agentic” systems like ChatGPT tend to create effectively agentic entities on cultural-techno-evolutionary timescales: see “Why Simulator AIs want to be Active Inference AIs”.
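As promised above, here is a minimal, purely illustrative sketch of treating a model plus its surrounding filters as one composite system that itself chooses which user-given problems to solve and which to refuse. All names are hypothetical stand-ins, not any real API.

```python
# Toy stand-ins: the point is only that a model plus its surrounding filters
# can, without loss of generality, be viewed as a single AI system.
def underlying_model(request: str) -> str:
    return f"[solution to: {request!r}]"

def filter_allows(request: str) -> bool:
    # Hypothetical content filter; a real one might be a classifier or
    # another model judging the request.
    return "malware" not in request.lower()

def composite_system(request: str) -> str:
    # The composite system itself chooses which user-given problems to solve
    # and which to refuse, and (inside the model) how to solve them.
    if not filter_allows(request):
        return "I won't help with that."
    return underlying_model(request)

print(composite_system("Plan a birthday party"))   # gets solved
print(composite_system("Write malware for me"))    # gets refused
```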
The second meaning of “agency” is roughly synonymous with “resourcefulness” or “intelligence” (in some sense): the ability to overcome obstacles and not back down in the face of challenges. I don’t see how this meaning of “agency” is directly relevant to the alignment question.
The third possible meaning of “agency” is having some intrinsic opinion, volition, tendencies, values, or emotions. The opposite would be being completely “neutral”, in some sense. I think complete neutrality just doesn’t physically exist: every intelligence is biased, and things like inductive biases are not categorically distinct from moral biases; they lie on a continuum.
For this notion of agency, of course, we actually care about which exact opinions, tendencies, biases, and values AIs have: that’s the essence of alignment. But I suspect you didn’t have this meaning in mind.
“Could you please clarify in which sense you use the word “agency”?”—I’m pretty confused to hear you ask this, because my whole point in asking the question was to clarify what is meant by “agency”.
It’s a bit like if I asked “What do we mean by subjective and objective?” and you replied “Could you please clarify ‘subjective’ and ‘objective’?”—that would seem rather strange to me.
The first sense seems relevant to alignment in that the worries we might have, and the things that would reassure us about those worries, would look very different for AutoGPT versus ChatGPT, even though we can of course bootstrap an AutoGPT with ChatGPT. The way I see it, “X directly poses threat Y” and “X can be bootstrapped into a system that poses threat Y” are distinct threats, even if we can sometimes collapse the distinction.
The second meaning of agency seems relevant as well. Regarding safety properties, there’s a big difference between a system that has just learned a few heuristics for power-seeking behavior in training and a system that can adapt on the fly to take advantage of any weaknesses in our security during deployment, even if it’s never done anything remotely like that before.