“Could you please clarify in which sense you use the word “agency”?”—I’m pretty confused by this question, because my whole point in asking was to clarify what is meant by “agency”.
It’s a bit like if I asked “What do we mean by subjective and objective?” and you replied “Could you please clarify ‘subjective’ and ‘objective’?” That would seem rather strange to me.
The first sense seems relevant to alignment in that the worries we might have, and the things that would reassure us about those worries, look very different for AutoGPT versus ChatGPT, even though we can of course bootstrap an AutoGPT with ChatGPT. The way I see it, “X directly poses threat Y” and “X can be bootstrapped into a system that poses threat Y” are distinct threats, even if we can sometimes collapse this distinction.
The second meaning of agency seems relevant as well. Regarding safety properties, there’s a big difference between a system that has just learned a few heuristics for power-seeking behavior in training and a system that can adapt on the fly to take advantage of any weaknesses in our security during deployment, even if it’s never done anything remotely like that before.