First, I think that thinking about and highlighting these kinds of dynamics is important. I expect that, by default, too few people will focus on analyzing such dynamics from a truth-seeking and/or instrumentally-useful-for-safety perspective.
That said:
It seems to me you’re painting with too broad a brush throughout.
At the least, I think you should give some examples that lie just outside the boundary of what you’d want to call [structural power-seeking].
Structural power-seeking in some sense seems unavoidable. (AI is increasingly powerful; influencing it implies power.)
It’s not clear to me that you’re sticking to a consistent sense throughout.
E.g. “That makes AI safety strategies which require power-seeking more difficult to carry out successfully.” seems false in general, unless you mean something fairly narrow by power-seeking.
An important aspect is the (perceived) versatility of power:
To the extent that it’s [general power that could be efficiently applied to any goal], it’s suspicious.
To the extent that it’s [specialized power that’s only helpful in pursuing a narrow range of goals] it’s less suspicious.
Similarly, it’s important under what circumstances the power would become general: if I take actions that can only give me power by routing through [developing a principled alignment solution], that would make a stated goal of [develop a principled alignment solution] believable; it doesn’t necessarily make some other goal believable, e.g. [...and we’ll use it to create this kind of utopia].
Increasing legitimacy is power-seeking—unless it’s done in such a way that it implies constraints.
That said, you may be right that it’s somewhat less likely to be perceived as such.
Aiming for [people will tend to believe whatever I say about x] is textbook power-seeking wherever [influence on x] implies power.
We’d want something more like [people will tend to believe things that I say about x, so long as their generating process was subject to [constraints]].
Here it’s preferable for [constraints] to be highly limiting and clear (all else equal).
I’d say that “prioritizing competence” begs the question.
What is the required sense of “competence”?
For the most important AI-based decision-making, I doubt that “...broadly competent, and capable of responding sensibly...” is a high enough bar.
In particular, “...because they don’t yet take AGI very seriously” is not the only reason people are making predictable mistakes.
“...as AGI capabilities and risks become less speculative...”
Again, this seems too coarse-grained:
Some risks becoming (much) clearer does not entail all risks becoming (much) clearer.
Understanding some risks well while remaining blind to others does not clearly imply safer decision-making, since “responding sensibly” will tend to be judged based on [risks we’ve noticed].