First, I think that thinking about and highlighting these kinds of dynamics is important. I expect that, by default, too few people will focus on analyzing such dynamics from a truth-seeking and/or instrumentally-useful-for-safety perspective.
That said:
It seems to me you’re painting with too broad a brush throughout.
At the least, I think you should give some examples that lie just outside the boundary of what you’d want to call [structural power-seeking].
Structural power-seeking in some sense seems unavoidable. (AI is increasingly powerful; influencing it implies power.)
It’s not clear to me that you’re sticking to a consistent sense throughout.
E.g. “That makes AI safety strategies which require power-seeking more difficult to carry out successfully.” seems false in general, unless you mean something fairly narrow by power-seeking.
An important aspect is the (perceived) versatility of power:
To the extent that it’s [general power that could be efficiently applied to any goal], it’s suspicious.
To the extent that it’s [specialized power that’s only helpful in pursuing a narrow range of goals] it’s less suspicious.
Similarly, it’s important under what circumstances the power would become general: if I take actions that can only give me power by routing through [developing a principled alignment solution], that would make a stated goal of [develop a principled alignment solution] believable; it doesn’t necessarily make some other goal believable, e.g. [...and we’ll use it to create this kind of utopia].
Increasing legitimacy is power-seeking—unless it’s done in such a way that it implies constraints.
That said, you may be right that it’s somewhat less likely to be perceived as such.
Aiming for [people will tend to believe whatever I say about x] is textbook power-seeking wherever [influence on x] implies power.
We’d want something more like [people will tend to believe things that I say about x, so long as their generating process was subject to [constraints]].
Here it’s preferable for [constraints] to be highly limiting and clear (all else equal).
I’d say that “prioritizing competence” begs the question.
What is the required sense of “competence”?
For the most important AI-based decision-making, I doubt that “...broadly competent, and capable of responding sensibly...” is a high enough bar.
In particular, “...because they don’t yet take AGI very seriously” is not the only reason people are making predictable mistakes.
“...as AGI capabilities and risks become less speculative...”
Again, this seems too coarse-grained:
Some risks becoming (much) clearer does not entail all risks becoming (much) clearer.
Understanding some risks well while remaining blind to others does not clearly imply safer decision-making, since “responding sensibly” will tend to be judged based on [risks we’ve noticed].