AI risk decomposition based on agency or powerseeking or adversarial optimization or something
Epistemic status: confused.
Some vague, closely related ways to decompose AI risk into two kinds of risk:
Risk due to AI agency vs risk unrelated to agency
Risk due to AI goal-directedness vs risk unrelated to goal-directedness
Risk due to AI planning vs risk unrelated to planning
Risk due to AI consequentialism vs risk unrelated to consequentialism
Risk due to AI utility-maximization vs risk unrelated to utility-maximization
Risk due to AI powerseeking vs risk unrelated to powerseeking
Risk due to AI optimizing against you vs risk unrelated to adversarial optimization
The central reason to worry about powerseeking/whatever AI, I think, is that goal-directed systems that are sufficiently capable (relative to you) instrumentally converge on disempowering you.
The central reason to worry about non-powerseeking/whatever AI, I think, is failure to generalize correctly from training: distribution shift, Goodhart's law, "you get what you measure."