I totally agree that the choice of “power seeking” is very unfortunate, for the same reasons you describe. I don’t think “optionality” is quite it, though. I think “consequentialist” or “goal seeking” might be better (or we could just stick with “instrumental convergence”, which at least has neutral affect).
As for underappreciatedness, I think this is possibly true, though anecdotally, at least for me, I already strongly believed this, and in fact a large part of my generator for why I think alignment is difficult is based on it.
I think I disagree about leveraging this for alignment, but I’ll read your proposal in more detail before commenting on that further.