But this is just as true of fully aligned agents! In fact, the optimal plans of aligned and unaligned agents will probably converge for a while: they will take the same or similar initial steps, which is a straightforward consequence of instrumental convergence toward empowerment.
This is a minor fallacy: if you're aligned, power-seeking can be suboptimal when it causes friction or conflict. Deception obviously bites here, making that difference smaller.