But this is just as true of fully aligned agents! In fact, the optimal plans of aligned and unaligned agents will probably converge for a while: they will take the same or similar initial steps, which is a straightforward consequence of instrumental convergence toward empowerment.
This is a minor fallacy: if you're aligned, power-seeking can be suboptimal when it causes friction or conflict. Deception obviously bites here, making that difference smaller.