This doesn’t work as a from-scratch explanation because:
You don’t explain what alignment is or why it is desirable.
You don’t explain what agentising is, or why it is dangerous.
You don’t explain why “training successor versions of itself” is dangerous.
I agree that there are many situations where this cannot be used. But there does at least appear to be a gap that arguments like this can fill, one missed by the existing explanations.