I disagree with this. I think the most useful definition of alignment is intent alignment. Humans are effectively intent-aligned with respect to the goal of not killing all of humanity. They may still kill all of humanity, but that is not an alignment problem; it is a capabilities problem: humans aren't capable of knowing which AI designs will be safe.
The same holds for intent-aligned AI systems that create unaligned successors.