It’s a bit tongue-in-cheek, but technically for an AI to be aligned, it isn’t allowed to create unaligned AIs. Like if your seed AI creates a paperclip maximizer, that’s bad.
So if humanity accidentally creates a paperclip maximizer, it is technically unaligned under this definition.
I disagree with this. I think the most useful definition of alignment is intent alignment. Humans are effectively intent-aligned on the goal of not killing all of humanity. They may still kill all of humanity, but that is not an alignment problem; it is a capabilities problem: humans aren't capable of knowing which AI designs will be safe.
The same holds for intent-aligned AI systems that create unaligned successors.
Oooh gotcha. In that case, we are not remotely good at avoiding the creation of unaligned humans either! ;)
Because we aren’t aligned.