I no longer think there’s anything we could learn about how to align an actually superhuman agi that we can’t learn from a weaker one.
(edit 2mo later: to rephrase: there exist model organisms, weaker than superhuman, of every possible misalignment issue. this is not to say human-like agi can teach us everything we need to know.)