Thanks for the response. To be clear, when discussing mimics, I did not have in mind perfect uploads of people; they could indeed be rather limited imitations. For example, an AI designing improvements to itself doesn't need a generally faithful imitation of human behavior. It could just know a few things, like, "make this algorithm score better on this thing without taking over the world".
Still, I can see how, when it comes to especially limited imitations, iterated amplification could be valuable. This seems especially true if the imitations are unreliable even in narrow situations. It would be problematic if an AI tasked with designing powerful AI didn't get the "act corrigibly, and don't take over the world" part reliably right.