My biggest concern with intent alignment of AGI is that it could be used to impose something like totalitarian control over everyone who doesn’t control the AGI. It becomes a source of nearly unlimited power. The first company to create intent-aligned AGI (probably ASI by that point) can use it to stop all other attempts at building AGI. At that point, we’d have a handful of people wielding incredible power, and it seems unlikely they’d just decide to give it up. I think your “big if” is a really, really big if.
But other than that, your plan definitely seems workable. It avoids the problem of value drift, but unfortunately it incurs the cost of dealing with power-hungry humans.
I can’t think of any way to make a group of people that controls AGI release it from their control. But if I’m missing something, or if there is some plan for this scenario, I’d be really happy to learn about it.