I’ve been thinking about similar international solutions, so I look forward to seeing your thoughts on the matter.
My major concern is sociopathic people gaining the reins of power over just one of those AGIs and defecting against that council of guardians. I think sociopaths are greatly overrepresented among powerful people; they care less about the downsides of having and pursuing power aggressively.
That’s why I’d think even 20 RSI-capable human-directed AGIs wouldn’t be stable for more than a decade.
Yeah, I see it as sort of a temporary transitional mode for humanity. I also don’t think it would be stable for long. I might give it 20-30 years, but I would be skeptical about it holding for 50 years.
I do think that even 10 years more to work on more fundamental solutions to the AGI transition would be hugely valuable though!
I have at least been attempting to imagine how to design a system that assumes all the actors will be selfish and tempted to defect (and possibly sociopathic, as power-holders sometimes are or become), yet are prevented from breaking the system: defection-resistant mechanisms, where you only need a majority of the council to not defect in a given 'event' in order for them to halt and punish the defectors. Hopefully, making it obvious that this is the case, and that defection will be noticed and punished, would deter even sociopathic power-holders from defecting.
This seems possible to accomplish, if the system is designed such that catching and punishing an attempted defection gives the enforcers benefits with higher expected value, in their own minds, than the option of also defecting once they detect someone else defecting.
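To make that expected-value condition concrete, here is a minimal toy sketch; the payoffs and probabilities are purely illustrative assumptions on my part, not numbers from the discussion, and the point is only the shape of the comparison each enforcer faces once a defection is detected:

```python
# Toy model of an enforcer's choice after detecting a defection.
# All probabilities and payoffs below are illustrative assumptions.

def expected_value(outcomes):
    """Expected value of a list of (probability, payoff) pairs."""
    return sum(p * v for p, v in outcomes)

# Option 1: help halt and punish the defector.
# Assume enforcement usually succeeds when a majority cooperates,
# and that enforcers gain something (status, resources, safety) from it.
ev_enforce = expected_value([
    (0.9, +10),   # enforcement succeeds, enforcer is rewarded
    (0.1, -5),    # enforcement fails, some retaliation risk
])

# Option 2: also defect and race for unilateral advantage.
# Assume a lone defector is usually caught and punished by the majority.
ev_defect = expected_value([
    (0.2, +100),  # defection succeeds, large unilateral gain
    (0.8, -50),   # defection is caught and severely punished
])

# The mechanism is defection-resistant (in this toy model) only if
# enforcing looks better than defecting to each individual power-holder.
print(f"EV(enforce) = {ev_enforce:.1f}, EV(defect) = {ev_defect:.1f}")
print("stable" if ev_enforce > ev_defect else "unstable")
```

The design goal, on this framing, is to keep detection probability and punishment severity high enough that the second expected value stays below the first for every council member.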
Seems like a good problem to largely defer to AI, though (especially if we're assuming alignment in the instruction-following sense), so maybe not the most pressing.
Unless there are important factors about 'order of operations'. By the time we have a powerful enough AI to solve this for us, it could be that someone is already defecting by using that AI to pursue recursive self-improvement at top speed…
I think that is probably the case. We need to get the Council of Guardians in place and preventing defection before it's too late and irreversibly bad defection has already occurred.
I am unsure of exactly where the thresholds are, but I am confident that nobody else should be confident that there aren’t any risks! Our uncertainty should cause us to err on the side of putting in safe governance mechanisms ASAP!