Seems like a good problem to largely defer to AI, though (especially if we're assuming alignment in the instruction-following sense), so maybe not the most pressing.
Unless there are important 'order of operations' factors. By the time we have an AI powerful enough to solve this for us, someone may already be defecting by using that AI to pursue recursive self-improvement at top speed…
I think that is probably the case. We need to get the Council of Guardians in place and actively preventing defection before it's too late and irreversibly bad defection has already occurred.
I am unsure exactly where the thresholds are, but I am confident that nobody should be confident there are no risks! Our uncertainty should push us to err on the side of putting safe governance mechanisms in place ASAP!