The first concern is absolutely critical. One way to break the circularity is to rely on AI control; another is to set up incentives that make alignment an equilibrium and dishonesty/misalignment unfavorable, in the sense that there is no continuously rewarding path to misalignment.
The second issue is less critical, assuming that AGI #21 hasn’t itself become deceptively aligned, because at that point, we can throw away #22 and restart from a fresh training run.
If that’s no longer an option, we can go to war against the misaligned AGI with our own AGI forces.
Even then, you can still do a great deal of automated research once labor bottlenecks are broken, and while this slows things down, it isn't fatal, so we can work around it.
On the third issue: if we have achieved aligned ASI, then we have at that point achieved our goal. Once humans are obsolete in making alignment advances, that's when we can say the end goal has been reached.