a multipolar scenario can easily pressure an AGI to act before other AGIs with different values engage in similar attempts to conquer the world.
I think this is an important point; it could lead even AGIs which are not overconfident to attempt a takeover, as you note.
On the other hand, it’s possible that such AGIs would acausally collaborate on decision-theoretic grounds. That is, the collaborative act would be for all of them to refrain from attempting a takeover (unless humans were close to solving alignment). Then the future AGI which does take over, with a correct near-certain belief in its ability to do so, acausally cooperates with the past AGIs by fulfilling their values post-takeover, too.
On reflection, this makes me think the probability of an AGI attempting a takeover with a low probability of success is roughly p(AGIs would not use the decision theory underlying the above paragraph) × p(we reach a situation where an AGI which could attempt a low-confidence takeover believes that future AGIs will have substantially different values); with the caveat that even if that doesn’t happen, there’s still the later possibility of AGIs cooperatively taking over as humans near a solution to alignment.
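To make the decomposition concrete, here is a minimal numeric sketch. The two probabilities below are purely illustrative placeholders, not estimates from the discussion:

```python
# Hypothetical inputs (placeholders, not estimates):
# p_no_acausal: probability AGIs would NOT use the acausal decision theory above.
# p_value_drift: probability an AGI capable of a low-confidence takeover believes
#                future AGIs will have substantially different values.
p_no_acausal = 0.5
p_value_drift = 0.4

# Probability of an AGI attempting a takeover despite low odds of success,
# treating the two conditions as independent (an implicit assumption of the product).
p_low_conf_attempt = p_no_acausal * p_value_drift
print(p_low_conf_attempt)  # 0.2
```

Note that the product form assumes the two events are independent; any correlation between them would change the result.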
I expect such acausal collaboration to be harder to develop than good calibration, and therefore less likely to happen at the stage I have in mind.
I think it would be good if you’re right. I’m curious why you believe this. (Feel free to link other posts/comments discussing this, if there are any)