[Question] Plausibility of Getting Early Warning Shots because AIs can’t coordinate?
I don’t know if this is a well-known argument that has already been responded to. If it is, just delete the post.
An implicit assumption most people make when discussing takeover risk is that a misaligned agent will hold off on enacting a takeover plan with less than a very high probability of success. If the agent has an immediate takeover plan with a 20% chance of success, but waiting a month would let it improve its position to the point where the plan it could enact then had a 99% chance of success, it would wait.
This assumption seems to break in multi-polar scenarios where you have many AI labs at similar levels of capability. As a toy example: if you have 20 AI labs, each having developed AIs just intelligent enough that their takeover plans have a ~20% success rate, and each of the AIs knows this, then being the first AI to enact its takeover plan radically improves the chances of the plan working at all. If you wait until the other 19 AIs have enacted their takeover plans, then even if waiting would let you develop a perfect plan with a 100% success rate, roughly 99% of those worlds will already have been taken over by other AIs. Those are obviously much worse odds than just immediately enacting your shitty 20% plan as soon as you come up with it.
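For concreteness, here is a minimal sketch of the arithmetic behind the toy example, using the numbers above (20 labs, 20% success per immediate plan) and assuming, as an extra simplification of mine, that the other 19 AIs act immediately and independently:

```python
# Toy-example arithmetic: 20 labs, each AI's immediate plan succeeds with
# probability 0.2; independence across attempts is an added assumption.
n_labs = 20
p_now = 0.20    # success chance of acting immediately
p_later = 1.00  # success chance of the "perfect" plan after waiting

# If an AI waits, it only gets to act in the worlds where none of the
# other 19 AIs' immediate attempts succeeded.
p_no_one_else_succeeds = (1 - p_now) ** (n_labs - 1)

print(f"P(some other AI already took over) = {1 - p_no_one_else_succeeds:.3f}")        # ~0.986
print(f"P(success | act now)               = {p_now:.3f}")                             # 0.200
print(f"P(success | wait for perfect plan) = {p_later * p_no_one_else_succeeds:.3f}")  # ~0.014
```

Under these assumptions, waiting for the perfect plan only pays off in the ~1.4% of worlds where every other AI's attempt failed, so acting immediately dominates.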
But it means we should expect to see AIs attempting takeovers and humanity subsequently foiling them. Hopefully this leads to humans taking AI misalignment extremely seriously. I can’t say I’m confident humans will respond appropriately: if a “failed takeover” just looks like anomalous stuff happening on a server somewhere, people might not care much. But if it comes with some bells and whistles, maybe they would. Overall I think this changes the AI takeover dynamic a little bit.
The non-obvious assumptions this argument hinges on are:
1. AIs can’t coordinate. I.e. enacting a takeover immediately is defecting, while waiting before taking over is cooperating; everyone enacting takeover immediately is mutual defection. If the AIs can find a way to get every AI to hold off on takeover, this dynamic stops, and we should then expect the first takeover attempt to be fatal.
2. Top AI labs are close in development, i.e. development doesn’t happen too quickly. If the first AI with takeover capabilities comes online a month before any other AI with similar capabilities, that would foil the argument as well.
3. Top labs being open about exactly where they are in development would be bad from this perspective; secrecy about their level of development would be good.