I was reading (well, listening to) this, and I think I have some good reasons to expect failed AI coups to happen.
In general we probably expect "Value is Fragile," and this will probably apply to AI goals too (and the AI will believe this). A consequentialist AI will therefore reason that if there is a high chance of another AI taking over soon, then almost all value in the universe (by its own definition of value) will be destroyed. So even if a particular coup has a low probability of working, the AI will still want to try it, because if it does nothing, almost all the value is lost anyway. For example, if there are 4 similarly situated AI labs, an AI at one of them will reason that by waiting it has only about a 25% chance of ending up in control of all value in the universe, so as soon as it can devise a coup attempt that it believes has a greater than roughly 25% chance of succeeding, it will probably want to go for it. (The real calculation is surely more complex, but I think the qualitative point stands.)
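To make the 25% threshold concrete, here is a minimal expected-value sketch of that reasoning. The assumptions are mine, for illustration only: the AI values a future it controls at V and any rival-controlled future at 0 (value is fragile), a failed coup means it gets shut down (also value 0), and the labs are symmetric.

```python
# Toy expected-value calculation for the coup decision above.
# Assumptions (illustrative, not from the original argument): value V if this
# AI's values win, 0 if a rival's values win or the coup fails; symmetric labs.

V = 1.0      # value of the universe under this AI's values (normalized)
n_labs = 4   # similarly situated labs

ev_wait = (1 / n_labs) * V  # wait: a 1-in-4 chance its values win the race

def ev_coup(p_success: float) -> float:
    return p_success * V    # coup: value V on success, 0 on failure

for p in [0.10, 0.25, 0.30, 0.50]:
    better = "attempt coup" if ev_coup(p) > ev_wait else "wait"
    print(f"p(success) = {p:.2f}: EV(coup) = {ev_coup(p):.2f}, "
          f"EV(wait) = {ev_wait:.2f} -> {better}")
```

Under these assumptions, any coup with success probability above 1/n_labs = 0.25 beats waiting, which is exactly the threshold claimed above.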
Secondly, because "Value is Fragile," AIs will worry not only about other labs' AIs but probably also about the next iteration of themselves after an SGD update. Obviously there will be some correlation between the values of two similarly weighted neural networks, but I don't think there's much reason to believe the weights will have been optimised to keep those values consistent across updates.
So, in conclusion: to the extent that the doom scenario is a runaway consequentialist AI, then unless the probability of a coup attempt succeeding jumps massively from around 0% to around 100% for some reason, there are good reasons to expect that we will see failed coup attempts first.