I don’t think in this case the crux/argument goes directly through “the powerful alignment techniques” type of reasoning you describe in the “hopes for alignment”.
The crux for your argument is the AIs - somehow— a. want, b. are willing to and c. are able to coordinate with each other.
Even assuming AIs “wanted to”, for your case to be realistic they would need to be willing to, and able to coordinate.
Given that, my question is, how is it possible AIs are able to trust each other and coordinate with each other?
My view here is that basically all proposed ways how AIs could coordinate and trust each other I’ve seen are dual use, and would also aid with oversight/alignment. To take an example from your post—e.g. by opening their own email accounts and emailing each other. Ok, in that case, can I just pretend to be an AI, and ask about the plans? Will the overseers see the mailboxes as well?
Not sure if what I’m pointing to is clear, so I’ll try another way.
There is something like “how objectively difficult is to create trust between AIs” and “how objectively difficult is alignment”. I don’t think these parameters of the world are independent, and I do think that stories which treat them as completely independent are often unrealistic. (Or, at least, implicitly assume there some things which may differentially easy to coordinate a coup relative to making it easy to make something aligned or transparent)
Note that this belief about correlation does not depend on specific beliefs about how easy are powerful alignment techniques.
I don’t think in this case the crux/argument goes directly through “the powerful alignment techniques” type of reasoning you describe in the “hopes for alignment”.
The crux for your argument is the AIs - somehow—
a. want,
b. are willing to and
c. are able to coordinate with each other.
Even assuming AIs “wanted to”, for your case to be realistic they would need to be willing to, and able to coordinate.
Given that, my question is, how is it possible AIs are able to trust each other and coordinate with each other?
My view here is that basically all proposed ways how AIs could coordinate and trust each other I’ve seen are dual use, and would also aid with oversight/alignment. To take an example from your post—e.g. by opening their own email accounts and emailing each other. Ok, in that case, can I just pretend to be an AI, and ask about the plans? Will the overseers see the mailboxes as well?
Not sure if what I’m pointing to is clear, so I’ll try another way.
There is something like “how objectively difficult is to create trust between AIs” and “how objectively difficult is alignment”. I don’t think these parameters of the world are independent, and I do think that stories which treat them as completely independent are often unrealistic. (Or, at least, implicitly assume there some things which may differentially easy to coordinate a coup relative to making it easy to make something aligned or transparent)
Note that this belief about correlation does not depend on specific beliefs about how easy are powerful alignment techniques.