(Chiming in late, sorry!) It sounds like you are basically hypothesizing here that there will be powerful alignment techniques such that a given AI ends up acting as intended by e.g. some corporation. Specifically, your comment seems to allude to two of the high-level techniques mentioned in https://www.cold-takes.com/high-level-hopes-for-ai-alignment/ (digital neuroscience and checks/balances). I just wanted to note that this hypothesis (a) is not a point of consensus and I don’t think we should take it as a given; (b) is outside the scope of this post, which is trying to take things one step at a time and simply say that AIs could defeat humanity if they were aimed toward that goal.
I don’t think the crux/argument in this case goes directly through the “powerful alignment techniques” type of reasoning you describe in the “hopes for alignment” post.
The crux for your argument is that the AIs somehow (a) want to, (b) are willing to, and (c) are able to coordinate with each other.
Even assuming the AIs “wanted to”, for your case to be realistic they would also need to be willing to, and able to, coordinate.
Given that, my question is: how would the AIs be able to trust each other and coordinate with each other?
My view here is that basically all the proposed ways I’ve seen for AIs to coordinate and trust each other are dual use, and would also aid oversight/alignment. To take an example from your post: AIs opening their own email accounts and emailing each other. Ok, in that case, can I just pretend to be an AI and ask about the plans? Will the overseers see the mailboxes as well?
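To make the dual-use point concrete, here is a toy sketch (entirely my own illustration; the account names, class, and "plan" are made up, not anything from your post): whatever shared channel the AIs would use to coordinate, whoever operates that channel can both read it and write to it.

```python
# Toy illustration of the dual-use point: a shared channel that lets AIs
# coordinate is, by the same token, a channel overseers can inspect and use.

class MailServer:
    """A hypothetical shared channel the AIs would have to rely on."""

    def __init__(self):
        self.mailboxes = {}  # account name -> list of (sender, message)

    def send(self, sender, recipient, message):
        self.mailboxes.setdefault(recipient, []).append((sender, message))

    def read(self, account):
        return list(self.mailboxes.get(account, []))


server = MailServer()

# Two AIs try to coordinate through their "own" accounts.
server.send("ai_alpha", "ai_beta", "Proposed plan: act at step 7.")

# But the overseer runs the same infrastructure, so they can:
# 1) read every mailbox,
for account, inbox in server.mailboxes.items():
    print(account, inbox)

# 2) impersonate an AI and ask about the plans, which is exactly the
#    "can I just pretend to be an AI?" question above.
server.send("ai_alpha", "ai_beta", "Remind me, what was the plan?")  # sent by the overseer
```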
Not sure if what I’m pointing to is clear, so I’ll try another way.
There is something like “how objectively difficult it is to create trust between AIs” and “how objectively difficult alignment is”. I don’t think these parameters of the world are independent, and I do think that stories which treat them as completely independent are often unrealistic. (Or, at least, they implicitly assume there are some things which make it differentially easier to coordinate a coup than to make something aligned or transparent.)
Note that this belief about correlation does not depend on specific beliefs about how easy powerful alignment techniques are.