Extending this: trust problems could impede the flow of information in the first place, so that introspective access stops being an amplifier across a system boundary. An AI can expose some of its code, but an AI that trusts other AIs to expose their code honestly, rather than choosing what code to show based on what will make the conversation partner do what they want, seems exploitable; and an AI that always exposes its own code honestly may also be exploitable.
Human societies do “creating enclaves of higher trust within a surrounding environment of lower trust” a lot, and it does improve coordination when it works right. I don’t know which way this would swing for super-coordination among AIs.