I think the missing piece is some notion of “who is on what team”, which I think of as “agentic segmentation”. That is, can we draw borders in meatspace showing who is being influenced by the rhetoric of which other agents? If a strong, misaligned superintelligence came to exist, my threat model is that it would do all sorts of dishonorable things by proxy, using its rhetorical influence, and we would either see that something weird was going on with the agentic segmentation or we wouldn’t. If we did see it, we could probably trace the agent segment back to some primary mover, or locate the computational device powering the rapidly expanding agent segment.