This creates a recursive loop: each of them experiences what it is like to be the other experiencing what it is like to be them, and so on to whatever depth either of them desires.
Why should this be the case? When I encounter a potentially hostile piece of programming, I don’t run it on my main computer. I run it in a carefully isolated sandbox until I’ve extracted whatever data or value I need from that program. Then I shut down the sandbox. If the AI is superintelligent enough to scan human minds as it’s taking humans apart (and why should it do that?), what prevents it from creating a similar isolated environment to keep any errant human consciousnesses away from its vital paper-clip-optimizing computational resources?
It’s not that odd. Ars Technica has a good article on why generative AIs have such a strong tendency to confabulate. The short answer is that, given a prompt (consisting of tokens, which are similar to, but not quite the same as, words), GPT comes up with new tokens that are more or less likely to follow the tokens already in the prompt. This is subject to a temperature parameter, which dictates how “creative” GPT is allowed to be (i.e., how often it is allowed to pick less probable next tokens). The output token is appended to the prompt, and the whole thing is then fed back into GPT to generate the next token.
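A minimal sketch of that loop in Python (the scoring function and token strings here are toy stand-ins I made up for illustration; only the temperature-scaled sampling and the append-and-feed-back structure reflect how generation actually works):

```python
import math
import random

# Toy stand-in for the model: given the context so far, return a score for
# each candidate next token. A real GPT computes these scores with a
# transformer over its whole vocabulary; this fixed table is illustrative.
def toy_scores(context):
    return {"Paris": 3.0, "London": 1.5, "Madrid": 0.5}

def sample_next_token(context, temperature=1.0):
    # Temperature rescales the scores before they become probabilities:
    # low temperature sharpens the distribution (nearly always the top token),
    # high temperature flattens it, letting less probable tokens through more
    # often -- the "creativity" knob described above.
    scaled = {tok: s / temperature for tok, s in toy_scores(context).items()}
    total = sum(math.exp(s) for s in scaled.values())
    probs = {tok: math.exp(s) / total for tok, s in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

def generate(prompt_tokens, n_new, temperature=1.0):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        # Each new token is appended to the context, and the whole thing is
        # fed back in to choose the token after that.
        tokens.append(sample_next_token(tokens, temperature))
    return tokens

print(generate(["The", "capital", "of", "France", "is"], n_new=1, temperature=0.7))
```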
In other words, GPT is incapable of “going backwards” and editing its previous output to correct inaccuracies or inconsistencies, as a human might. Instead, it has to take the previous output as a given and come up with new tokens that are likely given the already-generated incorrect tokens. This is how GPT ends up with confabulated citations. Given the prompt, GPT generates some tokens representing, for example, an author. It then tries to generate the words most likely to be associated with that author and with the rest of the prompt, which is presumably asking for citations. As it generates a title, it chooses a word that doesn’t appear in any existing article title written by that author. But it doesn’t “know” that, and it has no way of going back and editing prior output to correct itself. Instead GPT presses on, generating more tokens that are deemed likely given the mixture of correct and incorrect tokens it has already produced.
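Continuing the same toy framing (the decoding loop and the hand-written next-token table below are assumptions for illustration, not GPT’s actual machinery), the point is that the loop’s only operation is an append. Nothing in it ever revisits an earlier position, so an invented word in a citation stays in the context and every later token is chosen to fit it:

```python
# Append-only decoding: the loop never edits or deletes what it has already
# emitted, it only conditions on it.
def decode_loop(next_token, prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(next_token(tokens))   # append -- the only state change
    return tokens

# Hypothetical "most likely continuation" table keyed on the last token,
# standing in for the model's learned next-token probabilities.
def citation_step(tokens):
    nxt = {'"Quantum': "Dynamics", "Dynamics": "of",
           "of": "Imaginary", "Imaginary": 'Systems."'}
    return nxt.get(tokens[-1], "...")

# '"Quantum' is the mis-step: a title word the author never used. Once it is
# in the context, the loop can only complete a plausible-sounding title
# around it -- a fluent citation for a paper that does not exist.
print(decode_loop(citation_step, ["Smith", "(2019),", '"Quantum'], n_new=4))
# ['Smith', '(2019),', '"Quantum', 'Dynamics', 'of', 'Imaginary', 'Systems."']
```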
Scott Alexander has a great post about human psychology that touches on a similar theme, called The Apologist and the Revolutionary. In the terms of that post, GPT is 100% apologist, 0% revolutionary. No matter how nonsensical its previous output, GPT, by its very design, must take that output as axiomatic and generate new output on top of it. That is what leads to such uncanny results when GPT is asked for specific facts.