In the simplest case, acausal control is running a trojan on your own computer. This generalizes to inventing your own trojan based on clues and guesses, without it passing through communication channels in a recognizable form, and releasing it inside your home without realizing it’s bad for you. A central example of acausal control is where malicious computation is a model of environment that an agent forms as a byproduct of working on understanding the environment. If the model is insufficiently secured, and is allowed to do nontrivial computation of its own, it could escape or hijack agent’s mind from the inside (possibly as a mesa-optimizer).
The problem is that with a superintelligent environment that doesn’t already filter your input for you in a way that makes it safe, the only responsible thing might be to go completely blind for a very long time. Humans don’t understand their own minds well enough to rule out vulnerabilities of this kind.
In the simplest case, acausal control is running a trojan on your own computer. This generalizes to inventing your own trojan based on clues and guesses, without it passing through communication channels in a recognizable form, and releasing it inside your home without realizing it’s bad for you. A central example of acausal control is where malicious computation is a model of environment that an agent forms as a byproduct of working on understanding the environment. If the model is insufficiently secured, and is allowed to do nontrivial computation of its own, it could escape or hijack agent’s mind from the inside (possibly as a mesa-optimizer).
Ah, I see what you mean now. Thanks
And I very much agree with-
I would also clarify that it’s more than “fault”: they are the only reliable source of responsibility for solving and preventing such an attack
The problem is that with a superintelligent environment that doesn’t already filter your input for you in a way that makes it safe, the only responsible thing might be to go completely blind for a very long time. Humans don’t understand their own minds well enough to rule out vulnerabilities of this kind.