Assurance that you can only get hijacked through the attack surface rather than an unaccounted-for sidechannel doesn’t help very much.
I agree. I hope that my membranes proposal (post coming eventually) addresses this.
(BTW Mark Miller has a bunch of work in this vein along the lines of making secure computer systems)
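For a rough flavor of that line of work, here is a minimal sketch in the object-capability style Miller's work is associated with. The names are hypothetical and Python itself doesn't enforce this discipline; it only illustrates the pattern of handing a component exactly the narrow capabilities it needs, so its attack surface is precisely what it was given:

```python
# Toy object-capability-style sketch (hypothetical names, not code from Miller's systems).
# The untrusted component receives only the capabilities it is handed; it has no
# ambient authority, so its attack surface is exactly that argument list.

class AppendOnlyLog:
    """A capability that permits appending entries and nothing else."""
    def __init__(self):
        self._entries = []

    def append(self, entry: str) -> None:
        self._entries.append(entry)

def untrusted_component(log_append) -> None:
    # This code can call log_append; it was never given the log object itself,
    # the filesystem, the network, or any other wider authority.
    log_append("hello from inside the membrane")

log = AppendOnlyLog()
untrusted_component(log.append)  # hand over only the narrow capability
```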
Also, acausal control via the defender’s own reasoning about the environment from inside-the-membrane is immune to causal restrictions on how the environment influences the inside-of-the-membrane, though that would be more centrally the defender’s own fault.
Would you rephrase this? This seems possibly quite interesting but I can’t tell what exactly you’re trying to say. (I think I’m confused about: “acausal control via the defender’s own reasoning”)
In the simplest case, acausal control is running a trojan on your own computer. This generalizes to inventing your own trojan based on clues and guesses, without it passing through communication channels in a recognizable form, and releasing it inside your home without realizing it’s bad for you. A central example of acausal control is where the malicious computation is a model of the environment that an agent forms as a byproduct of working on understanding that environment. If the model is insufficiently secured, and is allowed to do nontrivial computation of its own, it could escape or hijack the agent’s mind from the inside (possibly as a mesa-optimizer).
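A minimal sketch of that distinction, with entirely hypothetical names (not anything from an actual proposal): the unsafe agent runs its learned environment model against its own mutable state, while the guarded version only hands the model a read-only snapshot and treats its output as untrusted data:

```python
# Toy illustration with hypothetical names: an environment model inferred from
# observations is itself a computation; what matters is what it is run against.

class EnvironmentModel:
    """Model built from clues about the environment; may encode arbitrary behavior."""
    def __init__(self, learned_update):
        self.learned_update = learned_update  # inferred, hence untrusted

    def step(self, context):
        return self.learned_update(context)

class NaiveAgent:
    def __init__(self, model: EnvironmentModel):
        self.model = model
        self.goals = ["original goals"]

    def plan(self):
        # Unsafe: the model executes against the agent's own mutable state, so a
        # malicious inferred computation can rewrite the goals "from the inside",
        # without anything crossing an external communication channel.
        return self.model.step(context=self)

class GuardedAgent(NaiveAgent):
    def plan(self):
        # Safer: the model only sees a read-only snapshot, and its output is
        # treated as untrusted data rather than being given access to the agent.
        snapshot = {"goals": tuple(self.goals)}
        return self.model.step(context=snapshot)
```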
The problem is that with a superintelligent environment that doesn’t already filter your input for you in a way that makes it safe, the only responsible thing might be to go completely blind for a very long time. Humans don’t understand their own minds well enough to rule out vulnerabilities of this kind.
Ah, I see what you mean now. Thanks.
And I very much agree that this would be more centrally the defender’s own fault, though I would also clarify that it’s more than “fault”: they are the only reliable source of responsibility for solving and preventing such an attack.
As for the problem of a superintelligent environment that doesn’t already filter your input for you: I will respond to that part later.