But, is there a way to do this that another human can learn to use fairly easily? This stretches credulity somewhat.
This hypothesis literally makes no sense to me. Why would adversarial policies for humans be infeasible for humans to learn? Why would it further be so infeasible as to strain credulity?
In a sense, optical and other sensory illusions are adversarial exploits of human perception, and humans can learn how to present sensory illusions very readily. Professional stage magicians are experts at deceiving human perception and may even discover new exploits not publicly disclosed.
We have not systematically run adversarial optimisation over human cognition to find adversarial policies.
Your scepticism seems very unwarranted to me.
There are plenty of incentives for people to find adversarial policies against other people. And, sure, there are stage magicians, but I don't know any stage magician who can do the equivalent of winning at Go by playing a weird sequence of moves that confuses the senses (without exploiting any degrees of freedom other than the actual moves). AFAIK there is no weaponized stage magic either, which would be very useful if it was that efficient. (Camouflage, maybe? It still seems pretty different.)
Of course, it is possible that we cannot find these adversarial policies only because we are not able to interface with a human brain as directly as the adversarial policy training algorithm. In other words, maybe if you could play a lot of games against a Go champion while resetting their memory after every game, you would find something eventually (even though they randomize their moves, so the games don't repeat). But, I dunno.
Of course, it is possible that we cannot find these adversarial policies only because we are not able to interface with a human brain as directly as the adversarial policy training algorithm.
IMO this point is very underappreciated. It's heavily load-bearing that the adversarial policy could train itself against a (very) high-fidelity simulation of the Go engine, do the ML equivalent of "reading its mind" while doing so, and train against successively stronger versions of the engine (more searches per turn) for arbitrarily long.
We can’t do any of these vs a human. Even though there are incentives to find adversarial exploits for human cognition, we can’t systematically run an adversarial optimiser over a human mind the way we can over an ML mind.
And con artists/scammers, abusers, etc. may perform the equivalent of adversarial exploits on the cognition of particular people.
In other words, maybe if you could play a lot of games against a Go champion while resetting their memory after every game, you would find something eventually (even though they randomize their moves, so the games don’t repeat). But, I dunno.
Not me, but a (very) strong Go amateur might be able to learn and adopt a policy that a compute-limited agent found to beat a Go champion given such a setup (notice that it wasn't humans that discovered the adversarial policy even in the KataGo case).
I don’t think they do the ML equivalent of “reading its mind”? AFAIU, they are just training an RL agent to play against a “frozen” policy. Granted, we still can’t do that against a human.
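To make the "frozen policy" point concrete, here is a toy sketch of that kind of setup: the victim policy is queried but never updated, while the adversary keeps optimising against it until it finds a reliable exploit. This is purely illustrative and every name in it is made up; the actual attack trains a Go-playing network against a frozen KataGo, not an epsilon-greedy bandit against rock-paper-scissors.

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def frozen_victim_move():
    # Stand-in for the frozen victim policy: a fixed, slightly biased strategy.
    return random.choices(ACTIONS, weights=[0.5, 0.25, 0.25])[0]

def reward(adversary_move, victim_move):
    if adversary_move == victim_move:
        return 0.0
    return 1.0 if BEATS[adversary_move] == victim_move else -1.0

# The adversary keeps a running estimate of the reward for each action and
# mostly plays the best one (an epsilon-greedy bandit, standing in for RL).
estimates = {a: 0.0 for a in ACTIONS}

def adversary_move(epsilon=0.1):
    if random.random() < epsilon:              # keep exploring occasionally
        return random.choice(ACTIONS)
    return max(estimates, key=estimates.get)

for _ in range(20_000):
    a = adversary_move()
    v = frozen_victim_move()                   # the victim is queried but never updated
    estimates[a] += 0.01 * (reward(a, v) - estimates[a])

print(estimates)  # "paper" ends up with the highest estimate: the learned exploit
```

The relevant disanalogy with humans is that this loop can be run millions of times against an identical, memoryless copy of the victim, which is exactly what we cannot do to a person.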
Hmm, I think I may have misunderstood/hallucinated the "reading its mind" analogy from an explanation of the exploit I read elsewhere.