for AIs, more robust adversarial examples—especially ones that work on AIs trained on different datasets—do seem to look more “reasonable” to humans. The really obvious adversarial example of this kind in humans is, like, cults, or so—I don’t really have another, though I do have examples that are, like, on the edge of the cult pattern. It’s not completely magic, it doesn’t work on everyone, and it does seem like a core component of why people fall for it is something like a relaxed “control plane” that doesn’t really try hard to avoid being crashed by it; combined with the fact that it’s attacking through somewhat native behaviors. But I think OP’s story is a good presentation of this anyway, because the level of immunity you can reliably have to a really well-optimized thing is likely going to be enough to maintain some sanity, but not enough to be entirely unaffected by it.
like, ultimately, light causes neural spikes. neural spikes can do all sorts of stuff. the robust paths through the brain are probably not qualitatively unfamiliar, but they can be hit pretty dang hard if you’re good at it. and the behavior being described isn’t “do anything of the attacker’s choosing”—it seems to just be “crash your brain and go on to crash as many others as possible”, gene-drive style. It doesn’t seem obvious that the humans in the story are doomed as a species, even—but it’s evolutionarily novel to encounter such a large jump in your adversary’s ability to find the vulnerabilities that currently crash you.
Hmm, perhaps the attackers would have been more effective if they were able to make, ehm, reproductively fit glitchers...
Oh, something notable here—if you’re not personally familiar with hypnosis, it might be harder to grok this. Hypnosis is totally a thing; my concise summary is that it’s “meditation towards obedience”—meditation where you intentionally put yourself in a “fast path from hearing to action”, ish. edit 3: never do hypnosis with someone you don’t seriously trust, i.e. someone you’ve known for a long time who has significant incentive to not hurt you. The received wisdom is that it can be safe, but it’s unclear if that’s true, and I’ve updated towards not playing with it from this conversation.[1] original text, which was insufficiently cautious: imo it’s not too dangerous as long as you go into it with the intention to not fully yield control and to keep mental exception handlers, but holding that intention and directing your attention so as not to leave huge gaps in the control plane seems potentially insufficient if the adversary is able to mess with you hard enough. Like, I agree we’re a lot more adversarially robust than current AIs, such that the attacks against us have to be more targeted to specific human vulnerabilities, but basically I just don’t buy that it’s perfect, and probably the way it fails for really robust attacks is gonna look more like manipulating the earliest layers of vision to get a foothold.
[1] Also, like, my current view is that things like the news or random youtubers might be able to do hypnosis-esque things if you approach them sufficiently uncritically. not to mention people with bad intentions who you know personally who are specifically trying to manipulate you—those keep showing up around these parts, so someone who wants to do hypnosis IRL who you met recently should not be trusted—that’s a red flag.
for AIs, more robust adversarial examples—especially ones that work on AIs trained on different datasets—do seem to look more “reasonable” to humans.
Then I would expect they are also more objectively similar. In any case, that finding is strong evidence against manipulative adversarial examples for humans—your argument is basically “there’s just this huge mess of neurons, surely somewhere in there is a way”, but if the same adversarial examples work on minds with very different architectures, then that’s clearly not why they exist. Instead, they have to be explained by some higher-level cognitive factors shared by ~anyone who gets good at interpreting a wide range of visual data.
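The transfer phenomenon being discussed here can be sketched with a toy experiment. This is purely illustrative (the models, data, and single-step FGSM-style perturbation are my own minimal stand-ins, not anything from the actual transfer literature): two tiny logistic-regression “models” are trained on the same data from different random initializations, an adversarial perturbation is crafted against model A only, and the same perturbed input then fools model B as well—because both models learned a similar decision direction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, seed, steps=2000, lr=0.1):
    """Train a tiny logistic-regression 'model' by gradient descent."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict(w, b, x):
    return int(sigmoid(x @ w + b) > 0.5)

# Toy data: two well-separated 2-D Gaussian blobs (class 0 vs class 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)),
               rng.normal(2.0, 0.5, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Two "independent" models: same data, different random initializations.
wA, bA = train_logreg(X, y, seed=1)
wB, bB = train_logreg(X, y, seed=2)

# FGSM-style attack computed against model A only:
# x_adv = x + eps * sign(dL/dx). For logistic loss with true label 1,
# dL/dx = (p - 1) * w, whose sign is -sign(wA).
x = np.array([1.5, 1.5])   # a clean class-1 input
eps = 2.0
x_adv = x - eps * np.sign(wA)

print(predict(wA, bA, x), predict(wB, bB, x))          # clean input: both say class 1
print(predict(wA, bA, x_adv), predict(wB, bB, x_adv))  # attack on A also fools B
```

The transfer here is unsurprising: both models converge to roughly the same weight direction, so a perturbation aligned against one is aligned against the other. That is the toy version of the observation above—transfer happens because of structure shared across learners, not because of quirks of one particular “mess of neurons”.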
The really obvious adversarial example of this kind in human is like, cults, or so
Cults use much stronger means than is implied by adversarial examples. For one, they can react to and reinforce your behaviour—is a screen with text promising you things for doing what it wants, with escalating impact and a building track record, an adversarial example? No. It’s potentially worrying, but not really distinct from generic power-seeking problems. The cult also controls a much larger fraction of your total sensory input over an extended time. Cult members spreading the cult also use tactics that require very little precision—there isn’t information transmitted to them on how to do this beyond simple verbal instructions. Even if there are more precision-needing ways of manipulating individuals, it’s another thing entirely to manipulate them into repeating those high-precision strategies that they couldn’t themselves execute correctly on purpose.
if you’re not personally familiar with hypnosis
I think I am a little bit. I don’t think that means what you think it does. Listening-to-action still requires comprehension of the commands, which is much lower bandwidth than vision, and it’s a structure that’s specifically there to be controllable by others, so it’s not an indication that we are controllable by others in other bizarre ways. And you are deliberately not being so critical—you haven’t, actually, been circumvented, and there isn’t really a path to escalating power—just the fact that you’re willing to obey someone in a specific context. Hypnosis also ends on its own—the brain naturally tends back towards baseline; implanting a mechanism that keeps itself active indefinitely is high-precision.
your argument is basically “there’s just this huge mess of neurons, surely somewhere in there is a way”,
I suppose that is what I said, interpreted as a deductive claim. I have more abductive/bayesian/hunch information than that, and I’ve expressed some of it, but I’ve been realizing lately that a lot of my intuitions are not arrived at via deductive reasoning, which can make them hard to verify or communicate. (and I’d guess that that’s a common problem; it seems like the sort of thing science exists to solve.) I’m likely not well equipped to present justifiedly-convincing-to-a-highly-skeptical-careful-evaluator claims about this, just detailed sketches of hunches and how I got them.
Your points about the limits of hypnosis seem reasonable. I agree that the foothold would only occur if the receiver is being “paid in dopamine”, or something, hard enough to want to become more obedient. We do seem to see that presented in the story—the kid being concerningly fascinated by the glitchers right off the bat, as soon as they’re presented. And for what it’s worth, I think this is an exaggerated version of a thing we actually see on social media sometimes, though I’m kind of bored of this topic and would rather not expand on it deeply.
Ah, you’re a soft-glitcher. /lh
Edit: This is a joke.
can you expand on what you mean by that? are there any actions you’d suggest, on my part or others, based on this claim? (also, which of the urban dictionary definitions of “lh” do you mean? they have opposite valences.)
edit: added a bunch of warnings to my original comment. sorry for missing them in the first place.
I meant “light-hearted”, and sorry, it was just a joke.
Fair enough. Neither dill nor ziz would have been able to pull off their crazy stuff without some people letting themselves get hypnotized, so I think the added warnings are correct.