Ok, that's mostly what I've heard before. I'm skeptical because:
If something like classical adversarial examples existed for humans, they likely wouldn't have the same effect on different people, or even on the same person viewing them from a different angle, or maybe in a different mood.
There are no known adversarial examples of the kind you describe for humans. We could tell if we had found them, because we have metrics of "looking similar" which are not based on our intuitive sense of similarity, like pixelwise differences and convolutions (there's a small sketch of this after these points). All examples of "easily confused" images I've seen were objectively similar to what they're confused for.
Somewhat similar to what Grayson Chao said, it seems that the influence of vision on behaviour goes through a layer of "it looks like X", which is much lower bandwidth than vision as a whole. Ads have qualitatively similar effects to those of actually seeing their content happen in person.
Even if adversarial examples exist, that doesn't mean they exist for making you do anything of the manipulator's choosing. Humans are, in principle, at least as programmable as a computer, but that also means there are vastly more possible courses of action than possible vision inputs. In practice, probably not much high-cognitive-function processing could be commandeered by adversarial inputs, and behaviours complex enough to glitch others couldn't be implemented.
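To make "objectively similar" concrete, here's a minimal sketch of the kind of intuition-free check I mean (a rough illustration only; the file names are hypothetical placeholders and it assumes NumPy and Pillow are installed):

```python
import numpy as np
from PIL import Image

# Hypothetical file names: an original image and a candidate "adversarial" version of it.
original = np.asarray(Image.open("original.png"), dtype=np.float32) / 255.0
candidate = np.asarray(Image.open("perturbed.png"), dtype=np.float32) / 255.0

# Pixelwise distances: measures of "looking similar" that don't rely on intuition.
linf = np.abs(original - candidate).max()                            # largest single-pixel change
rms = np.linalg.norm(original - candidate) / np.sqrt(original.size)  # root-mean-square change

print(f"L-infinity distance: {linf:.4f}")
print(f"RMS distance:        {rms:.6f}")

# Classical adversarial examples for image classifiers score tiny on these metrics
# (e.g. an L-infinity budget of a few /255) while still flipping the model's label;
# the "easily confused" images I'm talking about do not score tiny.
```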
for AIs, more robust adversarial examples—especially ones that work on AIs trained on different datasets—do seem to look more "reasonable" to humans. The really obvious adversarial example of this kind in humans is like, cults, or so—I don't really have another, though I do have examples that are like, on the edge of the cult pattern. It's not completely magic, it doesn't work on everyone, and it does seem like a core component of why people fall for it is something like a relaxed "control plane" that doesn't really try hard to avoid being crashed by it; combined with the fact that it's attacking through somewhat native behaviors. But I think OP's story is a good presentation of this anyway, because the level of immunity you can reliably have to a really well optimized thing is likely going to be enough to maintain some sanity, but not enough to be completely unaffected by it.
like, ultimately, light causes neural spikes. neural spikes can do all sorts of stuff. the robust paths through the brain are probably not qualitatively unfamiliar but can be hit pretty dang hard if you're good at it. and the behavior being described isn't "do anything of the manipulator's choosing"—it seems to just be "crash your brain and go on to crash as many others as possible", gene drive style. It doesn't seem obvious that the humans in the story are doomed as a species, even—but it's evolutionarily novel to encounter such a large jump in your adversary's ability to find the vulnerabilities that currently crash you.
Hmm, perhaps the attackers would have been more effective if they were able to make, ehm, reproductively fit glitchers...
Oh, something notable here—if you're not personally familiar with hypnosis, it might be harder to grok this. Hypnosis is totally a thing; my concise summary is that it's "meditation towards obedience"—meditation where you intentionally put yourself in a "fast path from hearing to action", ish. edit 3: never do hypnosis with someone you don't seriously trust, i.e. someone you've known for a long time who has significant incentive to not hurt you. The received wisdom is that it can be safe, but it's unclear if that's true, and I've updated towards not playing with it from this conversation.[1] original text, which was insufficiently cautious: imo it's not too dangerous as long as you go into it with the intention to not fully yield control and to have mental exception handlers, but that intentional activation of your attention to not leave huge gaps in the control plane seems potentially insufficient if the adversary is able to mess with you hard enough. Like, I agree we're a lot more adversarially robust than current AIs, such that attacks against us have to be more targeted to specific human vulnerabilities, but basically I just don't buy that it's perfect, and probably the way it fails against really robust attacks is gonna look more like manipulating the earliest layers of vision to get a foothold.
[1] Also, like, my current view is that things like the news or random YouTubers might be able to do hypnosis-esque things if you approach them sufficiently uncritically. Not to mention people with bad intentions whom you know personally and who are specifically trying to manipulate you—those keep showing up around these parts, so someone you met recently who wants to do hypnosis IRL should not be trusted—that's a red flag.
for AIs, more robust adversarial examples—especially ones that work on AIs trained on different datasets—do seem to look more “reasonable” to humans.
Then I would expect they are also more objectively similar. In any case, that finding is strong evidence against manipulative adversarial examples for humans—your argument is basically "there's just this huge mess of neurons, surely somewhere in there is a way", but if the same adversarial examples work on minds with very different architectures, then that's clearly not why they exist. Instead, they have to be explained by some higher-level cognitive factors shared by ~anyone who gets good at interpreting a wide range of visual data.
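To make "works on minds with very different architectures" concrete, here is a rough sketch of what a transfer test looks like for image models (assumptions: PyTorch and torchvision pretrained models, a one-step FGSM attack, inputs already in [0, 1] with normalization glossed over, and x, y standing for a batch of correctly classified images and labels that aren't shown here):

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Two architecturally quite different models trained on the same task.
source = models.resnet50(weights="IMAGENET1K_V1").eval()
target = models.vgg16(weights="IMAGENET1K_V1").eval()

def fgsm(model, x, y, eps):
    """One-step FGSM: nudge x by eps in the direction that raises the loss."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def transfer_rate(x, y, eps=4 / 255):
    # Craft the perturbation against the source model only.
    x_adv = fgsm(source, x, y, eps)
    with torch.no_grad():
        fooled_source = (source(x_adv).argmax(1) != y).float().mean().item()
        fooled_target = (target(x_adv).argmax(1) != y).float().mean().item()
    return fooled_source, fooled_target

# If the second number is high, the same perturbation also fools a model with a
# very different architecture, which is exactly the kind of finding being discussed.
```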
The really obvious adversarial example of this kind in humans is like, cults, or so
Cults use much stronger means than is implied by adversarial examples. For one, they can react to and reinforce your behaviour—is a screen with text promising you things for doing what it wants, with escalating impact and a building track record, an adversarial example? No. It's potentially worrying, but not really distinct from generic power-seeking problems. The cult also controls a much larger fraction of your total sensory input over an extended time. Cult members spreading the cult also use tactics that require very little precision—there isn't information transmitted to them on how to do this, beyond simple verbal instructions. Even if there are more precision-demanding ways of manipulating individuals, it's another thing entirely to manipulate them into repeating those high-precision strategies that they couldn't themselves execute correctly on purpose.
if you’re not personally familiar with hypnosis
I think I am a little bit. I don't think that means what you think it does. Listening-to-action still requires comprehension of the commands, which is much lower bandwidth than vision, and it's a structure that's specifically there to be controllable by others, so it's not an indication that we are controllable by others in other bizarre ways. And you are deliberately suspending your usual criticism—you haven't, actually, been circumvented, and there isn't really a path to escalating power—just the fact that you're willing to obey someone in a specific context. Hypnosis also ends on its own—the brain naturally tends back towards baseline; implanting a mechanism that keeps itself active indefinitely is high-precision.
your argument is basically “there’s just this huge mess of neurons, surely somewhere in there is a way”,
I suppose that is what I said, interpreted as a deductive claim. I have more abductive/Bayesian/hunch information than that, and I've expressed some of it, but I've been realizing lately that a lot of my intuitions don't come via deductive reasoning, which can make them hard to verify or communicate. (And I'd guess that's a common problem; it seems like the sort of thing science exists to solve.) I'm likely not well equipped to present claims about this that would be justifiedly convincing to a highly skeptical, careful evaluator, just detailed sketches of hunches and how I got them.
Your points about the limits of hypnosis seem reasonable. I agree that the foothold would only occur if the receiver is being "paid in dopamine", or something, hard enough to want to become more obedient. That does seem to be presented in the story—the kid being concerningly fascinated by the glitchers right off the bat, as soon as they're introduced. And for what it's worth, I think this is an exaggerated version of a thing we actually see on social media sometimes, though I'm kind of bored of this topic and would rather not expand on it deeply.
Ah, you’re a soft-glitcher. /lh
Edit: This is a joke.
can you expand on what you mean by that? are there any actions you'd suggest, on my part or others, based on this claim? (also, which of the Urban Dictionary definitions of "lh" do you mean? they have opposite valences.)
edit: added a bunch of warnings to my original comment. sorry for not including them in the first place.
I meant “light-hearted” and sorry, it was just a joke.
Fair enough. Neither dill nor ziz would have been able to pull off their crazy stuff without some people letting themselves get hypnotized, so I think the added warnings are correct.