phew, I have some feelings after reading that, which might indicate useful actions. I wonder if they’re feelings in the distribution that the author intended.
I suddenly am wondering if this is what LLMs are. But… maybe not? but I’m not sure. they might be metaphorically somewhat in this direction. clearly not all the way, though.
spoilers, trying to untangle the worldbuilding:
seems like perhaps the stars are actually projecting light like that towards this planet—properly designed satellites could be visible during the day with the help of carefully tuned orbital lasers, so I’m inferring the nearest confusion-generating light is at least 1au away, probably at least 0.5ly.
it’s unclear if we’re on the originating planet of the minds that choose the projected light. seems like the buried roads imply we are. also that the name is “glitchers”.
dude, how the hell do you come up with this stuff.
seems like maybe the virus got out, since the soft-glitchers got to talk to normal people. except that, the soft-glitchers’ glitch bandwidth presumably must be at least slightly lower due to being constrained to higher fluency, so maybe it spreads slower..
I do wonder how there are any sane humans left this far in, if the *night sky* is saturated with adversarial imagery.
I doubt this level of advers....arial example is possible, nope nevermind I just thought through the causal graphs involved, there’s probably enough bandwidth through vision into reliably redundant behavior to do this. it’d be like hyperpowered advertising.
but still, this makes me wonder at what point it gets like this irl. if maybe I should be zeroing the bandwidth between me and AIs until we have one we can certify is trying to do good things, rather than just keeping it low. which is also not really something I would like to have to do.
I read the ‘stars’ as simply very dense low-orbiting satellites monitoring the ground 24/7 for baseline humans to beam low-latency optical propaganda at. The implied King’s Pact presumably is something like, “the terrestrial Earth will be left unmodified and no AI are allowed to directly communicate or interact with or attempt to manipulate baseline humans”, and so satellites, being one-way broadcasts outside the Earth, don’t violate it. This then allows the bootstrap of all the other attacks: someone looks up at night long enough, they get captured, start executing the program. But because it’s all one-way and ‘blind’, the attacks have to be blackbox, like evolutionary algorithms, and work poorly and inefficiently, and with little feedback. (If a glitcher doesn’t work, but can only attract other animals rather than humans, where did your attack go wrong? How hard are you, bound by the King’s Pact, even allowed to think about your attack?) The soft-glitchers are a bypass, a mesa-optimizer: you load the minimal possible mesa-optimizer (which, as we know from the demoscene or hacking, can be relatively few bytes), an interest in glitchers, which exploits the native human intelligence to try to figure out an interpreter for the powerful but non-human-native (for lack of feedback or direct access to humans to test on) programs in the hard-glitchers. Once successful (i.e. once they figure out what some ill-chosen gestures or noises were actually supposed to mean, fixing the remaining errors in the attack), they can then successfully interpret and run the full attack program. (Which might include communication back to the AI attackers and downloading refined attacks etc.)
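(A toy sketch of the blind-evolution point, purely illustrative: the names and numbers are made up and the "attack" is just a bit string. The contrast is between an optimizer that gets graded feedback and one that only ever learns whether the finished attack worked, which is roughly the position the one-way broadcast forces the attackers into.)

```python
import random

N = 40                                              # hypothetical length of the broadcast pattern
target = [random.randint(0, 1) for _ in range(N)]   # the unknown "working" attack

def fitness(candidate):
    # Graded feedback: how many bits of the broadcast are already right.
    return sum(a == b for a, b in zip(candidate, target))

def evolve(graded_feedback, max_gens=100_000):
    cand = [random.randint(0, 1) for _ in range(N)]
    for gen in range(max_gens):
        if fitness(cand) == N:                      # the rare observable success
            return gen
        mutant = cand[:]
        mutant[random.randrange(N)] ^= 1            # flip one bit of the broadcast
        if graded_feedback:
            if fitness(mutant) >= fitness(cand):    # hill-climb when partial feedback exists
                cand = mutant
        else:
            cand = mutant                           # no signal: accept blindly, i.e. a random walk
    return max_gens                                 # gave up

print("generations with graded feedback:   ", evolve(True))    # typically a few hundred
print("generations with success-only signal:", evolve(False))  # typically hits the cap
```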
Nice, that’s almost exactly how I intended it. Except that I wasn’t thinking of the “stars” as satellites looking for individual humans to send propaganda at (which IMO is pretty close to “communicating”), but rather a network of satellites forming a single “screen” across the sky that plays a video infecting any baseline humans who look at it.
In my headcanon the original negotiators specified that sunlight would still reach the earth unimpeded, but didn’t specify that no AI satellites would be visible from the Earth. I don’t have headcanon explanations for exactly how the adversanimals arose or how the earth became desolate though.
(Oh, also, I think of the attack as being inefficient less because of lack of data, since AIs can just spin up humans to experiment on, and more because of the inherent difficulty of overwriting someone’s cognition via only a brief visual stimulus. Though now that I think about it more, presumably once someone has been captured the next thing you’d get them to do is spend a lot of time staring at a region of the sky that will reprogram them in more sophisticated ways. So maybe the normal glitchers in my story are unrealistically incompetent.)
That was what I was thinking, yes. “A pact would normally allow voluntary communication to be initiated with the AIs, so any glitcher which had been successfully attacked would have simply communicated back to its masters, either downloading new instructions & attacks or finetuning the existing ones or being puppeted directly by the AIs, sometime over the past centuries or millennia; if nothing else, they have an unlimited amount of time to stare at the sky and be reprogrammed arbitrarily after the initial exploit; so glitchers are indeed ‘glitchy’ and must represent a permanently failed attack method. That is why they bumble around semi-harmlessly: a broken worm or virus can cause a lot of trouble as it futilely portscans or DoSes targets or goes through infinite loops etc, even if the code is buggy and has accidentally locked out its creators as well as everyone else.”
My headcanon for the animals was that early on, they released viruses that genetically modified non-human animals in ways that don’t violate the pact.
I didn’t think the pact could have been as broad as “the terrestrial Earth will be left unmodified,” because the causal impact of their actions certainly changed things. I assumed it was something like “AIs and AI-created technologies may not do anything that interferes with humans’ actions on Earth, or harms humans in any way”—but genetic engineering instructions sent from outside of the earth, presumably pre-collapse, didn’t qualify because they didn’t affect humans; they made animals affect humans, which was parsed as similar to the impacts of the environment on humans, not as an AI technology.
I appreciated this comment! Especially:
dude, how the hell do you come up with this stuff.
It took me several edits to get spoilers to work right; I had to switch from markdown to the rich text editor. Your second spoiler is empty, which is how mine were breaking.
I just thought through the causal graphs involved, there’s probably enough bandwidth through vision into reliably redundant behavior to do this
Elaborate.
edit: putting the thing I was originally going to say back:
I meant that I think there’s enough bandwidth available from vision into the configuration of matter in the brain that a sufficiently powerful mind could adversarial-example the human brain hard enough to implement the adversarial process in the brain, get it to persist in that brain, take control, and spread. We see weaker versions of this in advertising and memetics already, and it seems to be getting worse with social media—there are a few different strains, which generally aren’t highly compatible with each other, but being robust to communicated manipulation while still receiving the latest factual news has already become quite difficult. (I think it’s still worth attempting.) More details:
According to a random estimate I found online to back up the intuition I was actually pulling from, the vision system transfers about 8 Mbit/sec = 1 MByte/sec of information, which provides an upper bound on how many bits of control could be exercised. That information is transferred in the form of neural spikes, a process that goes through chemistry, i.e. the shapes of molecules, which have a lot of complex behaviors that normally don’t occur in the brain, so I can’t obviously upper-bound the complexity of the effect there using what I know.
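(For concreteness, here is the back-of-envelope arithmetic with the rough figures quoted in this thread; the ~10 bit/s deliberate-output estimate is the one cited further down, and both numbers are loose.)

```python
# Rough, illustrative numbers only; both estimates come from this thread, not from any measurement here.
visual_input_bps = 8_000_000        # ~8 Mbit/s into the visual system (the estimate above)
deliberate_output_bps = 10          # ~10 bit/s of deliberate output, the figure cited later in the thread

bits_per_minute_of_staring = 60 * visual_input_bps
print(f"bits through vision per minute of staring: {bits_per_minute_of_staring:,}")     # 480,000,000
print(f"input/output bandwidth ratio: {visual_input_bps // deliberate_output_bps:,}x")  # 800,000x
```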
We know that the causal paths through the brain are at least hackable enough for advertising to fairly reliably manipulate people, which provides a lower bound on how much the brain can be manipulated. We know that changing mental state is always and only a process of changing chemical state; there’s nothing else to be changed. That chemical state primarily involves chain reactions in synapses, axons, and dendrites along the fast spike path, and more typical cell behaviors along the slow, longer-term path (things involving gene regulatory networks—which are the way most cells do their processing in the first place).
The human brain is programmable enough to be able to mentally simulate complex behaviors like “what will a computer do?” by, at minimum, internal chain of thought; example: most programmers. It’s also programmable enough that occasionally we see savants that can do math in a single ~forward-pass equivalent from vision (wave of spike trains—in the cortex, this is in fact pretty much a forward pass).
We know adversarial examples work on artificial neural networks, and given the ability of advertising to mess with people, there’s reason to think this is true of humans too.
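(For readers who haven't seen it: the standard construction on artificial networks is the fast gradient sign method. A minimal sketch, assuming a hypothetical PyTorch classifier `model` and a batched `image`/`label` pair with pixel values in [0, 1]; nothing here is claimed about brains, it just shows what "adversarial example" means in the ANN setting.)

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, label, eps=0.01):
    """Return a copy of `image` perturbed by at most `eps` per pixel so as to raise the model's loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # how wrong the model currently is
    loss.backward()                               # gradient of the loss w.r.t. the pixels
    adv = (image + eps * image.grad.sign()).clamp(0.0, 1.0).detach()
    return adv

# The perturbation is tiny under pixelwise metrics (L-infinity norm <= eps) even when it flips
# the classifier's prediction, which is the sense of "objectively similar" that comes up later:
#   adv = fgsm_example(model, image, label)
#   print((adv - image).abs().max())   # <= eps
```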
So, all those things combined—if there is a powerful enough intelligent system to find it (which may turn out to be a very tall order or not—compare e.g. YouTube or TikTok, which already have a similar mind-saturating effect at the very least when one’s guard is down), then it should be the case that somewhere in the space of possible sequences of images (e.g., as presented in the sky), one can pulse light in the right pattern to knock neurons into synchronizing on working together to implement a new pattern of behavior intended by the mind that designed it. If that pattern of behavior is intended to spread, then it includes pushing neurons into processes which result in the human transmitting information to others. If it’s far outside the norm for human behavior, it might require a lot of bandwidth to transmit—a lot of images over an extended period (minutes?) from the sky, or a lot of motion in the hands. In order for this to occur, the agency of the adversarially induced pattern would have to be more reliable than the person’s native agency—which could, e.g., be achieved by pushing their representations far outside of normal in ways that make them decohere their original personality and spend the brain’s impressively high redundancy on the induced pattern instead.
I’m guessing there aren’t adversarial examples of this severity that sound normal—normal-sounding adversarial examples are probably capable of only familiar amounts of manipulation, like highly optimized advertising. But that can already be enough to have pretty significant impacts.
what I originally said, before several people were like “not sharing dangerous ideas is bad”, ish:
I think I’d rather not publicly elaborate on how to do this, actually. It probably doesn’t matter, probably any mind that can do this with my help can do it in not many more seconds without my help (eg, because my help isn’t even particularly unique and these ideas are already out there), but I might as well not help. Unless you think that me explaining the brain’s vulnerabilities can be used to significantly increase population 20th-ish percentile mental robustness to brain-crashing external agentic pressure. But in brief, rather than saying the full thing I was going to say, [original post continued here]

edit 12h after sending: alright, I guess it’s fair to share my braindump, sounds like at worst I’ll be explaining the dynamics I imagine in slightly more detail, I’ll replace it here in a bit. sorry about being a bit paranoid about this sort of thing! I’m easily convinced on this one. However, I do notice my brain wants to emotionally update toward just not saying when I have something to not share—not sure if I’ll endorse that, guessing no but quite uncertain.

in general I think people should explain stuff like this. “I might as well not help” is a very weak argument compared with the benefits of people understanding the world better.
It’s a straightforward application of the Berryman Logical Imaging Technique, best known for its use by the other basilisk.
Intuitively, I see a qualitative difference between adversarial inputs like the ones in the story and merely pathological ones, such as manipulative advertising or dopamine-scrolling-inducing content. The intuition comes from cybersecurity, where it’s generally accepted that the control plane (roughly, the stream of inputs deciding what the system does and how it does it) should be isolated from the data plane (roughly, the stream of inputs defining what the system operates on). In the examples of advertising and memetics, the input is still processed in the ‘data plane’, where the brain integrates sensory information on its own terms, in the pursuit of its own goals. “Screensnakes”/etc seem to have the ability to break the isolation and interact directly with the control plane (e.g. a snake’s coloration is no longer processed as ‘a snake’s coloration’ at all).
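(To make the analogy concrete, here is a toy sketch of the isolation being described; all the names are made up. The point is just that in a well-designed system, no sequence of data-plane bytes can rewrite the dispatch table, whereas the story's premise is that for brains no such guarantee exists.)

```python
class Router:
    """Toy system with an explicit control-plane / data-plane split."""

    def __init__(self):
        self._handlers = {"echo": lambda payload: payload}   # control-plane state

    def handle_data(self, kind, payload):
        # Data-plane input: transformed on the system's own terms; nothing in the
        # payload, however crafted, can reach into _handlers and change behaviour.
        handler = self._handlers.get(kind)
        return handler(payload) if handler else None

    def reconfigure(self, kind, handler, authorized=False):
        # Control-plane input: the only path that changes what the system does, and it is gated.
        if not authorized:
            raise PermissionError("data-plane traffic must not reach the control plane")
        self._handlers[kind] = handler
```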
That said, there are natural examples which are less clear-cut, such as the documented phenomenon where infrasound around 19Hz produces a feeling of dread. It’s not clear to me that this is ‘control plane hacking’ per se (for example, perhaps this is an evolved response to sounds that would have been associated with caves or big predators in the past) but it does blur the intuitive boundary between the control plane and data plane.
Are you aware of any phenomena that are very ‘control plane-y’ in this sense? If they existed, it would seem to me to be a positive confirmation that I’m wrong and your idea of the adversarial search resulting in a ‘Glitcher protocol’ would have some legs.
I think most things that hit your brain leak out of the data plane to some degree, some on the lower end, some fairly high, and for current levels of manipulative optimization towards higher-data-plane-leaking media, looking for the leaks and deciding how to handle them seems to me like it can help if you have to encounter the thing. it’s just that, normally, the bitrate of control back towards the space of behavior that the organism prefers is high enough that the incoming manipulation can’t strongly persist. but we do see this fail even just with human-level manipulation—cults! I personally have policies like “if someone is saying cults good, point out healthy religion can be good but cult indoctrination techniques are actually bad, please do religion and please check that you’re not making yourself subservient”, because it keeps showing up around me that people do that shit in particular, even at pretty large scales, even at pretty small ones. and I think a lot of the problem is that, eh, if the control plane isn’t watching properly, the data plane leaks. so I’d expect you just need a high enough bitrate into the brain, and the ability to map out enough of the brain’s state space to do phenotype reprogramming by vision, Michael Levin sorts of things—get enough properly targeted changes into cells, and you can convince the gene networks to flip to different parts of their state space you’d normally never see. (I suspect that in the higher fluency regime, that’s a thing that happens especially related to intense emotional activations, where they can push you into activating genetically pretrained patterns over a fairly long timescale; I particularly tend to think about this in terms of ways people try to get each other into more defection-prone interaction patterns.)
I’m not following how the cult example relates to something like achieving remote code execution in the human brain via the visual cortex. While cult manipulation techniques do elicit specific behavior via psychological manipulation, it seems like the brain of a cult member is still operating ‘in human mode’, which is why people influenced by a cult act like human beings with unusual priorities and incentives instead of like zombies.
I doubt the level of inhuman behavior we see in this story is anywhere close to easy to achieve, and it’s probably not tractable given only hand motions as shown—given human output bandwidth, sounds seem needed, especially surprisingly loud ones. for the sky, I think it would start out beautiful, end up superstimulating, and then seep in via longer exposure. I think there’s probably a combination of properties of hypnosis, cult brainwashing, inducing psychedelic states, etc, which could get a human’s thinking to end up in crashed attractors, even if it’s only one-way transmission. then from a crashed attractor it seems a lot more possible to get a foothold of coherence for the attacker.
Man, I really hope there’s a way to induce psychedelic states through sensory inputs. That could be hugely beneficial if harnessed for pro-human goals (for example, scaling therapeutic interventions like MDMA or ketamine therapy.)
Given how spectacularly harmful psychedelic drugs can often be, I think we’d better hope that there isn’t any such “sensory-input-only” method of inducing psychedelic states.
edit: uh, well, short answer: there totally is! idk if they’re the psychedelic states you wanted, but they should do for a lot of relevant purposes, seems pretty hard to match meds though. original longer version:
there’s a huge space of psychedelic states; I think the subspace reachable by adding chemicals is a large volume that’s hard to get to by walking state space with only external pushes—I doubt the kind of scraping-a-hole-in-the-wall-from-a-distance you can do with external input can achieve, e.g., the global reversal of SERT function that MDMA apparently induces (I think this paper I just found on google may show this—I forget where I first encountered the claim, not double checking it properly now). you can probably induce various kinds of serotonin release, though.
but the premise of my argument here in the first place—where you can sometimes overwhelm human output behavior via well-crafted input—is that that probably doesn’t matter too much. human computational output bitrate seems to be on the order of ten bits per second across all modalities,[1] and input bitrate is way above that, so my guess is that update bitrate (forming memories, etc) is much higher than natural output bitrate[2]; probably, yeah, you can do most of the weird targeted interventions you were previously getting via psychedelics instead from, like, getting some emotional/tempo sorts of things to push into the attractor where neurons have similar functionality already. I just doubt you can go all the way to fixing neurological dysfunctions so severe that, to even have a hope of doing it from external input, you’d need to be looking for these crazy brain-hacking approaches we were talking about.
I guess what we’d need to measure is like, bitrate of self-correction internally within neurons, some FEP thing. not sure off the top of my head quite how to resolve that to something reasonable.
of course, like, actually I’m pretty dang sure you can get way above 10bit/s by outputting more noisy output, but then you get bits that aren’t coming from the whole network’s integrated state. the 10bps claim feels right for choosing words or similar things like macroscopic choices, but something feels wrong with the claim to me.
some concern that I missed counterevidence to this memorization-bandwidth claim in the paper, though!
Ok, that’s mostly what I’ve heard before. I’m skeptical because:
If something like classical adversarial examples existed for humans, they likely wouldn’t have the same effects on different people, or even when viewed from different angles, or maybe even in a different mood.
There are no known adversarial examples of the kind you describe for humans. We could tell if we had found them, because we have metrics of “looking similar” which are not based on our intuitive sense of similarity, like pixelwise differences and convolutions. All examples of “easily confused” images I’ve seen were objectively similar to what they’re confused for.
Somewhat similar to what Grayson Chao said, it seems that the influence of vision on behaviour goes through a layer of “it looks like X”, which is much lower bandwidth than vision in total. Ads have qualitatively similar effects to what seeing their content actually happen in person would.
If adversarial examples exist, that doesn’t mean they exist for making you do anything of the manipulator’s choosing. Humans are, in principle, at least as programmable as a computer, but that also means there are vastly more courses of action than possible vision inputs. In practice, probably not a lot of high-cognitive-function processing could be commandeered by adversarial inputs, and behaviours complex enough to glitch others couldn’t be implemented.
for AIs, more robust adversarial examples—especially ones that work on AIs trained on different datasets—do seem to look more “reasonable” to humans. The really obvious adversarial example of this kind in humans is, like, cults, or so—I don’t really have another, though I do have examples that are, like, on the edge of the cult pattern. It’s not completely magic, it doesn’t work on everyone, and it does seem like a core component of why people fall for it is something like a relaxed “control plane” that doesn’t really try hard to avoid being crashed by it; combined with the fact that it’s attacking through somewhat native behaviors. But I think OP’s story is a good presentation of this anyway, because the level of immunity you can reliably have to a really well optimized thing is likely going to be enough to maintain some sanity, but not enough to be entirely unaffected by it.
like, ultimately, light causes neural spikes. neural spikes can do all sorts of stuff. the robust paths through the brain are probably not qualitatively unfamiliar but can be hit pretty dang hard if you’re good at it. and the behavior being described isn’t “do anything of the manipulator’s choosing”—it seems to just be “crash your brain and go on to crash as many others as possible”, gene-drive style. It doesn’t seem obvious that the humans in the story are doomed as a species, even—but it’s evolutionarily novel to encounter such a large jump in your adversary’s ability to find the vulnerabilities that currently crash you.
Hmm, perhaps the attackers would have been more effective if they were able to make, ehm, reproductively fit glitchers...
Oh, something notable here—if you’re not personally familiar with hypnosis, it might be harder to grok this. Hypnosis is totally a thing; my concise summary is it’s “meditation towards obedience”—meditation where you intentionally put yourself in “fast path from hearing to action”, ish. edit 3: never do hypnosis with someone you don’t seriously trust, i.e. someone you’ve known for a long time who has significant incentive to not hurt you. The received wisdom is that it can be safe, but it’s unclear if that’s true, and I’ve updated towards not playing with it from this conversation.[1] original text, which was insufficiently cautious:
imo it’s not too dangerous as long as you go into it with the intention to not fully yield control and to have mental exception handlers, but relying on that intentional activation of your attention to not leave huge gaps in the control plane seems potentially insufficient if the adversary is able to mess with you hard enough. Like, I agree we’re a lot more adversarially robust than current AIs, such that the attacks against us have to be more targeted to specific human vulnerabilities, but basically I just don’t buy that it’s perfect, and probably the way it fails for really robust attacks is gonna look more like manipulating the earliest layers of vision to get a foothold.

[1] Also, like, my current view is that things like the news or random youtubers might be able to do hypnosis-esque things if you approach them sufficiently uncritically. not to mention people with bad intentions who you know personally who are specifically trying to manipulate you—those keep showing up around these parts, so someone who wants to do hypnosis IRL who you met recently should not be trusted—that’s a red flag.
Then I would expect they are also more objectively similar. In any case that finding is strong evidence against manipulative adversarial examples for humans—your argument is basically “there’s just this huge mess of neurons, surely somewhere in there is a way”, but if the same adversarial examples work on minds with very different architectures, then that’s clearly not why they exist. Instead, they have to be explained by some higher-level cognitive factors shared by ~anyone who gets good at interpreting a wide range of visual data.
Cults use much stronger means than is implied by adversarial examples. For one, they can react to and reinforce your behaviour—is a screen with text promising you things for doing what it wants, with escalating impact and building a track record, an adversarial example? No. It’s potentially worrying, but not really distinct from generic powerseeking problems. The cult also controls a much larger fraction of your total sensory input over an extended time. Cult members spreading the cult also use tactics that require very little precision—there isn’t information transmitted to them on how to do this, beyond simple verbal instructions. Even if there are more precision-needing ways of manipulating individuals, it’s another thing entirely to manipulate them into repeating those high-precision strategies that they couldn’t themselves execute correctly on purpose.
I think I am a little bit. I don’t think that means what you think it does. Listening-to-action still requires comprehension of the commands, which is much lower bandwidth than vision, and it’s a structure that’s specifically there to be controllable by others, so it’s not an indication that we are controllable by others in other bizarre ways. And you are deliberately not being so critical—you haven’t, actually, been circumvented, and there isn’t really a path to escalating power—just the fact that you’re willing to obey someone in a specific context. Hypnosis also ends on its own—the brain naturally tends back towards baseline; implanting a mechanism that keeps itself active indefinitely is high-precision.
I suppose that is what I said interpreted as a deductive claim. I have more abductive/bayesian/hunch information than that, I’ve expressed some of it, but I’ve been realizing lately a lot of my intuitions are not via deductive reasoning, which can make them hard to verify or communicate. (and I’d guess that that’s a common problem, seems like the sort of thing science exists to solve.) I’m likely not well equipped to present justifiedly-convincing-to-highly-skeptical-careful-evaluator claims about this, just detailed sketches of hunches and how I got them.
Your points about the limits of hypnosis seem reasonable. I agree that the foothold would only occur if the receiver is being “paid-in-dopamine”-or-something hard enough to want to become more obedient. That does seem to be presented in the story—the kid being concerningly fascinated by the glitchers right off the bat, as soon as they’re presented. And for what it’s worth, I think this is an exaggerated version of a thing we actually see on social media sometimes, though I’m kind of bored of this topic and would rather not expand on that deeply.
Ah, you’re a soft-glitcher. /lh
Edit: This is a joke.
can you expand on what you mean by that? are there any actions you’d suggest, on my part or others, based on this claim? (also, which of the urban dictionary definitions of “lh” do you mean? they have opposite valences.)
edit: added a bunch of warnings to my original comment. sorry for missing them in the first place.
I meant “light-hearted” and sorry, it was just a joke.
Fair enough. Neither dill nor ziz would have been able to pull off their crazy stuff without some people letting themselves get hypnotized, so I think the added warnings are correct.