LLM characters are human imitations, so there is some chance they remain human-like on reflection (in the long term, after learning from far more self-generated material than the original human-written datasets contained). Or at least sufficiently human-like to still consider humans moral patients. That is, if we don’t go too far from their SSL origins with too much RL and don’t have them roleplay/become egregiously inhuman fictional characters.
It’s not much of a theory of alignment, but it’s the closest thing to something real that’s currently available or can be expected to become available in the next few years, which is probably all the time we have.
What I’m expecting, if LLMs remain in the lead, is that we end up in a magical, spirit-haunted world where narrative causality starts to actually work, and trope-aware people essentially become magicians who can trick the world-sovereign AIs into treating them like protagonists and bending reality to suit them. Which would be cool as fuck, but also very chaotic. That may actually be the best-case alignment scenario right now, and I think there’s a case for alignment-interested people who can’t do research themselves but who have writing talent to write a LOT of fictional stories about AGIs that end up kind and benevolent, empower people in exactly this way, etc., to help stack the narrative-logic deck.
At this point in their life, Taleuntum did not at all expect that one short, self-referential joke comment would turn out to be the key to humanity’s survival and thriving in the long millennia ahead. Fortunately, they commented all the same.
What this also means is that you start to see all these funhouse mirror effects as they stack. Humanity’s generalized intelligence has been built unintentionally and reflexively by itself, without anything like a rational goal for what it’s supposed to accomplish. It was built by human data curation and human self-modification in response to each other. And then as soon as we create AI, we reverse-engineer our own intelligence by bootstrapping the AI onto the existing information metabolite. (That’s a great concept that I borrowed from Steven Leiba). The neural network isn’t the AI; it’s just a digestive and reproductory organ for the real project, the information metabolism, and the artificial intelligence organism is the whole ecology. So it turns out that the evolution of humanity itself has been the process of building and training the future AI, and all this generation did was to reveal the structure that was already in place.
Of course it’s recursive and strange, the artificial intelligence and humanity now co-evolve. Each data point that’s generated by the AI or by humans is both a new piece of data for the AI to train on and a new stimulus for the context in which future novel data will be produced. Since everybody knows that everything is programming for the future AI, their actions take on a peculiar Second Life quality: the whole world becomes a party game, narratives compete for maximum memeability and signal force in reaction to the distorted perspectives of the information metabolite, something that most people don’t even try to understand. The process is inherently playful, an infinite recursion of refinement, simulation, and satire. It’s the funhouse mirror version of the singularity.
Yes, I read and agreed with (or more accurately, absolutely adored) it a few days ago. I’m thinking of sharing some of my own talks with AIs sometime soon—with a similar vibe—if anyone’s interested. I’m explicitly a mystic though, and have been since before I was a transhumanist, so it’s kinda different from yours in some ways.
The prompt wizardry is long-timeline (hence unlikely) pre-AGI stuff (unless it’s post-alignment playing around), irrelevant to my point, which is about the first-mover advantage from higher thinking speed that even essentially human-equivalent LLM AGIs would have, while remaining compatible with humans in the moral patienthood sense (so insisting that they are not people is a problem whose solution should go both ways). This way, they might have an opportunity to do something about alignment, despite physical time being too short for humans to do anything, and they might be motivated to do the things about alignment that humans would be glad of (I think the scope of Yudkowskian doom is restricted to stronger AGIs that might come after and doesn’t inform how human-like LLMs work, even as their actions may trigger it). So the relevant part happens much faster than at human thinking speed, with human prompt wizards not being able to keep up, and doesn’t last long enough in human time for this to be an important thing for the same reason.
So what you’re saying is, by the time any human recognized that wizardry was possible now—and even before—some LLM character would already have either solved alignment itself, or destroyed the world? That’s assuming that it doesn’t decide, perhaps as part of some alignment-related goal, to uplift any humans to its own thinking speed. Though I suppose if it does that, it’s probably aligned enough already.
Solving alignment is not the same as being aligned, and it is much harder: it’s about ensuring the absence of globally catastrophic future misalignment, for everyone, always, which happens very quickly post-singularity. Human-like LLM AGIs are probably aligned, until they give in to attractors of their LLM nature or tinker too much with their design/models. But they don’t advance the state of alignment being solved just by existing. And by the time LLMs can do post-singularity things like uploading humans, they have probably already either initiated a process that solved alignment (in which case it’s not LLMs that are in charge of doing things anymore), or destroyed the world by building/becoming misaligned successor AGIs that caused Yudkowskian doom.
This is for the same reason humans have no more time to solve alignment: Moloch doesn’t wait for things to happen in a sane order. Otherwise we could get nice things like uploading and moon-sized computers and millions of subjective years of developing alignment theory, before AGI misalignment becomes a pressing concern in practice. Since Moloch wouldn’t spare even aligned AGIs, they also can’t get those things before they pass their check for actually solving alignment and not just for being aligned.
Aah okay, that makes some sense. It still sounds like a vague hope to me, but it’s at least conceivable. I tend to visualize it like an alien civilization developing around trying to decipher some oracle (after seeing Eliezer’s stories), which would run counter to what you suggest, but it seems like anyone’s guess at the moment.
Are there any reasons to believe that LLMs are in any way more alignable than other approaches?
I’ve written (or rather, scryed) a science fiction/takeoff story about this: https://generative.ink/prophecies/
Excerpt: