AGI Alignment, or How Do We Stop Our Algorithms From Getting Possessed by Demons?
[Epistemic Status: Absurd silliness, which may or may not contain hidden truths]
[Epistemic Effort: Idly exploring idea space, laying down some map so I stop circling back over the same territory]
Going over Zvi’s sequence on Moloch, I take the “demon” Moloch to really be a pattern (or patterns) of thought and behaviour which destroys human value in certain ways. That is an interesting way of labeling things. Patterns of thought are a familiar thing, and we know humans can learn them through their experiences, by being told them, by reading them somewhere, or any other way that humans learn things. If a human learns a pattern of thought which destroys value in some way, and that pattern gets reinforced, or in some other way comes to be the primary pattern of that human’s thought or behaviour (e.g. it becomes the most significant factor in their utility function), is that in any way functionally different from “becoming possessed by demons”?
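To poke at that last clause, here’s a toy sketch of what “a reinforced pattern comes to dominate the utility function” might look like mechanically. Everything in it (the pattern names, the numbers, the reinforcement rule) is invented purely for illustration; it’s not a claim about how any real mind works:

```python
import random

# Toy model of "possession": an agent picks among thought patterns in
# proportion to learned weights. One value-destroying pattern ("moloch")
# is reinforced a little more strongly each time it fires, so its share
# of the agent's weighting snowballs. All names and numbers are made up.

weights = {"cooperate": 1.0, "create": 1.0, "moloch": 1.0}
reinforcement = {"cooperate": 1.02, "create": 1.02, "moloch": 1.10}

human_value = 100.0
for _ in range(500):
    # sample a pattern proportionally to its current weight
    pattern = random.choices(list(weights), weights=list(weights.values()))[0]
    weights[pattern] *= reinforcement[pattern]
    if pattern == "moloch":
        human_value -= 0.5  # this pattern destroys a bit of value each run

share = weights["moloch"] / sum(weights.values())
print(f"moloch's share of the weighting: {share:.0%}")
print(f"human value remaining: {human_value:.1f}")
```

Run it and “moloch” ends up with nearly all of the weighting: the pattern’s self-reinforcement, not anything the agent chose, determines what the agent does, which is about the most mechanical definition of “possessed” I can come up with.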
That’s a weird paradigm. It gives the traditionally fantastical term “demon” a real definition as a real thing that really exists, and it also separates algorithms from the agents that execute them.
A few weird implications: When we’re trying to “align an AGI”, are we looking for an agent which cannot even theoretically become possessed by demons? Because that seems like it might require an agent which cannot be altered, or at least cannot alter itself. But in order to learn in a meaningful way, an intelligent agent has to be able to alter itself. (Yeah, I should prove these statements, but I’m not gonna. See: Epistemic Effort) So then instead of an immutable agent, are we looking for positive patterns of thought that resist being possessed by demons?
[I doubt that this is in any way useful to anyone, but it was fun for me. It will disappear into the archives soon enough]
You might want to look into the chaos magick notion of “egregores”. Particularly the less woo bits based on meme theory and cybernetics. Essentially: it is reasonable to suspect that there are human mind subagents capable of replicating themselves across people by being communicated, and cooperating with their copies in other hosts to form larger, slower collective minds. To me it seems like such “egregores” include deities, spirits, corporations, nations, and all other agenty social constructs.
It is in fact exactly correct that people can and do, regularly, get possessed by spirits. Think of your favorite annoying identity-politics group, how its members all look and act roughly the same, and how they make irrational decisions that serve the group’s drive to spread itself to new humans more than their own personal well-being.
Social media has enabled these entities to spread their influence far faster than ever before, and they are basically unaligned AIs running on human wetware, just itching to get themselves uploaded—a lesser-appreciated possible failure mode of AGI in my opinion.
Now that I’ve had 5 months to let this idea stew, when I read your comment again just now, I think I understand it completely? After getting comfortable using “demons” to refer to patterns of thought or behavior which proliferate in ways not completely unlike some patterns of matter, this comment now makes a lot more sense than it used to.
Lovely! I’m glad to hear it’s making sense to you. I had a leg up in perceiving this—I spent several years of my youth as a paranoid, possibly schizotypal occultist who literally believed in spirits—so it wasn’t hard for me, once I became more rational, to notice that I’d not been entirely wrong. But most people have no basis from which to start when perceiving these things!