My attempt to break down the key claims here:
The internet is causing rapid memetic evolution towards ideas which stick in people’s minds and encourage them to take certain actions, especially ones that spread the idea. Ex: wokism, Communism, QAnon, etc
These memes push people who host them (all of us, to be clear) towards behaviors which are not in the best interests of humanity, because Orthogonality Thesis
The lack of will to work on AI risk comes from these memes’ general interference with clarity/agency, plus selective pressure to develop ways to get past “immune” systems which allow clarity/agency
Before you can work effectively on AI stuff, you have to clear out the misaligned memes stuck in your head. This can get you the clarity/agency necessary, and make sure that (if successful) you actually produce AGI aligned with “you”, not some meme
The global scale is too big for individuals—we need memes to coordinate us. This is why we shouldn’t try and just solve x-risk, we should focus on rationality, cultivating our internal meme garden, and favoring memes which will push the world in the direction we want it to go
Putting this in a separate comment, because Reign of Terror moderation scares me and I want to compartmentalize. I am still unclear about the following things:
Why do we think memetic evolution will produce complex/powerful results? It seems like the mutation rate is much, much higher than biological evolution.
Valentine describes these memes as superintelligences, as “noticing” things, and generally being agents. Are these superintelligences hosted per-instance-of-meme, with many stuffed into each human? Or is something like “QAnon” kind of a distributed intelligence, doing its “thinking” through social interactions? Both of these models seem to have some problems (power/speed), so maybe something else?
Misaligned (digital) AGI doesn’t seem like it’ll be a manifestation of some existing meme and therefore misaligned, it seems more like it’ll just be some new misaligned agent. There is no highly viral meme going around right now about producing tons of paperclips.
I really appreciate your list of claims and unclear points. Your succinct summary is helping me think about these ideas.
There is no highly viral meme going around right now about producing tons of paperclips.
A few examples came to mind: sports paraphernalia, tabletop miniatures, and stuffed animals (which likely outnumber real animals by hundreds or thousands of times).
One might argue that these things give humans joy, so they don’t count. There is some validity to that: AI paperclips are supposed to be useless to humans. On the other hand, one might also argue that it’s unsurprising that subsystems repurposed to seek out paperclips derive some ‘enjoyment’ from them… but I don’t think that argument holds water for these examples. Looking at it another way, some number of paperclips are indeed useful.
No egregore has turned the entire world to paperclips just yet. But of course that hasn’t happened, else we would have already lost.
Even so: consider paperwork (like the tax forms mentioned in the post), skill certifications in the workplace, and things like slot machines and reality television. A lot of human effort is wasted on things humans don’t directly care about, for non-obvious reasons. Those things could be paperclips.
(And perhaps some humans derive genuine joy out of reality television, paperwork, or giant piles of paperclips. I don’t think that changes my point that there is evidence of egregores wasting resources.)
I think the point under contention isn’t whether current egregores are (in some sense) “optimizing” for things that would score poorly according to human values (they are), but whether the things they’re optimizing for have some (clear, substantive) relation to the things a misaligned AGI will end up optimizing for, such that an intervention on the whole egregores situation would have a substantial probability of impacting the eventual AGI.
To this question I think the answer is a fairly clear “no”, though of course this doesn’t invalidate the possibility that investigating how to deal with egregores may result in some non-trivial insights for the alignment problem.
Why do we think memetic evolution will produce complex/powerful results? It seems like the mutation rate is much, much higher than biological evolution.
Doesn’t the second part answer the first? I mean, the reason biological evolution matters is because its mutation rate massively outstrips geological and astronomical shifts. Memetic evolution dominates biological evolution for the same reason.
Also, just empirically: memetic evolution produced civilization, social movements, Crusades, the Nazis, etc.
I wonder if I’m just missing your question.
Are these superintelligences hosted per-instance-of-meme, with many stuffed into each human? Or is something like “QAnon” kind of a distributed intelligence, doing its “thinking” through social interactions?
Both.
I wonder if you’re both (a) blurring levels and (b) intuitively viewing these superintelligences as having some kind of essence that either is or isn’t in someone.
What is or isn’t a “meme” isn’t well defined. A catch phrase (e.g. “Black lives matter!”) is totally a meme. But is a religion a meme? Is it more like a collection of memes? If so, what exactly are its constituent memes? And with catch phrases, most of them can’t survive without a larger memetic context. (Try getting “Black lives matter!” to spread through an isolated Amazonian tribe.) So should we count the larger memetic context as part of the meme?
But if you stop trying to ask what is or isn’t a meme and you just look at the phenomenon, you can see something happening. In the BLM movement, the phrase “Silence is violence” evolved and spread because it was evocative and helped the whole movement combat opposition in a way that supported its egregoric possession.
So… where does the whole BLM superorganism live? In its believers and supporters, sure. But also in its opponents. (Think of how folk who opposed BLM would spread its claims in order to object to them.) Also on webpages. Billboards. Now in Hollywood movies. And it’s always shifting and mutating.
The academic field of memetics died because it couldn’t formally define “meme”. But that’s backwards. Biology didn’t need to formally define “life” to recognize that there’s something to study. The act of studying seems to make some definitions more possible.
That’s where we’re at right now. Egregoric zoology, post Darwin but pre Watson & Crick.
Misaligned (digital) AGI doesn’t seem like it’ll be a manifestation of some existing meme and therefore misaligned, it seems more like it’ll just be some new misaligned agent. There is no highly viral meme going around right now about producing tons of paperclips.
I quite agree. I didn’t mean to imply otherwise.
The thing is, unFriendly hypercreatures aren’t thinking about aligning AI to hypercreatures either. They have very little foresight.
(This is an artifact of how most unFriendly egregores do their thing via stupefaction. Most possessed people can’t think about the future because it’s too real and involves things like their personal death. They instead think about symbolic futures and get sideswiped when reality predictably doesn’t go according to their plans. So since unFriendly hypercreatures use stupefied minds to plan, they have trouble with long futures, and so are unable to sanely orient to real-world issues that in fact screw them over.)
I think these hypercreatures will get just as shocked as the rest of us when AGI comes online.
The thing is, the pathway by which something like AGI actually destroys us is some combo of (a) getting a hold of real-world systems like nukes and (b) hacking human minds to do its bidding. Both of these are already happening via unFriendly hypercreature evolution, and for exactly the same reasons that folk are fearing AI risk.
The creation of digital AGI just finishes moving the substrate off of humans, at which point the emergent unFriendly superintelligence no longer has any reason to care about human bodies or minds. At that point we lose all leverage.
That’s why I’m looking at the current situation and saying “Hey guys, I think you’re missing what’s actually happening here. We’re already in AI takeoff, and you’re fixated on the moment we lose all control instead of on this moment where we still have some.”
I think of the step to AGI as the final one, when some egregore figures out how to build a memetic nuke but doesn’t realize it’ll burn everything.
So, no magical meme transforming into a digital form.
(Although it’s some company or whatever that will specify to the AGI “Make paperclips” or whatever. God forbid some corporate egregore builds an AGI to “maximize profit”.)
Memetic evolution dominates biological evolution for the same reason.
A faster mutation rate doesn’t just produce faster evolution; past a point it also reduces the steady-state fitness. Complex machinery can’t reliably be evolved if pieces of it are breaking all the time. I’m mostly relying on “No Evolutions for Corporations or Nanodevices” plus one undergrad course in evolutionary bio here.
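The intuition behind this objection is the classic error-catastrophe result from quasispecies theory, and it can be illustrated with a toy simulation (my own sketch, not from the thread; the function name and all parameter values are arbitrary choices): below a mutation-rate threshold, selection holds a high-fitness “master” genome in the population; above it, the genome dissolves no matter how strong selection is.

```python
import random

def master_fraction(per_bit_mutation_rate, genome_len=20, pop_size=300,
                    generations=150, master_advantage=5.0, seed=0):
    """Toy quasispecies model. One 'master' genome (all zeros) has a fitness
    advantage; every other genome has fitness 1. Each generation, parents are
    selected in proportion to fitness, then copied with independent per-bit
    errors. Returns the fraction of the final population still on the master
    genome."""
    rng = random.Random(seed)
    pop = [(0,) * genome_len for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-proportional selection of parents.
        weights = [master_advantage if not any(g) else 1.0 for g in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Noisy copying: each bit flips with the given probability.
        pop = [
            tuple(bit ^ (rng.random() < per_bit_mutation_rate) for bit in g)
            for g in parents
        ]
    return sum(1 for g in pop if not any(g)) / pop_size
```

With these arbitrary numbers the threshold sits roughly where the selective advantage stops compensating for copying losses, i.e. where `master_advantage * (1 - u)**genome_len` drops below 1: at a per-bit rate of 0.001 the master genome stays near fixation, while at 0.2 it collapses toward zero. The analogy to memes is loose, but it illustrates why a high mutation rate caps the complexity that evolution can maintain.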
Also, just empirically: memetic evolution produced civilization, social movements, Crusades, the Nazis, etc.
Thank you for pointing this out. I agree with the empirical observation that we’ve had some very virulent and impactful memes. I’m skeptical about saying that those were produced by evolution rather than something more like genetic drift, because of the mutation-rate argument. But given that observation, I don’t know if it matters if there’s evolution going on or not. What we’re concerned with is the impact, not the mechanism.
I think at this point I’m mostly just objecting to the aesthetic and some less-rigorous claims that aren’t really important, not the core of what you’re arguing. Does it just come down to something like:
“Ideas can be highly infectious and strongly affect behavior. Before you do anything, check for ideas in your head which affect your behavior in ways you don’t like. And before you try and tackle a global-scale problem with a small-scale effort, see if you can get an idea out into the world to get help.”
I score this as “Good enough that I debated not bothering to correct anything.”
I think some corrections might be helpful though:
The internet is causing rapid memetic evolution…
While I think that’s true, that’s not really central to what I’m saying. I think these forces have been the main players for way, way longer than we’ve had an internet. The internet — like every other advance in communication — just increased evolutionary pressure at the memetic level by bringing more of these hypercreatures into contact with one another and with resources they could compete for.
These memes push people who host them (all of us, to be clear) towards behaviors which are not in the best interests of humanity, because Orthogonality Thesis
Yes. I’d just want to add that not all of them do. It’s just that the ones that come to dominate tend to be unFriendly.
Two counterexamples:
Science. Not as an establishment, but as a kind of clarifying intelligence. This strikes me as a Friendly hypercreature. (The ossified practices of science, like “RCTs are the gold standard” and “Here’s the Scientific Method!”, tend to pull toward stupidity via Goodhart. A lot of LW is an attempt to reclaim the clarifying influence of this hypercreature’s intelligence.)
Jokes. These are sort of like innocuous memetic insects. As long as they don’t create problems for more powerful hypercreatures, they can undergo memetic evolution and spread. They aren’t particularly Friendly or unFriendly for the most part. Some of them add a little value via humor, although that’s not what they’re optimizing for. (The evolutionary pressure on jokes is “How effectively does hearing this joke cause the listener to faithfully repeat it?”) But if a joke were to somehow evolve into a more coherent behavior-controlling egregore, it would by default be an unFriendly one.
Before you can work effectively on AI stuff, you have to clear out the misaligned memes stuck in your head.
Almost. I think it’s more important that you have installed a system for noticing and weeding out these influences.
Like how John Vervaeke argues that the Buddha’s Noble Eightfold Path is a kind of virtual engine for creating relevant insight. The important part isn’t the insight but the engine: the same processes that create insight also create delusion, so you need a systematic way of course-correcting.
This can get you the clarity/agency necessary, and make sure that (if successful) you actually produce AGI aligned with “you”, not some meme
No correction here. I just wanted to say, this is a delightfully clear way of saying what I meant.
…we shouldn’t try and just solve x-risk…
While I agree (both with the claim and with the fact that this is what I said), when I read you saying it I worry about an important nuance getting lost.
The emphasis here should be on “solve”, not “x-risk”. Solving x-risk is superhuman. So is x-risk itself, for that matter. “God scale.”
However! Friendly hypercreatures need our minds in order to think. In order for a memetic strategy to result in solving AI risk, we need to understand the problem. We need to see its components clearly.
So I do think it helps to model x-risk. See its factors. See its Gears. See the landscape it’s embedded in.
Sort of like, a healthy marriage is more likely to emerge if both people make an effort to understand themselves, each other, and their dynamic within a context of togetherness and mutual care. But neither person is actually responsible for creating a healthy marriage. It sort of emerges organically from mutual open willingness plus compatibility.
…we should focus on rationality, cultivating our internal meme garden, and favoring memes which will push the world in the direction we want it to go
FWIW, this part sounds redundant to me. A “rationality” that is something like a magical completion of the Art would, as far as I can tell, consist almost entirely of consciously cultivating one’s internal memetic garden, which is nearly the same thing as favoring Friendly memes.
But after reading and replying to Scott’s comment, I’d adjust a little bit in the OP. For basically artistic reasons I mentioned “rationality for its own sake, period.” But I now think that’s distracting. What I’m actually in favor of is memetic literacy by whatever name. I think there’s an important art here whose absence causes people to focus on AI risk in unhelpful and often anti-helpful ways.
Also, on this part:
…which will push the world in the direction we want it to go
I want to emphasize that, as best I can figure, we don’t have control over that. That’s more god-scale stuff. What each of us can do is notice what seems clarifying and kind to ourselves and lean that way. I think there’s some delightful game theory that suggests that doing this supports Friendly hypercreatures.
These memes push people who host them (all of us, to be clear) towards behaviors which are not in the best interests of humanity, because Orthogonality Thesis
I’m not entirely convinced. Memes are parasites, and thus aim for equilibrium with their hosts. That’s why memeplexes that are truly evil and omnicidal never stick, memeplexes that are relatively evil peter out, and what we are left with are memeplexes that “kinda suck I guess” at worst. A successful memeplex is one that ensures the host’s survival while forcing the host to spend maximum energy and resources spreading the memeplex without harming itself too badly.
But memeplexes can, at times, resist the growth of more accurate memeplexes that would ensure host survival better, because the agency of memetic networks and the agency of neural and genetic networks need not be aimed anywhere good, or even anywhere coherent, at times of high mutation. Notably, memeplexes that promote death and malice are more common in the presence of high rates of death and malice; death and malice are themselves self-propagating memetic diseases, in addition to whatever underlying mechanistic diseases might be causing them.
But memeplexes can, at times, resist the growth of more accurate memeplexes that would ensure host survival better
Of course, but IMHO they cannot do it for long, at least not on civilizational time scales. Memeplexes that ensure host survival better, and on top of that empower their hosts, ultimately always win.
As of yet, we do not have any deus ex machina to help memeplexes exist without a host, or to spread without their hosts being more powerful (physically, politically, socially, scientifically, technologically, etc.) than the hosts of competing memeplexes. Over time, the memetic landscape tends to average out to begrudgingly positive and progressive, because memeplexes that fail to push their hosts forward are outcompeted.
One of the best examples of that is the memeplex of Far Right/Nazi/Fascist ideology, which, while memetically robust, tends to shoot itself in the foot and lose the memetic warfare without much coherent opposition from the liberal memeplexes. It resurfaces all the time, but never accomplishes much, because it is more host-detrimental than it is virulent. Meanwhile, memeplexes that are kinda-sorta wishy-washy slightly left of center, egalitarian-ish but not too much, vaguely pro-science and mildly technological, progressive-ish but unobtrusively, tend to win, and have been winning since the times of Babylon. They strike the perfect balance between memetic frugality, virulence, and benefiting their hosts.
Yeah, I see we’re thinking on similar terms. I was in fact thinking specifically of the pattern of authoritarian, hyper-destructive memeplexes occasionally coming back up, growing fast, and then suddenly collapsing, repeatedly; sometimes doing huge amounts of damage when this occurs.
I don’t think we disagree, I was just expressing another rotation of what seems to already be your perspective.
I think there’s an important distinction Valentine tries to make with respect to your fourth bullet (and if not, I will make it). You perhaps describe the right idea, but the wrong shape. The problem is more like “China and the US both have incentives to bring about AGI and don’t have incentives towards safety.” Yes, deflecting at the last second with some formula for safe AI will save you, but that’s as stupid as jumping away from a train at the last second. Move off the track hours ahead of time, and just broker a peace between countries to not make AGI.
Ah, so on this view, the endgame doesn’t look like
“make technical progress until the alignment tax is low enough that policy folks or other AI-risk-aware people in key positions will be able to get an unaware world to pay it”
But instead looks more like
“get the world to be aware enough to not bumble into an apocalypse, specifically by promoting rationality, which will let key decision-makers clear out the misaligned memes that keep them from seeing clearly”
Is that a fair summary? If so, I’m pretty skeptical of the proposed AI alignment strategy, even conditional on this strong memetic selection and orthogonality actually happening. It seems like this strategy requires pretty deeply influencing the worldview of many world leaders. That is obviously very difficult, because no movement that I’m aware of has done it (at least, not quickly), and I think they all would like to if they judged it doable. Importantly, the reduce-tax strategy requires clarifying and solving a complicated philosophical/technical problem, which is also very difficult. I think it’s more promising for the following reasons:
It has a stronger precedent (historical examples I’d reference include the invention of computability theory, the invention of information theory and cybernetics, and the adventures in logic leading up to Gödel)
It’s more in line with rationalists’ general skill set, since the group is much more skewed towards analytical thinking and technical problem-solving than towards government/policy folks and being influential among those kinds of people
The number of people we would need to influence will go up as AGI tech becomes easier to develop, and each one is a single point of failure.
To be fair, these strategies are not in a strict either/or, and luckily they use largely separate talent pools. But if the proposal here ultimately comes down to moving fungible resources towards the become-aware strategy and away from the technical-alignment strategy, I think I (mid-tentatively) disagree.
It seems like this strategy requires pretty deeply influencing the worldview of many world leaders. That is obviously very difficult, because no movement that I’m aware of has done it (at least, not quickly), and I think they all would like to if they judged it doable.
It seems to me that in 2020 the world was changed relatively quickly. How many events in history were able to shift every mind on the planet within 3 months? If it only takes 3 months to occupy the majority of focus, then you have a bound on what a superintelligent agent may plan for.
What is more concerning, and also interesting, is that such an intelligence can make something appear to be for X when it’s really planning for Y. So misdirection and ulterior motives are baked into this kind of strategic modeling. Unfortunately this can lead to a paranoid inspection of every scenario, as if there were a strategic intention to trigger an infinite regress of scrutiny.
When we’re dealing with these Hyperobjects/Avatars/Memes we can’t be certain that we understand the motive.
Given that we can’t understand the motive of any external meme, perhaps the only right path is to generate your own and propagate that solely?
A sketch of a solution that doesn’t involve (traditional) world leaders could look like: “Software engineers get together, agree that the field is super fucked, and start imposing stronger regulations and guidelines on software, like traditional engineering disciplines use.” This is a way of lowering the alignment tax in the sense that, if software engineers all have a security mindset, or have to go through a security review, there is more process and knowledge related to potential problems, and a way of executing a technical solution at the last moment. However, this description itself is entirely political, not technical, yet it easily could not reach the awareness of world leaders or the general populace.
I have more hope than you here. I think we’re seeing Friendly memetic tech evolving that can change how influence comes about. The key tipping point isn’t “World leaders are influenced” but is instead “The Friendly memetic tech hatches a different way of being that can spread quickly.” And the plausible candidates I’ve seen often suggest it’ll spread superexponentially.
This is upstream of making the technical progress and the right social maneuvers anyway. There’s insufficient collective will to do enough of the right kind of alignment research. Trying anyway mostly adds to the memetic dumpster fire we’re all in. So unless you have a bonkers once-in-an-aeon brilliant Messiah-level insight, you can’t do this first.
I think we’re seeing Friendly memetic tech evolving that can change how influence comes about.
Wait, literally evolving? How? Coincidence despite orthogonality? Did someone successfully set up an environment that selects for Friendly memes? Or is this not literally evolving, but more like “being developed”?
The key tipping point isn’t “World leaders are influenced” but is instead “The Friendly memetic tech hatches a different way of being that can spread quickly.” And the plausible candidates I’ve seen often suggest it’ll spread superexponentially.
Whoa! I would love to hear more about these plausible candidates.
There’s insufficient collective will to do enough of the right kind of alignment research.
I parse this second point as something like “alignment is hard enough that you need way more quality-adjusted research-years (QARYs?) than the current track is capable of producing. This means that to have any reasonable shot at success, you basically have to launch a much larger (but still aligned) movement via memetic tech, or just pray you’re the messiah and can singlehandedly provide all the research value of that mass movement.” That seems plausible, and concerning, but highly sensitive to the difficulty of the alignment problem, which I personally have practically zero idea how to forecast.
I agree with you that the answer is a fairly clear “no”.
I also don’t think it matters whether the AGI will optimize for something current egregores care about.
What matters is whether current egregores will in fact create AGI.
The fear around AI risk is that the answer is “inevitably yes”.
The current egregores are actually no better at making AGI egregore-aligned than humans are at making it human-aligned.
But they’re a hell of a lot better at making AGI accidentally, and probably at all.
So if we don’t sort out how to align egregores, we’re fucked — and so are the egregores.
I think I see what you mean. A new AI won’t be under the control of egregores. It will be misaligned to them as well. That makes sense.
Doesn’t the second part answer the first? I mean, the reason biological evolution matters is because its mutation rate massively outstrips geological and astronomical shifts. Memetic evolution dominates biological evolution for the same reason.
Also, just empirically: memetic evolution produced civilization, social movements, Crusades, the Nazis, etc.
I wonder if I’m just missing your question.
Both.
I wonder if you’re both (a) blurring levels and (b) intuitively viewing these superintelligences as having some kind of essence that either is or isn’t in someone.
What is or isn’t a “meme” isn’t well defined. A catch phrase (e.g. “Black lives matter!”) is totally a meme. But is a religion a meme? Is it more like a collection of memes? If so, what exactly are its constituent memes? And with catch phrases, most of them can’t survive without a larger memetic context. (Try getting “Black lives matter!” to spread through an isolated Amazonian tribe.) So should we count the larger memetic context as part of the meme?
But if you stop trying to ask what is or isn’t a meme and you just look at the phenomenon, you can see something happening. In the BLM movement, the phrase “Silence is violence” evolved and spread because it was evocative and helped the whole movement combat opposition in a way that supported its egregoric possession.
So… where does the whole BLM superorganism live? In its believers and supporters, sure. But also in its opponents. (Think of how folk who opposed BLM would spread its claims in order to object to them.) Also on webpages. Billboards. Now in Hollywood movies. And it’s always shifting and mutating.
The academic field of memetics died because they couldn’t formally define “meme”. But that’s backwards. Biology didn’t need to formally define life to recognize that there’s something to study. The act of studying seems to make some definitions more possible.
That’s where we’re at right now. Egregoric zoology, post Darwin but pre Watson & Crick.
I quite agree. I didn’t mean to imply otherwise.
The thing is, unFriendly hypercreatures aren’t thinking about aligning AI to hypercreatures either. They have very little foresight.
(This is an artifact of how most unFriendly egregores do their thing via stupefaction. Most possessed people can’t think about the future because it’s too real and involves things like their personal death. They instead think about symbolic futures and get sideswiped when reality predictably doesn’t go according to their plans. So since unFriendly hypercreatures use stupefied minds to plan, they end up having trouble with long futures, ergo unable to sanely orient to real-world issues that in fact screw them over.)
I think these hypercreatures will get just as shocked as the rest of us when AGI comes online.
The thing is, the pathway by which something like AGI actually destroys us is some combo of (a) getting a hold of real-world systems like nukes and (b) hacking human minds to do its bidding. Both of these are already happening via unFriendly hypercreature evolution, and for exactly the same reasons that folk are fearing AI risk.
The creation of digital AGI just finishes moving the substrate off of humans, at which point the emergent unFriendly superintelligence no longer has any reason to care about human bodies or minds. At that point we lose all leverage.
That’s why I’m looking at the current situation and saying “Hey guys, I think you’re missing what’s actually happening here. We’re already in AI takeoff, and you’re fixated on the moment we lose all control instead of on this moment where we still have some.”
I think of the step to AGI as the final one, when some egregore figures out how to build a memetic nuke but doesn’t realize it’ll burn everything.
So, no magical meme transforming into a digital form.
(Although it’s some company or whatever that will specify to the AGI “Make paperclips” or whatever. God forbid some corporate egregore builds an AGI to “maximize profit”.)
A faster mutation rate doesn’t just produce faster evolution; it also reduces the steady-state fitness. Complex machinery can’t reliably be evolved if pieces of it are breaking all the time. I’m mostly relying on “No Evolutions for Corporations or Nanodevices” plus one undergrad course in evolutionary bio here.
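To make the mutation-selection-balance intuition concrete, here's a minimal toy simulation (my own sketch, not from the linked essay): bit-string genomes whose fitness is the fraction of 1-bits, starting fully adapted, evolved under two per-bit mutation rates. All parameter values are arbitrary illustrations.

```python
import random

def steady_state_fitness(mutation_rate, genome_len=50, pop_size=200,
                         generations=200, seed=0):
    """Fitness = fraction of 1-bits. Start every genome fully adapted
    (all ones), let mutation-selection balance set in, and report the
    population's mean fitness at the end."""
    rng = random.Random(seed)
    pop = [[1] * genome_len for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: parents are sampled in proportion to fitness.
        weights = [sum(g) / genome_len + 1e-9 for g in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Reproduction: each bit flips independently with prob mutation_rate.
        pop = [[bit ^ (rng.random() < mutation_rate) for bit in g]
               for g in parents]
    return sum(sum(g) for g in pop) / (pop_size * genome_len)

low = steady_state_fitness(mutation_rate=0.002)   # ~0.1 flips/genome/gen
high = steady_state_fitness(mutation_rate=0.2)    # ~10 flips/genome/gen
print(f"low-mutation steady-state fitness:  {low:.2f}")
print(f"high-mutation steady-state fitness: {high:.2f}")
```

With the high mutation rate, selection can't keep up and mean fitness collapses toward what you'd get from coin-flipping the bits; with the low rate, the population holds near the optimum. That's the sense in which fast-mutating memes might accumulate virulence-by-drift rather than evolved complex machinery.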
Thank you for pointing this out. I agree with the empirical observation that we’ve had some very virulent and impactful memes. I’m skeptical about saying that those were produced by evolution rather than something more like genetic drift, because of the mutation-rate argument. But given that observation, I don’t know if it matters if there’s evolution going on or not. What we’re concerned with is the impact, not the mechanism.
I think at this point I’m mostly just objecting to the aesthetic and some less-rigorous claims that aren’t really important, not the core of what you’re arguing. Does it just come down to something like:
“Ideas can be highly infectious and strongly affect behavior. Before you do anything, check for ideas in your head which affect your behavior in ways you don’t like. And before you try and tackle a global-scale problem with a small-scale effort, see if you can get an idea out into the world to get help.”
I like this, thank you.
I score this as “Good enough that I debated not bothering to correct anything.”
I think some corrections might be helpful though:
While I think that’s true, that’s not really central to what I’m saying. I think these forces have been the main players for way, way longer than we’ve had an internet. The internet — like every other advance in communication — just increased evolutionary pressure at the memetic level by bringing more of these hypercreatures into contact with one another and with resources they could compete for.
Yes. I’d just want to add that not all of them do. It’s just that the ones that tend to dominate tend to be unFriendly.
Two counterexamples:
Science. Not as an establishment, but as a kind of clarifying intelligence. This strikes me as a Friendly hypercreature. (The ossified practices of science, like “RCTs are the gold standard” and “Here’s the Scientific Method!”, tend to pull toward stupidity via Goodhart. A lot of LW is an attempt to reclaim the clarifying influence of this hypercreature’s intelligence.)
Jokes. These are sort of like innocuous memetic insects. As long as they don’t create problems for more powerful hypercreatures, they can undergo memetic evolution and spread. They aren’t particularly Friendly or unFriendly for the most part. Some of them add a little value via humor, although that’s not what they’re optimizing for. (The evolutionary pressure on jokes is “How effectively does hearing this joke cause the listener to faithfully repeat it?”). But if a joke were to somehow evolve into a more coherent behavior-controlling egregore, by default it’ll be an unFriendly one.
Almost. I think it’s more important that you have installed a system for noticing and weeding out these influences.
Like how John Vervaeke argues that the Buddha’s Noble Eightfold Path is a kind of virtual engine for creating relevant insight. The important part isn’t the insight but the engine. Because the same processes that create insight also create delusion, you need a systematic way of course-correcting.
No correction here. I just wanted to say, this is a delightfully clear way of saying what I meant.
While I agree (both with the claim and with the fact that this is what I said), when I read you saying it I worry about an important nuance getting lost.
The emphasis here should be on “solve”, not “x-risk”. Solving x-risk is superhuman. So is x-risk itself, for that matter. “God scale.”
However! Friendly hypercreatures need our minds in order to think. In order for a memetic strategy to result in solving AI risk, we need to understand the problem. We need to see its components clearly.
So I do think it helps to model x-risk. See its factors. See its Gears. See the landscape it’s embedded in.
Sort of like, a healthy marriage is more likely to emerge if both people make an effort to understand themselves, each other, and their dynamic within a context of togetherness and mutual care. But neither person is actually responsible for creating a healthy marriage. It sort of emerges organically from mutual open willingness plus compatibility.
FWIW, this part sounds redundant to me. A “rationality” that is something like a magical completion of the Art would, as far as I can tell, consist almost entirely of consciously cultivating one’s internal memetic garden, which is nearly the same thing as favoring Friendly memes.
But after reading and replying to Scott’s comment, I’d adjust a little bit in the OP. For basically artistic reasons I mentioned “rationality for its own sake, period.” But I now think that’s distracting. What I’m actually in favor of is memetic literacy by whatever name. I think there’s an important art here whose absence causes people to focus on AI risk in unhelpful and often anti-helpful ways.
Also, on this part:
I want to emphasize that best as I can figure, we don’t have control over that. That’s more god-scale stuff. What each of us can do is notice what seems clarifying and kind to ourselves and to lean that way. I think there’s some delightful game theory that suggests that doing this supports Friendly hypercreatures.
And if not, I think we’re just fucked.
I’m not entirely convinced. Memes are parasites, and thus aim for equilibrium with their hosts. That’s why memeplexes that are truly evil and omnicidal never stick, memeplexes that are relatively evil peter out, and what we’re left with are memeplexes that “kinda suck I guess” at worst. A successful memeplex is one that ensures the host’s survival while getting the host to spend maximum energy and resources spreading the memeplex without harming themselves too badly.
But memeplexes can, at times, resist the growth of more accurate memeplexes that would ensure host survival better, because the agency of memetic networks and the agency of neural and genetic networks need not be aimed anywhere good, or even anywhere particularly coherent, at times of high mutation. Notably, memeplexes that promote death and malice are more common in the presence of high rates of death and malice; death and malice are themselves self-propagating memetic diseases, in addition to whatever underlying mechanistic diseases might be causing them.
Of course, but IMHO they cannot do it for long, at least not on civilizational time scales. Memeplexes that better ensure host survival, and on top of that empower their hosts, ultimately always win.
As of yet, we do not have any Deus Ex Machina to help the memeplexes exist without a host, or spread without the host being more powerful (physically, politically, socially, scientifically, technologically etc) than the hosts of other memeplexes. Over time, the memetic landscape tends to average out to begrudgingly positive and progressive, because memeplexes that fail to push the hosts forward are outcompeted.
One of the best examples of that is the memeplex of Far Right/Nazi/Fascist ideology, which, while memetically robust, tends to shoot itself in the foot and lose the memetic warfare without much coherent opposition from the liberal memeplexes. It resurfaces all the time, but never accomplishes much, because it is more host-detrimental than it is virulent. Meanwhile, memeplexes that are kinda-sorta wishy-washy, slightly left of center, egalitarian-ish but not too much, vaguely pro-science and mildly technological, progressive-ish but unobtrusively, tend to always win, and have been winning since the time of Babylon. They strike the perfect balance between memetic frugality, virulence, and benefit to their hosts.
Yeah, I see we’re thinking on similar terms. I was in fact thinking specifically of the pattern of authoritarian, hyper-destructive memeplexes occasionally coming back up, growing fast, and then suddenly collapsing, repeatedly; sometimes doing huge amounts of damage when this occurs.
I don’t think we disagree, I was just expressing another rotation of what seems to already be your perspective.
I think there’s an important distinction Valentine tries to make with respect to your fourth bullet (and if not, I will make it). You perhaps describe the right idea, but the wrong shape. The problem is more like “China and the US both have incentives to bring about AGI and don’t have incentives towards safety.” Yes, deflecting at the last second with some formula for safe AI will save you, but that’s as stupid as jumping away from a train at the last second. Move off the track hours ahead of time: just broker a peace between countries to not make AGI.
Ah, so on this view, the endgame doesn’t look like
“make technical progress until the alignment tax is low enough that policy folks or other AI-risk-aware people in key positions will be able to get an unaware world to pay it”
But instead looks more like
“get the world to be aware enough to not bumble into an apocalypse, specifically by promoting rationality, which will let key decision-makers clear out the misaligned memes that keep them from seeing clearly”
Is that a fair summary? If so, I’m pretty skeptical of the proposed AI alignment strategy, even conditional on this strong memetic selection and orthogonality actually happening. This strategy seems to require pretty deeply influencing the worldviews of many world leaders. That is obviously very difficult: no movement I’m aware of has done it (at least, not quickly), and I think they all would like to if they judged it doable. Admittedly, the reduce-tax strategy requires clarifying and solving a complicated philosophical/technical problem, which is also very difficult. But I think it’s more promising, for the following reasons:
It has a stronger precedent (historical examples I’d reference include the invention of computability theory, the invention of information theory and cybernetics, and the adventures in logic leading up to Gödel)
It’s more in line with rationalists’ general skill set, since the group is much more skewed towards analytical thinking and technical problem-solving than towards government/policy folks and being influential among those kinds of people
The number of people we would need to influence will go up as AGI tech becomes easier to develop, and every one is a single point of failure.
To be fair, these strategies are not in a strict either/or, and luckily they draw on largely separate talent pools. But if the proposal here ultimately comes down to moving fungible resources towards the become-aware strategy and away from the technical-alignment strategy, I think I (mid-tentatively) disagree.
It seems to me that in 2020 the world changed relatively quickly. How many events in history were able to shift every mind on the planet within 3 months? If it only takes 3 months to occupy the majority of focus, then you have a bound on what a superintelligent agent may plan for.
What is more concerning, and also interesting, is that such an intelligence can make something appear to be for X while really planning for Y. So misdirection and ulterior motive are baked into this kind of game theorizing. Unfortunately this can lead to a very schizophrenic inspection of every scenario, as if there were a strategic intention to trigger infinite regress on scrutiny.
When we’re dealing with these Hyperobjects/Avatars/Memes we can’t be certain that we understand the motive.
Given that we can’t understand the motive of any external meme, perhaps the only right path is to generate your own and propagate that solely?
A sketch of a solution that doesn’t involve (traditional) world leaders could look like “Software engineers get together and agree that the field is super fucked, and start imposing stronger regulations and guidelines on software, like those traditional engineering disciplines use.” This is a way of lowering the alignment tax in the sense that, if software engineers all have a security mindset, or have to go through a security review, there is more process and knowledge related to potential problems and a way of executing a technical solution at the last moment. However, this description is itself entirely political, not technical, yet it could easily never reach the awareness of world leaders or the general populace.
Two points:
I have more hope than you here. I think we’re seeing Friendly memetic tech evolving that can change how influence comes about. The key tipping point isn’t “World leaders are influenced” but is instead “The Friendly memetic tech hatches a different way of being that can spread quickly.” And the plausible candidates I’ve seen often suggest it’ll spread superexponentially.
This is upstream of making the technical progress and right social maneuvers anyway. There’s insufficient collective will to do enough of the right kind of alignment research. Trying anyway mostly adds to the memetic dumpster fire we’re all in. So unless you have a bonkers once-in-an-aeon brilliant Messiah-level insight, you can’t do this first.
Wait, literally evolving? How? Coincidence despite orthogonality? Did someone successfully set up an environment that selects for Friendly memes? Or is this not literally evolving, but more like “being developed”?
Whoa! I would love to hear more about these plausible candidates.
I parse this second point as something like “alignment is hard enough that you need way more quality-adjusted research years (QARYs?) than the current track is capable of producing. This means that to have any reasonable shot at success, you basically have to launch a much larger (but still aligned) movement via memetic tech, or just pray you’re the Messiah and can singlehandedly provide all the research value of that mass movement.” That seems plausible, and concerning, but highly sensitive to the difficulty of the alignment problem—which I personally have practically zero idea how to forecast.