AI alignment via civilizational cognitive updates

(This started as a reply to @Tamsin Leake’s reply to my post about why cyborgism maybe should be open. This post does not require you to read our interaction, though it led to this, and I’m very grateful for Tamsin’s reply.)

In general, this is a counterargument against:

> we should only share cyborg tools (software that lets AI help us think) among AI safety people, so that the big labs don’t get ahold of them, so we can save the world before they end it

The gist

My idea is that IF humanity doesn’t want to die, we can discover this by maximizing information sharing and “converging” our culture and discourse towards what is *actually* going on with AI: what systems are capable of, and the fact that society isn’t equipped to handle it. That convergence would then cause humanity to resolve its dislike of the state of the world, by creating safe AI and/or institutions for creating safe AI.

(or whatever the path may be, which we don’t know yet!!)

Tools that (in part via AI) help us think, share ideas, and parse and find information will speed up memetic evolution.

Can’t solve a problem you can’t see.

Explore vs exploit (“don’t buttclench”)

Something feels wrong about AI safety people not wanting to discuss insights “because big bad tech will use them to build stronger AI”.
(It doesn’t jibe with my personality at all because it feels very buttclenchy, so I’m aware that I’m biased against this way of viewing things, but I’ll keep writing anyway.)

I wanna propose this framing:

> how much you share moves you along a spectrum: fast learning on one end vs. “the enemy” getting useful information on the other.

I believe we still need a LOT of information, and therefore should err on the side of “share more and learn more”.

If AI safety is a massive civilizational coordination problem, we need all the memetic-evolutionary pressure we can get. Technology that helps us think and communicate is the way to perform such updates.

List of thoughts related to memetic evolution and AI alignment

Civilizational change as memetic evolution

Basically, updates happen along two axes: raising update speed and widening the information bottleneck.
Population density increased both (hunter/gatherer → villages → cities; specialization is also a factor here, though you can see specialization as a result of widening the search space plus widening the information bottleneck, as society listens to more people on the fringes). The printing press increased both. Social media increased both.
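(To make the two axes concrete, here’s a toy sketch of my own -- not a real model of memetics, just an illustration. A population of agents repeatedly averages opinions; `k` plays the role of the information bottleneck and `alpha` the update speed, and raising either one speeds up convergence.)

```python
import random

def rounds_to_converge(n=200, k=3, alpha=0.1, tol=0.01, seed=0):
    """Toy model: n agents each hold an opinion in [0, 1]. Each round,
    every agent looks at k random others (the information bottleneck)
    and moves a fraction alpha (the update speed) toward their mean.
    Returns how many rounds until all opinions lie within tol."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n)]
    for t in range(1, 10_000):
        # synchronous update: everyone reads from the old opinion list
        opinions = [
            x + alpha * (sum(rng.sample(opinions, k)) / k - x)
            for x in opinions
        ]
        if max(opinions) - min(opinions) < tol:
            return t
    return None

# Raising update speed (alpha) or widening the bottleneck (k) both
# make the population converge faster:
for k, alpha in [(1, 0.05), (1, 0.2), (10, 0.05), (10, 0.2)]:
    print(f"k={k:2d} alpha={alpha:.2f} -> {rounds_to_converge(k=k, alpha=alpha)} rounds")
```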

Social media performs memetic updates

I think the magnitude and speed of the updates that happen purely via Twitter and YouTube (maybe TikTok too), and their effects, are really important to understand. I would guess that if, many years from now, we looked back at the years 2000–2030 with a sophisticated understanding of memetics, those years would be a central topic.

Wokeism as case study

There’s a way to view the insanity of wokeism as a runaway memetic phenomenon that developed because of social media, a novel and powerful technology for spreading and developing memes. (As @Connor Leahy put it somewhere: “mentally ill teenagers developing increasingly deranged memetic viruses and unleashing them upon the population”.)

People say wokeism peaked in 2020; maybe its arc can serve as a case study for understanding memetics. (There are historical examples too: religions, Nazism, slavery and its abolition, communism, and many, many trends and social movements I’m ignorant of...)

Connor Leahy’s emphasis

He often talks about AI safety in terms of “civilizational coordination” (and his/Conjecture’s recent creation, The Compendium, emphasizes it even more), which makes me wanna “update like a Bayesian” in this direction, or think about it more seriously until I can refute it or extract insights.

Alignment discourse lacks updates

I’ve heard multiple very smart people/good thinkers criticize @Eliezer Yudkowsky and this field in general as not having updated well on modern AI (despite having great insights years ago, when nobody saw it coming).

This is more of a “Bayesian update in this direction” than an object-level claim, but it does feel like alignment discourse is not talking about the actual specific models and developments that are happening *right now*; Anthropic is probably doing this more than anybody via mechanistic interpretability.

You can argue “but we’re worried about future AIs” and yes I agree, but I find it very suspicious that that argument excuses the lack of updates.
(there’s probably a logical fallacy or epistemological sin I’m committing here, but whatever)

Fix big lab incentives?

It’s often said that big lab CEOs and their armies of researchers are very well-intentioned people who simply misunderstand and (maybe due to personal flaws/laziness/ambitions of grandeur) underestimate the colossal forces acting upon them (the profit incentive being one).

So maybe they even become your allies if the system in which they’re embedded becomes more aligned with your values, or maybe they’ll jump ship if its unaligned-ness is more globally and more concretely understood (via memetic/cultural evolution).

(Or as @Connor Leahy might put it, if I understand the ideas he explains in this podcast: “if you can make the gods and forces that the big labs are controlled by do your bidding”.)

World will change fast

With short timelines (end of 2028, which is generally what I believe; but if they’re longer, this point is even stronger), the world will change *massively* via not-yet-world-ending AI.

The value of adapting to changes (by sharing information and arguments and insights), and the value of a civilization that is able to adapt, increases proportionally to how much things are changing.

Sama testifying before Congress, and Dario talking openly on Dwarkesh’s podcast about 25% doom, already updated the discourse.

Meta-thoughts about the above list

I hope to expand and elaborate on those topics… and part of why I’m writing on LW is that you can read an article’s preview on hover, which lets you effectively create a web of posts and build up ideas; this website is a powerful tool for memetic evolution.

I’m sure almost everything there is already covered by multiple people with much more depth and writing skill. If only the technology to find and reference such articles (which I bet actually already exists in multiple forms) were more widespread and well-known!!

Side note: GUI tools, externalization of cognition

This topic really, really fascinates me and is super personal to me, because I’ve been building the app that I’m currently working on for over a year, and it’s very much been a tool to boost my cognition and has helped my mental health immeasurably.
(In very short: it’s a desktop app written in Tkinter, a glorified note-taking system + code editor/executor with many windows and tabs and other widget types; you can talk to LMs and create hotkeys for arbitrary code written within the application.) I was doing the same with other apps even before that, pretty soon after I started learning to code ~4 years ago.
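(A minimal sketch of that last bit -- a text buffer that doubles as a code executor via a hotkey. This is an illustration I wrote for this post, not my actual app; the key binding and names are made up.)

```python
import tkinter as tk

# A note-taking buffer where Ctrl+Enter exec()s the selected text
# (or the whole buffer if nothing is selected).
root = tk.Tk()
root.title("notes + executor (sketch)")
text = tk.Text(root, width=80, height=24)
text.pack(fill="both", expand=True)

def run_code(event=None):
    try:
        code = text.get("sel.first", "sel.last")  # run the selection...
    except tk.TclError:
        code = text.get("1.0", "end")             # ...or the whole buffer
    exec(code, {"text": text})  # expose the widget so snippets can script the app
    return "break"              # swallow the default newline insertion

text.bind("<Control-Return>", run_code)
root.mainloop()
```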

Simply the ability to copypaste a shitload of text into a chat window and get a summary (or any shape of breakdown or cognitive work of your choosing) is extremely valuable -- this article is the result of many cycles of copypaste to Claude -> edit -> repeat -- and it seems to yield like 95+% of the benefits of AI, despite literally hundreds of hours of engineering work and thinking on my part to create better tools (and god knows how many tens of thousands of hours from big labs’ engineers).
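(The whole loop is a few lines of code, too. A minimal sketch using the anthropic Python SDK, assuming an API key in the environment; the model name and prompt are placeholders, not what I actually use:)

```python
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

def summarize(text: str) -> str:
    """One cycle of the copypaste -> summarize loop, as a single API call."""
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder; any chat model works
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": f"Summarize the following:\n\n{text}"}],
    )
    return response.content[0].text
```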

(The non-AI benefits are more about being able to organize my thoughts and feelings better, iterate on UI design, and search across the app/my files -- and these do very much respond to engineering effort and thinking.)

How and why and in what ways tools and AI help our cognition; what the “landscape of cognitive tasks and abilities” looks like; what AI does and doesn’t help with, and why; why note-taking is so good; cognition being outsourced to our social interactions; etc. -- I’m basically planting a flag around these topics. I hope to write about this more; I think it’s fascinating, and the ceiling for empowering us via externalized cognition + AI tooling is very, very high.

(And I suspect, based on my experience with this app and older intuitions, that there is way more benefit in AI-less tooling than we know. In very short: think of the speed and quantity of processing our visual system performs on a video or image of a scene, compared to reading and parsing text -- it’s like 100x at least, in both speed and bandwidth.)

Part of why I haven’t written about it yet, and instead am writing posts like this, is that maybe discussing such ideas would “raise p(doom)” by empowering the AI industry, somehow via second- and third-order effects. It’s also very frustrating that my ideas might be dumb and trivial, and that I can’t even discover this without writing and publishing.

Betting on the human spirit

Maximizing global memetic evolution is fundamentally a bet on the human spirit and on the power of a globally cooperating and evolving civilization—which is basically an expression of the human spirit.

Whereas “buttclenching”, i.e., “we safety people will keep everything secret, create an aligned AI, ship it to the big labs, and save the world before they destroy it (or directly use the AI to stop them)”, is a bet on a small number of AI safety people and on the brilliance of individual humans, as opposed to the larger system in which we are embedded.

(By “cooperation” I don’t mean “everyone agrees on a goal and then does it in unison”; I mean “everyone is in some kind of communication/bit-sharing/mutual exploitation/adaptation-pressure-applying relationship, adversarial or otherwise”. This was nicely pointed out by Connor in this segment of his Bankless podcast appearance: in short, he praised the podcaster for suing the US government [something crypto related], because that’s a mechanism of civilizational coordination, regardless of whether he’s object-level correct about his case.)

Memetic evolution bad?

> But what if it’s actually a bad thing to allow humans to understand each other more frictionlessly and influence each other more rapidly and globally converge on things? What if we converge towards hell, and the only way to save us is an aligned ASI to stop all badness?

> Idk man we’re just fucked then?

(Edit: followup-ish: launched a more general exploration into memetics.)