In my last babble, I introduced the Babble and Prune model of thought generation: Babble with a weak heuristic to generate many more possibilities than necessary, Prune with a strong heuristic to find the best one, or at least a satisfactory one. I want to zoom in on this model. If the last babble was colored by my biases as a probabilist, this one is motivated by my biases as a graph theorist.
First, I will speculate on the exact mechanism of Babble, and also highlight the fact that Babble and Prune are independent systems that can be mocked out for unit testing.
Second, I will lather on some metaphors about the adversarial nature of Babble and Prune. Two people have independently mentioned Generative Adversarial Networks to me, a model of unsupervised learning involving two neural nets, Generator and Discriminator. The Artist and the Critic are archetypes of the same flavor; I have argued in the past that the spirit of the Critic is Satan.
Babble is (Sampling From) PageRank
Previously, I suggested that a Babble generator is a pseudorandom word generator, weighted with a weak, local filter. This is roughly true, but it spectacularly fails one of the technical goals of a pseudorandom generator: independence. In particular, the next word you Babble is frequently a variation (phonetically or semantically) of the previous one.
PageRank, as far as I know, ranks web pages by the heuristic of “what is the probability of ending up at this page after a random walk with random restarts.” That’s why a better analogy for Babble is sampling from PageRank i.e. taking a weighted random walk in your Babble graph with random restarts. Jackson Pollock is visual Babble.
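To make the analogy concrete, here is a minimal sketch of Babble as a weighted random walk with random restarts. The toy graph, its edge weights, and the restart probability of 0.15 are all assumptions chosen for illustration; nothing here claims to be how PageRank is computed in practice or how a real Babble graph is stored.

```python
import random
from collections import Counter

# Hypothetical toy Babble graph: each word maps to weighted neighbours.
BABBLE_GRAPH = {
    "snow": {"blood": 2.0, "winter": 3.0, "slush": 1.0},
    "blood": {"snow": 2.0, "heart": 1.0},
    "winter": {"snow": 3.0, "slush": 2.0},
    "slush": {"snow": 1.0, "winter": 2.0},
    "heart": {"blood": 1.0},
}


def babble(graph, steps, restart_prob=0.15, seed=0):
    """Weighted random walk with random restarts; returns the visited nodes."""
    rng = random.Random(seed)
    nodes = list(graph)
    current = rng.choice(nodes)
    visited = [current]
    for _ in range(steps - 1):
        if rng.random() < restart_prob:
            # Restart: jump to a uniformly random node.
            current = rng.choice(nodes)
        else:
            # Ordinary step: follow an edge with probability proportional to its weight.
            neighbours = list(graph[current])
            weights = [graph[current][n] for n in neighbours]
            current = rng.choices(neighbours, weights=weights, k=1)[0]
        visited.append(current)
    return visited


walk = babble(BABBLE_GRAPH, steps=10_000)
frequencies = Counter(walk)
# Long-run visit frequencies play the role of a PageRank-style score.
for word, count in frequencies.most_common():
    print(f"{word}: {count / len(walk):.3f}")
```

Consecutive outputs of this walk are correlated (each word is a neighbour of the last unless a restart fires), which is exactly the failure of independence described above.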
Imagine you’re playing a game of Scrabble, and you have the letters JRKAXN on your rack. What does your algorithm feel like?
You scan the board and see an open M. You start Babbling letter combinations that might start with M: MAJR, MRAJ, MRAN, MARN, MARX (oops, proper noun), MARK (great!). That’s the weighted random walk. You set MARK aside and look for another place to start.
Time for a restart. You find an open A before a Triple Word, that’d be great to get! You start Babbling combinations that end with A: NARA, NAXRA, JARA, JAKA, RAKA. No luck.
Maybe the A should be in the middle of the word! ARAN, AKAN, AKAR, AJAR (great!). You sense mean stares for taking so long, so you turn off the Babble and score AJAR for (1+8+1+1)x3 = 33 points. Not too shabby.
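The Scrabble turn above can be sketched as a Babble loop feeding a Prune filter. This is a simplification under stated assumptions: the two-word `WORD_LIST` is a hypothetical stand-in for a real dictionary, and the Babble step shuffles letters uniformly rather than following a weighted walk. The tile values are the standard English Scrabble values for these letters.

```python
import random

# Standard English Scrabble tile values for the letters in the example.
TILE_VALUES = {"A": 1, "J": 8, "K": 5, "M": 3, "N": 1, "R": 1, "X": 8}

# Hypothetical stand-in for a dictionary lookup (the Prune step).
WORD_LIST = {"MARK", "AJAR"}


def score(word, word_multiplier=1):
    """Sum the tile values and apply a word multiplier such as Triple Word."""
    return sum(TILE_VALUES[letter] for letter in word) * word_multiplier


def babble_words(rack, board_letter, length, attempts, seed=0):
    """Babble: random arrangements of rack letters plus one letter on the board."""
    rng = random.Random(seed)
    for _ in range(attempts):
        letters = rng.sample(rack, length - 1) + [board_letter]
        rng.shuffle(letters)
        yield "".join(letters)


def prune(candidates):
    """Prune: keep only candidates that are real words."""
    return {word for word in candidates if word in WORD_LIST}


rack = list("JRKAXN")
found = prune(babble_words(rack, board_letter="A", length=4, attempts=5000))
print(found, score("AJAR", word_multiplier=3))
```

The final call reproduces the arithmetic in the text: (1+8+1+1) x 3 = 33.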
The Babble Graph
Last time, I described getting better at Babble as increasing the uniformity of your pseudorandom Babble generator. With a higher-resolution model of Babble in hand, we should reconceptualize increasing uniformity as building a well-connected Babble graph.
What is the Babble graph? It’s the graph within which your words and concepts are connected. Some of these connections are by rhyme and visual similarity, others are semantic or personal. Blood and snow are connected in my Babble graph, for example, because in Chinese they are homophones: snow is 雪 (xue), and blood is 血 (xue). This led to the following paragraph from one of my high school essays (paraphrased):
In Chinese, snow and blood sound the same: “xue.” Some people think the world will end suddenly in nuclear holocaust, pandemic, or a belligerent SkyNet. I think the world will die slowly and painfully, bleeding to death one drop at a time with each New England winter.
My parents had recently dragged me out to jog in the melting post-blizzard slush.
One of my favorite classes in college was a game theory class taught by the wonderful David Parkes; my wife and I lovingly remember the class as Parkes and Rec. One of the striking ideas I learned in Parkes and Rec is that exponentially large graphs can be compactly represented implicitly in memory, as long as individual edges and neighborhoods can be computed in reasonable time. Babble is capable of generating new words and combinations, so the Babble graph contains nodes you’ve never thought of. It’s enormous, and definitely not (a subgraph of) the connectome, but rather implicitly represented therein in a compact way. This is related to the fact that the map is not the territory, except in the study of the brain, where the map is a subset of the territory.
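As a small illustration of implicit representation, the sketch below defines a graph on all 7-letter strings, where two strings are adjacent if they differ in one position. The choice of graph and the function name `neighbours` are assumptions made for the example. No node or edge is ever stored; a neighbourhood is computed only when asked for.

```python
import string
from itertools import islice

ALPHABET = string.ascii_uppercase


def neighbours(word):
    """Lazily yield every string that differs from `word` in exactly one position."""
    for i, original in enumerate(word):
        for letter in ALPHABET:
            if letter != original:
                yield word[:i] + letter + word[i + 1:]


# The graph has 26**7 nodes, far too many to write down explicitly.
print(26 ** 7)  # 8031810176
# A neighbourhood, however, is cheap to compute on demand.
print(list(islice(neighbours("BABBLES"), 3)))
print(sum(1 for _ in neighbours("BABBLES")))  # 7 positions * 25 letters = 175
```

The whole graph is described by a few lines of code, which is the sense in which an enormous Babble graph could be represented compactly.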
It follows that the Babble graph is a massive implicitly represented graph, which is traversed via random walks with random restarts. How might we optimize this data structure to better fulfill its goals?
One technique I’ve already mentioned is to artificially replace either Babble or Prune to train the other in isolation. This is basically unit testing via mocking. To unit test Babble, we can mock out Prune with a simplified and explicit filter like the haiku, the game of Scrabble, or word games like Contact and Convergence. To unit test Prune, we replace Babble with other sources of word strings: reading, conversation, poetry, music.
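The mocking idea can be sketched in code, assuming a thought pipeline of the form "run Babble output through Prune". Every name here (`think`, `five_letter_prune`, `reading_list_babble`, `prune_under_test`) is hypothetical and stands in for whatever the real systems are.

```python
import random
from itertools import islice

VOCABULARY = ["proper", "humble", "thick", "aji", "honte", "snow", "blood"]


def think(babble, prune, budget):
    """Run up to `budget` Babble outputs through Prune and keep the survivors."""
    return [idea for idea in islice(babble, budget) if prune(idea)]


def random_babble(vocabulary, seed=0):
    """The Babble under test: an endless stream of random words."""
    rng = random.Random(seed)
    while True:
        yield rng.choice(vocabulary)


def five_letter_prune(word):
    """Mock Prune: a simple, explicit rule, like the constraints of a haiku."""
    return len(word) == 5


def reading_list_babble():
    """Mock Babble: a fixed external source of words, like reading."""
    yield from ["proper", "humble", "thick", "aji"]


def prune_under_test(word):
    """Hypothetical stand-in for the Prune being trained."""
    return word.endswith("e")


# Unit test Babble: the real generator against the mocked filter.
babble_report = think(random_babble(VOCABULARY), five_letter_prune, budget=50)
# Unit test Prune: the real filter against the mocked generator.
prune_report = think(reading_list_babble(), prune_under_test, budget=50)
print(babble_report)
print(prune_report)
```

Because each side is passed in as a plain argument, either one can be swapped for a mock without touching the other.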
Those methods completely black box the Babble and Prune algorithms and hope they self-optimize correctly. What if we want to get our hands dirty and explicitly rewire our Babble graph?
First we have to figure out what makes a quality Babble graph. I can think of two metrics worth optimizing:
Connectivity. With sufficient effort (i.e. taking enough random steps and restarts) you want to eventually explore the entire graph, and repeat yourself rarely. This requires not just that the graph is connected, but that it should have good expansion. Ever feel stuck on an idea, then be struck by external inspiration to explore a disconnected set of ideas you already knew, and find it massively productive? Random walks getting trapped locally is a sign your Babble graph is a bad expander.
Value. Every node in your Babble graph should pay rent. I have found many abandoned components in my Babble graph—ghost towns and wastelands of neural machinery left over from experiences that are no longer relevant. They can be salvaged and repurposed, if only to generate metaphors.
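The Connectivity point can be illustrated numerically. The sketch below compares two hypothetical 20-node graphs: a bottlenecked one (two 10-node cliques joined by a single edge) and a well-mixed one (every node linked to all nodes of opposite parity). Both graphs are assumptions picked for illustration, not formal expander constructions. It tracks the exact probability that a walk starting at node 0 is in the far half of the graph after a given number of steps.

```python
def step(distribution, graph):
    """Advance a probability distribution over nodes by one random-walk step."""
    nxt = [0.0] * len(graph)
    for node, mass in enumerate(distribution):
        for neighbour in graph[node]:
            nxt[neighbour] += mass / len(graph[node])
    return nxt


def far_mass(graph, steps, start=0):
    """Probability that a walk from `start` sits in nodes 10..19 after `steps` steps."""
    dist = [0.0] * len(graph)
    dist[start] = 1.0
    for _ in range(steps):
        dist = step(dist, graph)
    return sum(dist[10:])


# Bottlenecked graph: two 10-cliques joined by the single edge 9-10.
barbell = [[j for j in range(10) if j != i] for i in range(10)]
barbell += [[j for j in range(10, 20) if j != i] for i in range(10, 20)]
barbell[9].append(10)
barbell[10].append(9)

# Well-mixed graph on the same nodes: i is linked to every j of opposite parity.
mixed = [[j for j in range(20) if (i + j) % 2 == 1] for i in range(20)]

print(far_mass(barbell, 20), far_mass(mixed, 20))
```

On the well-mixed graph the walk reaches its fair share of the far half (probability one half) after a single step; on the barbell it is still mostly trapped in its starting clique after twenty. That trapped walk is the feeling of being stuck on an idea.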
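Ramanujan was an extraordinarily creative mathematician who produced formulas like
$$p(n) \sim \frac{1}{4n\sqrt{3}} \exp\left(\pi\sqrt{\frac{2n}{3}}\right)$$
for the number of partitions $p(n)$ of an integer $n$. Exercise: figure out how such an exponent might occur in nature. Hint: $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$.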
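Assuming the formula meant here is the Hardy-Ramanujan asymptotic, p(n) ~ exp(pi * sqrt(2n/3)) / (4n * sqrt(3)), a quick numerical check shows how close it gets. The sketch computes exact partition counts with the standard coin-change style recurrence and prints the ratio of the approximation to the exact value; the function names are made up for this example.

```python
import math


def partitions(n):
    """Exact number of partitions of n, counted by adding one part size at a time."""
    table = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            table[total] += table[total - part]
    return table[n]


def hardy_ramanujan(n):
    """Asymptotic estimate exp(pi * sqrt(2n/3)) / (4n * sqrt(3))."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))


for n in (10, 100, 1000):
    exact = partitions(n)
    print(n, exact, hardy_ramanujan(n) / exact)
```

The ratio drifts toward 1 as n grows, which is all that the asymptotic sign promises.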
Ramanujan was also known for his mysticism, attributing his most inspired results to his patron goddess. Mystical experiences, like LSD, are often characterized by the feeling of connectedness of all things. I think Ramanujan’s genius might be the result of having a Babble graph that is an exceptionally good expander. What are those called again?
Here’s a story about how I improved my Babble graph by making my bed.
It all started when Jordan Peterson told me to clean my room—because one’s surroundings are a reflection of one’s state of being. I decided to give it a chance and make my bed every morning.
Making my bed became a daily ritual. As I do it, I repeat the “proper and humble” mantra:
To save the world, I will start by doing the proper and humble things I know how to do within the confines of my own life.
Proper and humble were not words I’d liked in a very long time. They activated ideas I hadn’t wrestled with for years.
Honte is a Go term which means “the proper move.” Honte is playing thickly to leave few weaknesses. Honte is killing already dead stones to remove aji. Honte is doing the proper and humble thing to prevent bad aji—failure modes you can’t yet articulate. There’s nothing quite like playing Go against a stronger player to put the fear of aji in you.
In relationships, honte is dedication to the removal of lingering resentment. Unhappy couples have the same fights at regular intervals; the landmines that trigger them might lie untouched for upwards of a year, but they never deactivate. Why would you allow these landmines to be planted in the first place? You wouldn’t leave a ladder breaker for your opponent in an unapproached corner, would you? Dedicate yourself to the removal of landmines, at least when you have the slack to do so. That’s honte.
A well-connected and useful Babble graph is thickness (not to be confused with thiccness). It is written: attack from thickness. When thinking from a thick Babble graph, you’re not wandering lackadaisically, building an argument from scraps lying at the side of the trail. You’ll have the weight of your entire intellectual life at your back.
The Artist and the Critic
Two people have independently suggested that the Babble and Prune model is similar to an approach in machine learning known as Generative Adversarial Networks, in which production of photorealistic images (say) is turned into a game between two neural nets: Generator, who learns to generate good counterfeits, and Discriminator, who learns to tell the counterfeits from the real thing.
This is a manifestation of the eternal war between the Artist and the Critic, a war that is both exceedingly vicious and exceedingly productive. Artists of the ages have had some choice words for their critics. Beckett:
VLADIMIR
Moron!
ESTRAGON
That’s the idea, let’s abuse each other.
They turn, move apart, turn again and face each other.
VLADIMIR
Moron!
ESTRAGON
Vermin!
VLADIMIR
Abortion!
ESTRAGON
Morpion!
VLADIMIR
Sewer-rat!
ESTRAGON
Curate!
VLADIMIR
Cretin!
ESTRAGON
(with finality) Crritic!
VLADIMIR
Oh!
He wilts, vanquished, and turns away.
The opening lines of Hardy’s A Mathematician’s Apology:
It is a melancholy experience for a professional mathematician to
find himself writing about mathematics. The function of a
mathematician is to do something, to prove new theorems, to add
to mathematics, and not to talk about what he or other mathematicians
have done. Statesmen despise publicists, painters despise
art-critics, and physiologists, physicists, or mathematicians have
usually similar feelings: there is no scorn more profound, or on
the whole more justifiable, than that of the men who make for the
men who explain. Exposition, criticism, appreciation, is work for
second-rate minds.
I have a rule inspired by Solzhenitsyn, which is that every battle which occurs between human beings also plays out within each human heart. The proper locus of the fight between Artist and Critic is not cleanly between artists and critics, but between the Babble and Prune within each individual. After all, find me an artist who has never criticized, or a great critic who is never enjoyable to read for his own sake. Exercise: get some utility out of a bad book you’ve recently read by checking out the savage reviews online.
Like Generator and Discriminator, a good Artist and Critic pair can together ascend to heights that neither could reach alone, and having a filter is a healthy thing. However, I stand by my argument that the overdeveloped Critic is a manifestation of Satan:
Jordan Peterson says Satan is an intellectual figure, and this idea has fermented in my imagination. Satan is the cynical and nihilistic intellectual whose thesis is “things are so bad they do not deserve to exist.”
[...]
I would propose an embellishment of the figure of Satan as the nihilistic intellectual: Satan as the critic. One of the (many) disturbing things I have noticed about my high school curriculum is that English classes are factories for creating critics out of artists. At least in my experience, we wrote short stories, poems, and other free form essays in elementary and middle school, but turned exclusively to the analytical essay by the time high school rolled around.
How frightening is that? Take a generation of teenagers, present them with the greatest literature of our civilization. Then, instead of teaching them to do the obvious thing – imitate – we teach them to analyze – the derivative work of a critic. The work of Satan: the intellectual whose ability to criticize far exceeds his ability to create. And so we find that the best students to come out of our high schools are created in the image of Satan. For every one budding novelist, we have a dozen teenage journalists, lawyers, and activists.
Satan is the voice in your ear who says, “You will never do this well enough for it to be worth doing.” This is the burrowing anxiety that puts me off writing for weeks at a time, the anxiety that anything I produce will not justify its own existence. The subroutine in your head constantly constructing impossibly high standards and handing them to you to use as excuses to do nothing. Satan is characterized by inaction, the inaction caused by paralyzing perfectionism.
Other Things that are Babble
The Bible is the best Babble ever produced. A common atheist refrain is that the Bible is so self-contradictory, so ambiguous, so open to interpretation as to be intrinsically meaningless. Any meaning you might extract from the Bible is just a reflection of your own beliefs.
I think this is a feature, not a bug.
Not only is the Bible open to interpretation, it invites interpretation. Its stories are so varied, fantastical and morally ambiguous that they demand interpretation. The Bible stood the test of time not because it is maximally packed with wisdom, but because it produced the most insightful and varied results when paired with outside sources of Prune. When the Christian is lost and desperate, he inputs 1 Corinthians to his Prune, and voilà! Faith is restored. Peterson’s The Psychological Significance of the Bible series takes advantage of exactly this feature of the Bible: it is the fertile ground upon which each individual can tell their own story. Of course, perversions can result when broken Prune filters are applied, even to the best Babble.
Perhaps writers have been optimizing for the wrong thing. Instead of directly packing insight into an essay, we should try to design high-quality Babble, fertile input for the reader’s Prune.
The Oulipo is Babble training on steroids: a group of writers and mathematicians who worked from the apparent paradox that freedom is the enemy of creativity. Creativity, the state of having a better Babble generator, is designed to solve tough, heavily constrained problems, and the Oulipians produced creative writing by imposing stricter constraints. Most famously, this method produced Perec’s La disparition, a 300-page novel written without the letter ‘e,’ about “a group of individuals looking for a missing companion, Anton Vowl.”
By the way, did you notice the letter missing from this entire post?
All good conversations are therapeutic, and therapeutic conversations are about letting down your guard and allowing yourself to simply Babble. Babies have no Prune at all and babble all the phonemes their adorable little mouths can produce; that’s how they learn the beginnings of language so quickly. Being in a safe space is reproducing this state of development, a place where Babble can be rapidly optimized on its own terms. Healthy teamwork and collaboration share this quality: bouncing half-formed, half-nonsensical ideas off others and Pruning them together. Double the Babble, double the fun.
Oh, and about that missing letter? Just kidding. Ain’t nobody got time for that.
This is excellent. Congratulations. Both content and your associations are flowing along pretty nicely, just like you want them to!
This topic happens to hit right in the middle of something that I’ve been working on for some time, namely that all of LW is training people’s pruning while doing essentially nothing to boost their babbling. The outcome is what you’d expect. This is literally the first post I’ve seen on LW ever which I think has a chance to teach people something useful in this area.
Btw, if you aren’t already, I recommend following my hero of high quality intellectual babble, Ribbonfarm’s Rao.
Rao falsely claims to be doing babble, but is generally making structured arguments that can be taken literally; he’s just not spoon-feeding anyone.
Note that I’m using an upgraded version of the concept of “babble”, in which you acknowledge that your pruning/reinforcement learning gradually pushes all your skills down and makes them implicit in your babble network.
In this sense, sufficiently advanced babble contains structured arguments. So I stand behind my assessment of Rao’s internal process as being extremely and consciously babble-heavy (this is not related to what Rao says, just what I recognize as a certain way of handling your mind which I can replicate sufficiently to know the signs).
In fact, if I model how “not spoon feeding anyone” feels from the inside, it seems to use the very same mental motions that I call babble-heavy.
“Anyone” includes yourself, too. Why would you deny yourself the internal experience of babbling, while being gracious enough to not deny your audience the same pleasure? Does this even happen?
On a related note, when I see that someone’s writing is super clear and bright and pointed towards a conclusion I will quite happily predict that their inner experience of generating it is on the extreme other end of the scale.
Which is not a bad thing but means you place more burden on your better readers, and less on your worse readers.
Rao knows this better than anyone, and he does the correct thing and consistently optimizes for the very best kind of readers, and I love him for it. This is obviously not pure altruism, because in the same motion he also optimizes his writing for himself so that he can think bigger and better and sneakier.
Could someone briefly summarise why so many people seem to like Venkatesh Rao? I tried reading a few of his essays but didn’t find much to write home about.
From the posts of his I’ve liked, I enjoy his style of allowing ideas to emerge through the haze with an abundance of metaphor and sideplots (i.e. Babble). Good writing needs to simultaneously carry important points and appease the Art God. See for example The Gervais Principle, Premium Mediocre.
I really like the idea that Prune gradually pushes your skills down and makes them implicit in your Babble. It feels something like if your Prune allows stuff through, your Babble goes back and retrains on that stuff and eventually you start just Babbling what you wanted, no filter necessary. It seems retroactively obvious that this is how the exact adversarial training works.
I also definitely see what you’re saying about Rao, my experience of reading him is roughly similar to my experience reading Moldbug in that I end up Pruning some small subset that feels extraordinarily insightful without having the energy to understand the main arc of the argument.
This seems sufficiently far from the initial usage in the discourse that a typology is in order that clearly distinguishes obviously different things. Alkjash’s initial post seemed like it was talking about pretty much the thing Hanson was talking about, which he was explicitly contrasting with an approach that attempts to learn deep generative structure.
Trying to test deep hypotheses efficiently seems like it’s totally outside the Babble/Prune paradigm, and that seems really important to understand and have an account of. Likewise map-territory distinctions.
I actually didn’t know about Hanson’s usage and my definition of Babble allows for pieces that contain entire cached arguments and that can generate deep content. I wanted it to be sufficiently general to contain most patterns of unfiltered thoughts that appear in my head.
Amazing, thank you!
I had to think about dreaming as the purest babble. Connecting images, emotions, anxieties, desires (...) in a more or less free associative way. Absolutely unpruned.
I think paying attention to and recollecting dreams is a form of cultivating babble as an approach.
Also, I think humor often takes the form of babble-based creativity: rewiring common sense.
I actually used control-f for every letter in the alphabet before I read the last sentence… Touché. Reminds me of an old exercise in school where there are a bunch of random instructions on a handout and the last is to ignore all previous instructions. Moral of the story was to read all questions before answering them. A more useful one would be: don’t get too focused on what’s right in front of you; explore a little bit first to ensure that you’re taking the best immediate action.
I remember exactly the same exercise from elementary school, because I was the last one to catch on.
This is what I like the Daodejing for. I have a little app on my phone that regularly pops up a notification with a random section of it. For me it’s a fantastic balance between words that mean something and words that can mean whatever you want them to mean, such that reading them sometimes helps me think about things in ways that I might have otherwise not noticed.
I think divination methods like Yijing and tarot and even horoscopes offer a similar service, and of course psychoactive drugs can also do this. LSD seems especially good at serving this kind of purpose, and seems to possibly have permanent benefits in this direction. I also get a lot out of my narcolepsy by being able to very easily slip into a REM-like state while awake, but this is more a special case that’s probably not available to everyone.
I really like how the posts in this sequence use technical analogies. You refer to some advanced concepts like expanders, but they don’t feel tacked onto the ideas. I even learned about implicit representation of graphs! (though I knew bounded-degree graphs)
One nitpick is that Ramanujan probably had an amazing Prune too. I feel he’s impressive because he was right so many times. And when he went astray, it was apparently because his lack of schooling in mathematics made him overlook some aspects of the problem. That feels like the combination of an amazing Babble and Prune, with the Babble getting the better of the Prune for the mistakes.
This is a followup to my comment on the previous post.
This followup (Edit: alkjash’s followup post, not my followup comment) addresses my stated motivation for suggesting that the babble-generator is based on pattern-matching rather than on a mere entropy generator. I had said that there are too many possible ideas for an entropy generator to produce reasonable ones. For babble to be produced by a random walk along the idea graph is more plausible. It’s not obvious that you couldn’t produce sufficiently high-quality babble with a random walk along a well-constructed idea graph.
Now, while I absolutely think the idea graph exists, and I agree that producing babble involves a walk along that graph, I am still committed to the belief that that walk is not random, but is guided by pattern matching. My first reason for holding this belief is introspection: I noticed that ideas are produced by “guess-and-check” (or babble-and-prune), and I also noticed that the guessing process is based on pattern matching. That’s fairly weak evidence. My stronger reason for believing that babble is produced by pattern matching is that it’s safer to assume that a neurological process is based on pattern matching than on random behavior. Neurons are naturally suited to forming pattern-matching machines (please forgive my lay understanding of cognitive science), and while I don’t see why they couldn’t also form an entropy generator, I don’t suspect that a random walk down the idea graph would be more adaptive than a more “intelligent” pattern-matching algorithm.
I also infer that the babble-generator is a pattern matcher from the predictions that hypothesis makes. If the babble-generator is a random walk down the idea-graph, then the only way to improve your babble should be to improve your idea-graph. If the babble-generator is a pattern-matcher-directed walk down the idea-graph, then you should be able to improve your babble both by training the pattern-matcher well and by improving your idea-graph. Let’s say reading nonfiction improves your idea-graph more effectively than it trains a hypothetical pattern-matcher, and that writing novels trains your pattern-matcher more effectively than it improves your idea-graph. Then if the random-walk hypothesis is true, we should see the same kinds of improvements to babble when we read nonfiction and write novels, but if the pattern-matcher hypothesis is true we should expect different kinds of improvements.
***
I think for the most part we’re talking about the same thing, I’m just suggesting this additional detail of pattern-matching, which has normative consequences (as I sketched out in my previous comment). However I’m not quite sure that we’re talking about the same graphs. You say:
I certainly don’t think that this graph is a graph of words, even though I agree that there can be connections representing syntactic relationships like rhyme. I don’t think that the babble algorithm is “start at some node of the graph, output the word associated with that node, then select a connected node and repeat.” There is an idea-graph, and it’s used in the production of babble, but not like that. I’m not sure if you were claiming that it does work that way, but in case you were, I disagree. I would try to elaborate on what role I do think the idea-graph plays in babble generation, but this comment is already getting very long.
I’m curious about the details of your model of this “babble-graph.” You mention that it can create new connections, which suggests to me that the “graph” is actually a static representation of an active process of connection-drawing. I could be convinced that the pattern-matching I’m talking about is actually a separate process which is responsible for forming these connections. But I’m fuzzy on what exactly you mean, so I’m not sure that’s even coherent.
Great posts, I wouldn’t mind a part 3!
I think I was intentionally vague about the things you are emphasizing because I don’t have a higher-resolution picture of what’s going on. I mentioned that “random” means something like “random, biased by the weak, local filter,” but your picture of pattern-matching seems like a better description of the kind of bias that’s actually going on.
Similarly, it’s probably true that there are different levels of Babble going on, at some points you are pattern-matching with literal words, at other points you are using phrases or concepts or entire cached arguments, and I roughly defined the Babble graph to contain all of these things.
I’m inclined to think that the babble you’ve been describing is actually just thoughts, and not linguistic at all. You create thoughts by babble-and-prune and then a separate process converts the thoughts into words. I haven’t thought much about how that process works (and at first inspection I think it’s probably also structured as babble-and-prune), but I think it makes sense to think about it as separate.
If the processes of forming thoughts and phrasing them linguistically were happening at the same level, I’d expect it to be more intuitive to make syntax reflect semantics, like you see in Shakespeare where the phonetic qualities of a character’s speech reflect their personality. Instead, writing like that seems to require System 2 intervention.
But I must admit I’m biased. If I were designing a mind, I’d want to have thought generation uncoupled from sentence generation, but it doesn’t have to actually work that way.
Edit: If generating linguistic-babble happens on a separate level from generating thought-babble, then that has consequences for how to train thought-babble. Your suggestions of playing scrabble and writing haikus would train the wrong babble (nothing wrong with training linguistic-babble, that’s how you become a good writer, but I’m more interested in thought-babble). I think if you wanted to train thought-babble, you’d want to do something like freewriting or brainstorming — rapidly producing a set of related ideas without judgment.
Haha, you seem to be on track:
yes, the process that converts thoughts to words is separate
however, caveat: the words are ALSO used for initialization of your concept network/tree, so these two might continue matching closely by default if you don’t do any individual work on improving them
I can’t give you an RCT for proof but I’ve had this idea for at least 7 months now (blog post) so I had lots of time to verify it
yes, training the concept network/tree directly looks completely different from training the verbal network/tree (though on some meta level the process of doing it is the same)
see this as an example of explicit non-verbal training (notes from improving my rationality-related abstract concept network) - the notes are of course in English, but it should be clear enough that this is not the point: e.g. I’m making up many of the words and phrases as I go because it doesn’t matter for the concept network/tree if my verbal language is standard or not
Aren’t you by any chance attacking a strawman?
There is no “privileged” level of models in the brain. Single English words, in particular, are not privileged in any way.
When your models/neural connections are updated, so are your priors for babble.
This is all dead obvious.
Are you referring to the second half of my comment? Because perhaps I wasn’t clear enough. I’m confused about what alkjash means, because some of their references to the babble graph seemed perfectly consistent with my understanding, but I got the impression that overall we might not be talking about the same thing. If we are talking about the same thing then that whole section of my comment is irrelevant.
See other comments on this post, I think this is sufficiently resolved by now.
The Bible only seems like Babble if you’re used to works that are full of sentences that can be interpreted in a straightforward and obvious way independent of context, and presuppose that it’s supposed to be a single, fully-worked-out instruction manual for life. It really, really obviously is not, and if you insist on reading it that way, of course it won’t make sense. People lie about the Bible and say it’s things that it’s not, but people lie about fucking everything, there’s no especially strong reason to believe the people who built a religion around a thing, when they tell you what the thing is. If you pay close attention to the structure of the story it’s pretty clear what it’s saying—excepting maybe the account of the chariot, since I actually don’t understand what that part means and it’s famously obscure—and I am pretty sure that I’d be able to tell a book of the Bible I hadn’t read from a made-up one, even in English.
I’m trying to parse this, and I think we’re saying the same thing and you’re just using the word Babble differently. I’ve roughly defined Babble as “pseudo-randomly generated proto-thoughts”, and good Babble as “insight-rich input from which Prune can find insight.” Help?
I’m saying that the Bible has content, and this content is understandable, but it takes some active work to understand because unlike most modern writing it is not spoon-feeding you everything.