I think this post, as promised in the epistemic status, errs on the side of simplistic poetry. I see its core contribution as saying that the more people you want to communicate to, the less you can communicate to them, because the marginal people aren’t willing to put in work to understand you, and because it’s harder to talk to marginal people who are far away and can’t ask clarifying questions or see your facial expressions or hear your tone of voice. The numbers attached (e.g. ‘five’ and ‘thousands of people’) seem to not be super precise.
That being said: the numbers are the easiest thing to take away from this post. The title includes the words ‘about five’ but not the words ‘simplified poetry’. And I’m just not sure about the numbers. The best part of the post is the beginning, which links to a paper to support an order-of-magnitude calculation of how many words you can communicate to people. But as the paragraphs go on, the justifications get less airtight, until the claim is basically an assertion. I think I understand stylistically why this was done, but at the end of the day that’s the trade-off that was made.
So a reader of this post has to ask themselves: why is the number about five? Does this ‘about’ give you a factor of 2 of wiggle room? 10? 100? How do I know that this kicks in once I hit thousands of people, rather than hundreds or millions? If I want to communicate to billions of people, does the number go down much? These questions are left unanswered in the post. That would be fine if they were answered somewhere else the post linked to, but they aren’t. As such, the discerning reader should only believe the conclusion (to the extent they can make it out) if they trust Ray Arnold, the author.
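To make it concrete how much turns on these unanswered questions, here is a toy model (my own illustration, not the post’s actual calculation): suppose the number of words a marginal reader will absorb decays as a power law in audience size. Everything then depends on the decay exponent, which nothing in the post pins down.

```python
# Toy model (my own illustration, not the post's actual calculation):
# assume the attention a marginal reader grants decays as a power law
# in audience size. Both parameters below are assumptions.
W0 = 1000     # assumed: words a single dedicated reader will process
ALPHA = 0.46  # assumed decay exponent, tuned so ~5 words survive at 100k readers

def words_per_reader(audience_size: int) -> float:
    """Expected words the marginal reader absorbs, under the power-law assumption."""
    return W0 * audience_size ** -ALPHA

for n in (1, 1_000, 100_000, 1_000_000_000):
    print(f"{n:>13,} readers -> ~{words_per_reader(n):6.2f} words")
```

With these made-up parameters you get about 5 words at a hundred thousand readers and well under one word at a billion; a slightly different exponent would give dozens. That sensitivity is exactly what the post leaves unargued.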
I think plausibly people should trust Ray on this, at least people who know him. But much of the readership of this post doesn’t know him and has no reason to trust him on this one.
Overall: this post has a true and important core that can be explained and argued for. But the main surface claim isn’t justified in the post or in places the post links to, so I don’t think that this was one of the best posts of 2019, either by my standards or by the standards I think the LessWrong community should hold itself to.
The aspiring-rigorous-next-post I hope to write someday is called “The Working Memory Hypothesis”, laying out more concretely that at some maximum scale, your coordination-complexity is bottlenecked on a single working-memory-cluster, which (AFAICT based on experience and working memory research) amounts to 3-7 chunks of concepts that people are already familiar with.
So, I am fairly confident that in the limit it is actually about 5 words +/- 2, because of Working Memory Science and some observations about which slogans propagate. (But I’m much less sure about how fast the limit approaches, and what happens along the way.)
Aren’t working memory chunks much bigger than one word each, at least potentially?
I think if you end up having a chunk that you use repeatedly and need to communicate about, it ends up turning into a word.
(like, chunks are flexible, but so are words)
To me, this suggests a major change to the message of the post. Reading it, I’d think that I have five samples from the bank of existing words, but if the constraint is just that I have five concepts that can eventually be turned into words, that’s a much looser constraint!
Not 100% sure I understand the point, but for concepts-you-can-communicate, I think you are bottlenecked on already-popular-words.
Chunks and words don’t map perfectly. But… word-space is probably mostly a subset of chunk-space?
I think wordless chunks matter for intellectual progress, where an individual thinker might have juuuust reached the point where they’ve distilled a concept in their head down into a single chunk, so they can then reason about how that fits with other concepts. But, if they want to communicate about that concept, they’ll need to somehow turn it into words.
Is the claim that before I learn some new thing, each of my working memory slots is just a single word that I already know? Because I’m pretty sure that’s not true.
First: the epistemic status of this whole convo is “thing Ray is still thinking through and is not very sure about.”
Second, for your specific question: no, my claim is that wordspace is a (mostly) subset of chunkspace, not the other way round. My claim is something like “words are chunks that you’ve given a name”, but you can think in chunks that have not been given names.
Third: I’m not taking that claim literally; I’m just sort of trying it out to see if it fits, and where it fails. I’m guessing it’ll fail somewhere, but I’m not actually sure where yet. If you can point to a concrete way that it fails to make sense, that’d be helpful.
But, insofar as I’m running with this idea:
An inventor who is coming up with a new thing might work entirely with wordless chunks: inventing them, combining them into bigger ideas, and compressing those into smaller chunks, without any of it ever being verbalized or given word form.
This part points pretty directly at research debt and inferential distance, where the debt is how many of these chunks need to be named and communicated as chunks, and the distance is how many re-chunking steps need to be done.
Thinking a little more: I think when I’m parsing a written sentence, words are closer to a one-word-to-one-chunk correspondence. When I’m thinking, groups of words tend to be more like a single chunk. “Politics is the mindkiller” might collapse into a single slot that I’m not examining at super-high resolution, allowing me to reason something like “‘Politics is the mindkiller’ is an incomplete idea.”
If wordspace is a subset of chunkspace and not the other way around, and you have about five chunks, do you agree that you do not have about five words, but rather more?
Yes, although I’ve heard mixed things about how many chunks you actually have; the number might be more like 4.
Also, the ideas often get propagated in conjunction with other ideas. I.e. people don’t just say “politics is the mindkiller”, they say “politics is the mindkiller, therefore X” (where X is whatever point they’re making in the conversation). And that sentence is bottlenecked on total comprehensibility. So, basically the more chunks you’re using up with your core idea, the more you’re at the mercy of other people truncating it when they need to fit other ideas in.
I’d argue “politics is the mindkiller” is two chunks initially, because people parse “is” and “the” somewhat intuitively, or fill them in. Whereas “Avoid Unnecessary Political Arguments” is more like 4 chunks. I think you typically need at least 2 chunks to say something meaningful, although maybe not always.
Once something becomes popular it can eventually compress down to 1 chunk. But, also, I think “sentence complexity” is not only bottlenecked on chunks. “Politics is the mindkiller” can be conceptually one chunk, but it still takes up a bunch of visual or verbal space while parsing a sentence, which makes it harder to read if it’s only one clause in a multi-step argument. I’m not 100% sure if this is secretly still an application of working memory, or if it’s a different issue.
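To illustrate the truncation pressure from a couple of comments up (a toy sketch of my own; the chunk counts and the budget of 4 are guesses from this thread, not established figures):

```python
# Toy sketch: a fixed working-memory budget forces truncation when a slogan
# plus its inferential context ("therefore X") won't fit. Numbers are guesses.
WM_BUDGET = 4  # assumed chunk capacity; estimates in this thread range 3-7

def chunks_left_for_context(slogan_chunks: int, budget: int = WM_BUDGET) -> int:
    """Chunks remaining for 'therefore X' after holding the slogan itself."""
    return max(budget - slogan_chunks, 0)

for slogan, chunks in [("politics is the mindkiller", 2),
                       ("Avoid Unnecessary Political Arguments", 4)]:
    left = chunks_left_for_context(chunks)
    print(f"{slogan!r}: {left} chunk(s) left for 'therefore X'")
```

On these assumptions the 4-chunk slogan leaves zero chunks for the point the speaker is actually making, which is exactly where the truncation pressure bites.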
Continuing to babble down this thought-trail:
I’m wondering how Gendlin Focusing interacts with working memory.
I think the first phase of focusing is pre-chunk, as well as pre-verbal. You’re noticing a bunch of stuff going on in your body. It’s more of a sensation than a thought.
The process of focusing is trying to get those sensations into a form your brain can actually work with and think about.
I… notice that focusing takes basically all my concentration. I think at some part of the process it’s using working memory (and basically all of my working memory). But I’m not sure when that is.
One of the things you do in focusing is try to give your felt-sense a bunch of names and see if they fit, and notice the dissonance. I think when this process starts, the felt-sense is not stored in chunk form. I think as I try to give it different names, it gradually starts to take on chunk form.
Gendlin Focusing might be a process where
a) first I’m trying to feel out a bunch of felt-data that isn’t even in chunk form yet
b) I sort of feel it out, while trying different word-combos on it. Meanwhile it’s getting more solid in my head. I think it’s… slowly transitioning from wordless non-chunks into wordless chunks, and then when I finally find the right name that describes it I’m like “ah, that’s it”, and it simultaneously solidifies into one-or-more chunks I can store properly in working memory, and also gets a name. (The name might be multiple words, and depending on context those words could correspond to one chunk or multiple.)
Not about Gendlin, but following the trail of relating chunks to other things: I wonder if propaganda or cult indoctrination can be described as a malicious chunking process.
I’ve weighed in against taking the numbers literally elsewhere, but following this thread I suddenly wondered if the work that using few words was doing isn’t delivering the chunk, but rather screening out any alternative chunk. If what we are interested in is common knowledge, it isn’t getting people to develop a chunk per se that is the challenge; rather everyone has to agree on exactly which chunk everyone else is using. This sounds much more like the work of a filter than a generator.
When I thought about it in those terms, it occurred to me that it is perfectly possible to drive this in any direction at all; we aren’t even meaningfully constrained by reality. This feels obvious in retrospect—there’ve been lots of times when common knowledge was utterly wrong—but doing that on purpose never occurred to me.
So now it feels like what cults do, and why they sound so weird to everyone outside of them, is deliberately create a different set of chunks for normal things, for the purpose of having different chunks. Once that is done, the availability heuristic will sustain communication on that basis, and the artificially-induced inferential distance will tend to isolate members from anyone outside the group.
Do working memory chunks come in order? Like, I’d kind of expect that if you have 5 concepts in working memory, you can’t additionally remember the order they should go in, because that’s another working memory chunk. Or if you can remember the order they should go in, then introspectively I’d imagine they’d become one working memory chunk.
I don’t really know, but my guess is that, well, it’s a bit messy, and yes if your chunks need to fit in a particular combination that you don’t have a good grasp on, that strains your working memory.
But I don’t think there are literal chunks such that ordering them literally costs a chunk. Chunks are patterns of thought that can bring up associations with other patterns of thought, and those associations can be stronger or weaker. If the associations are sufficiently strong, it makes sense to model the chunk-cluster as a single chunk.
(I notice I’m somewhat confused about this, and somewhat going off “there’s enough working memory research that I’m fairly confident ‘chunks’ is a useful abstraction, but I’m not sure why.”)
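To make the “strongly associated chunks fuse into one” picture concrete, here’s a minimal sketch (a toy model with a made-up fusion threshold and made-up association strengths, not anything from the working memory literature):

```python
# Minimal sketch (my toy model, not established theory): chunks whose mutual
# association strength clears a threshold count as one chunk for WM accounting.
MERGE_THRESHOLD = 0.8  # assumed: association strength above which chunks fuse

def effective_slots(chunks, associations, threshold=MERGE_THRESHOLD):
    """Count working-memory slots after fusing strongly associated chunks."""
    parent = {c: c for c in chunks}

    def find(c):
        # Walk to the cluster's root, compressing the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), strength in associations.items():
        if strength >= threshold:
            parent[find(a)] = find(b)  # union: the pair now shares one slot

    return len({find(c) for c in chunks})

# A well-worn slogan has strong internal associations, so it fuses into one
# slot; the weakly associated "therefore" and "X" still cost a slot apiece.
chunks = ["politics", "is-the", "mindkiller", "therefore", "X"]
assoc = {("politics", "is-the"): 0.9,
         ("is-the", "mindkiller"): 0.9,
         ("therefore", "X"): 0.3}
print(effective_slots(chunks, assoc))  # -> 3 effective slots
```

On these invented numbers the popular slogan costs one effective slot while the loosely associated pieces cost one each, which matches the intuition that popular phrases eventually compress down to a single chunk.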
I’m kinda brain-dead right now and can’t introspect well enough to figure out how it subjectively feels for me.
I think this post of mine is… probably relevant, although it might require some additional inference to make the relevance obvious:
https://www.lesswrong.com/posts/n7vPLsbTzpk8XXEAS/what-s-your-cognitive-algorithm
The thing I’m unsure about here is: why does that not apply to one-on-one communication? And if one-on-one communication doesn’t suffer from this limit, why does it not hold for getting a message to thousands, by mathematical induction? Perhaps the problem is that you lose bits in the retelling when people forget things or word things badly. But surely you also pick up bits from more people actually thinking about the message and seeing flaws in it and ways it can be tweaked to be more true?
I think all communication is bottlenecked by the working memory limit, but the limit has different ramifications in different contexts.
I agree with Romeo’s take elsethread that part of what’s going on here is “how many feedback loops you can have going on at once. Feedback loops can unpack into larger things, but you have to actually do the unpacking.”
(I have a bunch more thoughts on this that probably need to be a top-level post.)
Note that if people are seeing flaws and improving your idea, then they aren’t coordinating on a single thing, and if it matters that lots of people are moving in lockstep, it can be actively harmful if they’re ‘improving’ your idea.
But, more realistically: most people aren’t necessarily improving things, they’re adapting them to make them better/more-convenient/more-aligned for them. (Or, just forgetting or misremembering or whatever)
Preserving a complex idea at high fidelity is very hard.
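As a toy calculation of just how hard (with numbers invented purely for illustration): even a modest per-retelling loss compounds geometrically across a chain of retellings.

```python
# Toy numbers (invented for illustration): compounding loss across retellings.
RETENTION = 0.9  # assumed fraction of an idea's nuance surviving one retelling

for hops in (1, 3, 10, 20):
    print(f"after {hops:>2} retellings: {RETENTION ** hops:.0%} of nuance intact")
```

Under this made-up retention rate a 20-step chain preserves about an eighth of the original nuance, which is one way the ‘pick up bits’ and ‘lose bits’ effects could fail to cancel.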