I’m worried about this, concretely, because after reading Effective Altruism is Self Recommending a while back, despite the fact that I thought lots about it, wrote up detailed responses to it (some of which I posted and some of which I just thought about privately), and ran a meetup somewhat inspired by taking it seriously...
...despite all that, a year ago when I tried to remember what it was about, all I could remember was “givewell == ponzi scheme == bad”, without any context of why the ponzi scheme metaphor mattered or how the principle was supposed to generalize. I’m similarly worried that a year from now, “werewolves == bad, hunt werewolves”, is going to be the thing I remember about this.
The five-word-limit isn’t just for the uninformed public, it’s for serious people trying to coordinate. The public can only coordinate around 5-word things. Serious people trying to be informed still have to ingest lots of information and form detailed models, but those models are still going to have major bits that are compressed, out of pieces that end up being about five words. And this is a major part of why many people are confused about Effective Altruism and how to do it right in the first place.
If that’s your outlook, it seems pointless to write anything longer than five words on any topic other than how to fix this problem.
I agree with the general urgency of the problem, although I think the frame of your comment is somewhat off. This problem seems… very information-theoretically-entrenched. I have some sense that you think of it as solvable in a way that it’s fundamentally not actually solvable, just improvable, like you’re trying to build a perpetual motion machine instead of a more efficient engine. There is only so much information people can process.
(This is based entirely off of reading between the lines of comments you’ve made, and I’m not confident what your outlook actually is here, and apologies for the armchair psychologizing).
I think you can make progress on it, which would look something like:
0) make sure people are aware of the problem
1) building better infrastructure (social or technological), probably could be grouped into a few goals:
nudge readers towards certain behavior
nudge writers towards certain behavior
provide tools that amplify readers’ capabilities
provide tools that amplify writers’ capabilities
2) meanwhile, as a writer, make sure that the concepts you create for the public discourse are optimized for the right kind of compression. Some ideas compress better than others. (I have thought about the details of this
This *is* my outlook, and yes, I think that I, as well as you and Jessica, should probably be taking some kind of action that takes this outlook strategically seriously, if you aren’t already.
Distillation Technology
A major goal I have for LessWrong, which the team has talked about a lot, is improving distillation technology. It’s not what we’re currently working on because, well, there are *multiple* top priorities that all seem pretty urgent (and all seem like pieces of the same puzzle). But I think Distillation Tech is the sort of thing most likely to meaningfully improve the situation.
Right now the default mode people interact with LessWrong and many other blogging platforms is “write up a thing, post it, maybe change a few things in response to feedback.” But for ideas that are actually going to become building blocks of the intellectual commons, you need to continuously invest in improving them.
Arbital tried to do this, and it failed because the problem is hard in weird ways, many of them somewhat hard to anticipate.
http://distill.pub tackles a piece of this but not in a way that seems especially scalable.
Scott Alexander’s short story Ars Longa Vita Brevis is a fictional account of what seems necessary to me.
I do hope that by the end of this year the LW team will have made some concrete progress on this. I think it is plausibly a mistake that we haven’t focused on it already – we discussed switching gears towards it at our last retreat but it seemed to make more sense to finish Open Questions.
Trying to nudge others seems like an attempt to route around the problem rather than solve it. It seems like you tried pretty hard to integrate the substantive points in my “Effective Altruism is self-recommending” post, and even with pretty extensive active engagement, your estimate is that you only retained a very superficial summary. I don’t see how any compression tech for communication at scale can compete with what an engaged reader like you should be able to do for themselves while taking that kind of initiative.
We know this problem has been solved in the past in some domains—you can’t do a thing like the Apollo project or build working hospitals where cardiovascular surgery is regularly successful based on a series of atomic five-word commands; some sort of recursive general grammar is required, and at least some of the participants need to share detailed models.
One way this could be compatible with your observation is that people have somewhat recently gotten worse at this sort of skill; another is that credit-assignment is an unusually difficult domain to do this in. My recent blog posts have argued that at least the latter is true.
In the former case (lost literacy), we should be able to reconstruct older modes of coordination. In the latter (politics has always been hard to think clearly about), we should at least internally be able to learn from each other by learning to apply cognitive architectures we use in domains where we find this sort of thing comparatively easy.
I think I may have communicated somewhat poorly by phrasing this in terms of 5 words, rather than 5 chunks, and will try to write a new post sometime that presents a more formal theory of what’s going on.
I mentioned in the comments of the previous post:
Coordinated actions can’t take up more bandwidth than someone’s working memory (which is something like 7 chunks, and if they’re using all 7 chunks then they don’t have any spare chunks to handle weird edge cases).
A lot of coordination (and communication) is about reducing the chunk-size of actions. This is why jargon is useful, and why habits and training are useful (as well as checklists and forms and bureaucracy), since they can condense an otherwise unworkably long instruction into something people can manage.
And:
“Go to the store” is four words. But “go” actually means “stand up. walk to the door. open the door. Walk to your car. Open your car door. Get inside. Take the key out of your pocket. Put the key in the ignition slot...” etc. (Which are in turn actually broken into smaller steps like “lift your front leg up while adjusting your weight forward”)
But you are capable of taking all of that and chunking it as the concept “go somewhere” (as well as the meta concept of “go to the place whichever way is most convenient, which might be walking or biking or taking a bus”), although if you have to use a form of transport you are less familiar with, remembering how to do it might take up a lot of working memory slots, leaving you liable to forget other parts of your plan.
I do in fact expect that the Apollo project worked via finding ways to cache things into manageable chunks, even for the people who kept the whole project in their head.
Chunks can be nested, and chunks can include subtle neural-network-weights that are part of your background experience and aren’t quite explicit knowledge. It can be very hard to communicate subtle nuances as part of the chunks if you don’t have access to high-volume and preferably in-person communication.
I’d be interested in figuring out how to operationalize this as a bet and check how the project actually worked. What I have heard (epistemic status: heard it from some guy on the internet) is that actually, most people on the project did not have all the pieces in their head, and the only people who did were the pilots.
My guess is that the pilots had a model of how to *use* and *repair* all the pieces of the ship, but couldn’t have built it themselves.
My guess is that “the people who actually designed and assembled the thing” had a model of how all the pieces fit together, but not as deep a model of how and when to use it, and may have only understood the inputs and outputs of each piece.
And meanwhile, while I’m not quite sure how to operationalize the bet, I would bet maybe $50 (conditional on us finding a good operationalization) that the number of people who had the full model or anything like it was quite small. (“You Have About Five Words” doesn’t claim you can’t have more than 5 words of nuance; it claims that you can’t coordinate large groups of people in ways that depend on more than 5 words of nuance. I bet there were fewer than 100 people, and probably closer to 10, who had anything like a full model of everything going on.)
I think I’m unclear on how this constrains anticipations, and in particular it seems like there’s substantial ambiguity as to what claim you’re making, such that it could be any of these:
You can’t communicate recursive structures or models with more than five total chunks via mass media such as writing.
You can’t get humans to act (or in particular to take initiative) based on such models, so you’re limited to direct commands when coordinating actions.
There exist such people, but they’re very few and stretched between very different projects and there’s nothing we can do about that.
??? Something else ???
I think there are two different anticipation-constraining-claims, similar but not quite what you said there:
Working Memory Learning Hypothesis – people can learn complex or recursive concepts, but each chunk that they learn cannot be composed of more than 7 other chunks. You can learn a 49 chunk concept but first must distill it into seven 7-chunk-concepts, learn each one, and then combine them together.
Coordination Nuance Hypothesis – there are limits to how nuanced a model you can coordinate around, at various scales of coordination. I’m not sure precisely what the limits are, but it seems quite clear that the more people you are coordinating the harder it is to get them to share a nuanced model or strategy. It’s easier to have a nuanced strategy with 10 people than 100, 1000, or 10,000.
I’m less confident of the Working Memory hypothesis (it’s an armchair inside view based on my understanding of how working memory works)
I’m fairly confident in the Coordination Nuance Hypothesis, which is based on observations about how people actually seem to coordinate at various scales and how much nuance they seem to preserve.
In both cases, there are tools available to improve your ability to learn (as an individual), disseminate information (as a communicator), and keep people organized (as a leader). But none of the tools changed the fundamental equation, just the terms.
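To make the 49-chunk example concrete, here is a toy sketch of what “distill it into 7-chunk concepts” could look like mechanically. It is only an illustration of the bookkeeping, not a model of how memory actually works; the names `distill` and `max_fanout` and the `limit` parameter are made up for this example, and it assumes the raw chunks are plain items rather than tuples.

```python
def distill(chunks, limit=7):
    """Regroup a flat sequence of chunks into nested tuples so that no
    group holds more than `limit` items. Returns (tree, rounds), where
    `rounds` counts how many times the material had to be regrouped."""
    if limit < 2:
        raise ValueError("limit must be at least 2, or grouping never shrinks the list")
    level = list(chunks)
    rounds = 0
    while len(level) > limit:
        # Bundle consecutive items into groups of at most `limit`;
        # each bundle becomes a single chunk at the next level up.
        level = [tuple(level[i:i + limit]) for i in range(0, len(level), limit)]
        rounds += 1
    return tuple(level), rounds


def max_fanout(node):
    """Largest number of direct sub-chunks under any single chunk."""
    if not isinstance(node, tuple):
        return 0  # a raw chunk has no sub-chunks
    return max([len(node)] + [max_fanout(child) for child in node])


tree, rounds = distill(range(49))
print(rounds, len(tree), max_fanout(tree))  # 1 7 7
```

Under these assumptions a 49-chunk concept needs one round of regrouping, while a 50-chunk concept already needs two, which is the sense in which the limit changes how much distillation work someone has to do rather than what can be learned at all.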
Anticipation Constraints:
The anticipation-constraint of the WMLH is “if you try to learn a concept that requires more than 7 chunks, you will fail. If a concept requires 12 chunks, you will not successfully learn it (or will learn a simplified bastardization of it) until you find a way to compress the 12 chunks into 7. If you have to do this yourself, it will take longer than if an educator has optimized it for you in advance.”
The anticipation constraint of the CNH is that if you try to coordinate with 100 people of a given level of intelligence, the shared complexity of the plan that you are enacting will be lower than the complexity of the plan you could enact with 10 people. If you try to implement a more complex plan or orient around a more complex model, your organization will make mistakes due to distorted simplifications of the plan. And this gets worse as your organization scales.
CNH is still ambiguous between “nuanced plan” and “nuanced model” here, and those seem extremely different to me.
I agree they are different but think it is the case that with a larger group you have a harder time with either of them, for roughly the same reasons at roughly the same rate of increased difficulty.
The Working Memory Hypothesis says that Bell Labs is useful, in part, because whenever you need to combine multiple interdisciplinary concepts that are each complicated to invent a new concept…
instead of having to read a textbook that explains it one-particular-way (and, if it’s not your field, you’d need to get up to speed on the entire field in order to have any context at all) you can just walk down the hall and ask the guy who invented the concept “how does this work” and have them explain it to you multiple times until they find a way to compress it down into 7 chunks, optimized for your current level of understanding.
A slightly more accurate anticipation of the CNH is:
people need to spend time learning a thing in order to coordinate around it. At the very least, the more time you need to spend getting people up to speed on a model, the less time they have to actually act on that model.
people have idiosyncratic learning styles, and are going to misinterpret some bits of your plan, and you won’t know in advance which ones. Dealing with this requires individual attention, noticing their mistakes and correcting them. Middle managers (and middle “educators”) can help to alleviate this, but every link in the chain reduces your control over what message gets distributed. If you need 10,000 people to all understand and act on the same plan/model, it needs to be simple or robust enough to survive 10,000 people misinterpreting it in slightly different ways.
This gets even worse if you need to change your plan over time in response to new information, since now people are getting it confused with the old plan, or they don’t agree with the new plan because they signed up for the old plan, and then you have to Do Politics to get them on board with the new plan.
At the very least, if you’ve coordinated perfectly, each time you change your plan you need to shift from “focusing on execution” to “focusing on getting people up to speed on the new model.”
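A back-of-the-envelope way to see why the misinterpretation point bites harder with scale: suppose (purely as an assumption for illustration, not something measured) that each person independently misreads each chunk of the plan with some small probability. The function name `expected_misreaders` and its parameters are hypothetical, and real misinterpretations are surely not independent, so treat this as a sketch of the shape of the problem only.

```python
def expected_misreaders(people, chunks, p_misread):
    """Expected number of people who get at least one chunk of the plan
    wrong, assuming every chunk is misread independently with
    probability `p_misread` (an illustrative assumption)."""
    p_all_correct = (1 - p_misread) ** chunks
    return people * (1 - p_all_correct)


# Same made-up misread rate, different plan sizes and group sizes.
for people in (10, 100, 10_000):
    for chunks in (5, 12):
        n = expected_misreaders(people, chunks, p_misread=0.01)
        print(f"{people:>6} people, {chunks:>2} chunks: ~{n:.1f} with a distorted plan")
```

Under this toy model the number of people carrying some distorted version of the plan grows with both the size of the group and the number of chunks in the plan, which matches the claim that a plan for 10,000 people has to be either simpler or more robust to error than a plan for 10.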