Chains, Bottlenecks and Optimization
Consider an idea consisting of a group of strongly connected sub-ideas. If any sub-idea is an error (doesn’t work), then the whole idea is an error (doesn’t work). We can metaphorically model this as a metal chain made of links. How strong is a chain? How hard can you pull on it before it breaks? It’s as strong as its weakest link. If you measure the strength of every link in the chain, and try to combine them into an overall strength score for the chain, you will get a bad answer. The appropriate weight to give the non-weakest links, in your analysis of chain strength, is ~zero.
There are special cases. Maybe the links are all equally strong to high precision. But that’s unusual. Variance (statistical fluctuations) is usual. Perhaps there is a bell curve of link strengths. Having two links approximately tied for weakest is more realistic, though still uncommon.
(A group of linked ideas may not be a chain (linear) because of branching (tree structure). But that doesn’t matter to my point. Stress the non-linear system of chain links and something will break first.)
The weakest link of the chain is the bottleneck or constraint. The other links have excess capacity – more strength than they need to stay unbroken when the chain gets pulled on hard enough to break the weakest link.
Optimization of non-bottlenecks is ~wasted effort. In other words, if you pick random chain links, and then you reinforce them, it (probably) doesn’t make the chain stronger. Reinforcing non-weakest links is misallocating effort.
So how good is an idea made of sub-ideas? It’s as strong as its weakest link (sub-idea). Most ideas have excess capacity. So it’d be a mistake to measure how good each sub-idea is, including more points for excess capacity, and then combine all the scores into an overall goodness score.
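As a minimal sketch of the contrast between the two scoring rules (the sub-idea scores below are made-up numbers, purely for illustration):

```python
# Hypothetical strength scores for the sub-ideas of one idea (higher is better).
sub_idea_scores = [9.5, 8.0, 9.9, 2.1, 9.7]

# An overall average rewards excess capacity in the strong sub-ideas...
average_score = sum(sub_idea_scores) / len(sub_idea_scores)    # 7.84

# ...but on the chain model the idea is only as strong as its weakest link.
weakest_link_score = min(sub_idea_scores)                       # 2.1

print(f"average: {average_score:.2f}, weakest link: {weakest_link_score}")
```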
Excess capacity is a general feature and requirement of stable systems. Either most components have excess capacity or the system is unstable. Why? Because of variance. If lots of components were within the margin of error (max expected or common variance) of breaking, stuff would break all over the place on a regular basis. You’d have chaos. Stable systems mostly include parts which remain stable despite variance. That means that in most circumstances, when they aren’t currently dealing with high levels of negative variance, they have excess capacity.
This is why manufacturing plants should not be designed as a balanced series of workstations, all with equal production capacity. A balanced plant (chain) lacks excess capacity on any workstations (chain links), which makes it unstable to variance.
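A rough Monte Carlo sketch of that instability (the capacities, variance range and buffer sizes are invented for illustration, not taken from Goldratt): a line of stations with equal average capacity loses throughput to fluctuations, because each station can only work on what upstream has already delivered, while a line whose non-bottleneck stations have excess capacity recovers most of it.

```python
import random

def simulate_line(mean_capacities, periods=10_000, buffer_cap=20.0, seed=0):
    """Average finished units per period for workstations in series.

    Each station's capacity fluctuates +/-50% around its mean every period, and
    each station can only work on material that upstream stations have already
    delivered (finite buffer space between stations).
    """
    rng = random.Random(seed)
    n = len(mean_capacities)
    buffers = [0.0] * n          # work-in-progress waiting in front of each station
    finished = 0.0
    for _ in range(periods):
        # Process downstream stations first, so each station only sees material
        # delivered in earlier periods.
        for i in range(n - 1, -1, -1):
            capacity = rng.uniform(0.5, 1.5) * mean_capacities[i]
            feed = float("inf") if i == 0 else buffers[i]          # raw material is unlimited
            space = float("inf") if i == n - 1 else buffer_cap - buffers[i + 1]
            produced = min(capacity, feed, space)
            if i > 0:
                buffers[i] -= produced
            if i == n - 1:
                finished += produced
            else:
                buffers[i + 1] += produced
    return finished / periods

# Balanced plant: every station averages 10 units/period, yet throughput comes out
# noticeably below 10 because fluctuations at any station starve or block the others.
print("balanced:  ", round(simulate_line([10, 10, 10, 10, 10]), 2))
# Same bottleneck (10), but the other stations have excess capacity: throughput
# recovers to roughly the bottleneck's average.
print("unbalanced:", round(simulate_line([10, 13, 13, 13, 13]), 2))
```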
Abstractly, bottlenecks and excess capacity are key issues whenever there are dependency links plus variance. (Source.)
Applied to Software
This is similar to how, when optimizing computer programs for speed, you should look for bottlenecks and focus on improving those. Find the really slow part and work on that. Don’t just speed up any random piece of code. Most of the code is plenty fast. Which means, if you want to assign an overall optimization score to the code, it’d be misleading to look at how well optimized every function is and then average them. What you should actually do is a lot more like scoring the bottleneck(s) and ignoring how optimized the other functions are.
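A rough worked example of why (the runtime shares and speedup factors are hypothetical):

```python
def runtime_after_speedups(time_shares, speedups):
    """New total runtime, as a fraction of the old, after speeding up parts of a program.

    time_shares: fraction of total runtime each part currently uses (sums to 1).
    speedups:    factor by which each part is made faster (1 = unchanged).
    """
    return sum(share / factor for share, factor in zip(time_shares, speedups))

# Hypothetical profile: one hot function (90% of runtime), a cold one (2%), the rest (8%).
shares = [0.90, 0.02, 0.08]

# Making the cold 2% function 10x faster barely moves the total...
print(runtime_after_speedups(shares, [1, 10, 1]))   # 0.982 -> about 1.8% faster overall

# ...while merely doubling the hot function's speed nearly halves it.
print(runtime_after_speedups(shares, [2, 1, 1]))    # 0.55  -> 45% faster overall
```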
Just as optimizing the non-bottlenecks with lots of excess capacity would be wasted effort, any optimization already present at a non-bottleneck shouldn’t be counted when evaluating how optimized the codebase is, because it doesn’t matter. (To a reasonable approximation. Yes, as the code changes, the bottlenecks could move. A function could suddenly be called a million times more often than before and need optimizing. If it was pre-optimized, that’d be a benefit. But most functions will never become bottlenecks, so pre-optimizing just in case has a low value.)
Suppose a piece of software consists of one function which calls many sub-functions which call sub-sub-functions. How many speed bottlenecks does it have? Approximately one, just like a chain has one weakest link. In this case we’re adding up time taken by different components. The vast majority of sub-functions will be too fast to matter much. One or a small number of sub-functions use most of the time. So it’s a small number of bottlenecks but not necessarily one. (Note: there are never zero bottlenecks: no matter how much you speed stuff up, there will be a slowest sub-function. However, once the overall speed is fast enough, you can stop optimizing.) Software systems don’t necessarily have to be this way, but they usually are, and more balanced systems don’t work well.
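In practice this is what a profiler measures. A minimal sketch using Python's standard cProfile module (the functions and sizes here are invented for illustration):

```python
import cProfile

def parse_input(data):
    # Fast: not worth optimizing even though it runs on every line.
    return [line.split(",") for line in data]

def score_rows(rows):
    # Deliberately quadratic: the kind of function a profile surfaces as the bottleneck.
    return sum(1 for a in rows for b in rows if a == b)

def main():
    data = [f"{i % 50},{i % 7}" for i in range(2000)]
    return score_rows(parse_input(data))

if __name__ == "__main__":
    # The cumulative-time ranking concentrates attention on the one slow call;
    # the other functions barely register, so optimizing them would be wasted effort.
    cProfile.run("main()", sort="cumulative")
```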
Applied to Ideas
I propose viewing ideas from the perspective of chains with weakest links or bottlenecks. Focus on a few key issues. Don’t try to optimize the rest. Don’t update your beliefs using evidence, increasing your confidence in some ideas, when the evidence deals with non-bottlenecks. In other words, don’t add more plausibility to an idea when you improve a sub-component that already had excess capacity. Don’t evaluate the quality of all the components of an idea and combine them into a weighted average which comes out higher when there’s more excess capacity for non-bottlenecks.
BTW, what is excess capacity for an idea? Ideas have purposes. They’re meant to accomplish some goal such as solving a problem. Excess capacity means the idea is more than adequate to accomplish its purpose. The idea is more powerful than necessary to do its job. This lets it deal with variance, and may help with using the idea for other jobs.
Besides the relevance to adding up the weight of the evidence or arguments, this perspective explains why thinking is tractable in general: we’re able to focus our attention on a few key issues instead of being overwhelmed by the ~infinite complexity of reality (because most sub-issues we deal with have excess capacity, so they require little attention or optimization).
Note: In some ways, I have different background knowledge and perspective than the typical poster here (and in some ways I’m similar). I expect large inferential distance. I don’t expect my intended meaning to be transparent to readers here. (More links about this: one, two.) I hope to get feedback about which ideas people here accept, reject or want more elaboration on.
Acknowledgments: The ideas about chains, bottlenecks, etc., were developed by Eliyahu Goldratt, creator of the Theory of Constraints. He was known especially for applying the methods of the hard sciences to the field of business management. Above, I’ve summarized some Goldratt ideas and begun relating them to Bayesian epistemology.
I really like what this post is trying to do. The idea is a valuable one. But this explanation could use some work—not just because inferential distances are large, but because the presentation itself is too abstract to clearly communicate the intended point. In particular, I’d strongly recommend walking through at least 2-3 concrete examples of bottlenecks in ideas.
I replied at https://www.lesswrong.com/posts/Z66PxZnZJNSXzf8TL/bottleneck-examples
I think I’ve given away over 20 copies of _The Goal_ by Goldratt, and recommended it to coworkers hundreds of times. Thanks for the chance to recommend it again—it’s much more approachable than _Theory of Constraints_, and is more entertaining, while still conveying enough about his worldview to let you decide if you want the further precision and examples in his other books.
It’s important to recognize the limits of the chain metaphor—there is variance/uncertainty in the strength of a link (or capacity of a production step), and variance/uncertainty in alternate support for ideas (or alternate production paths). Most real-world situations are more of a mesh or a circuit than a linear chain, and the analysis of bottlenecks and risks is a fun multidimensional calculation of forces applied and propagated through multiple links.
Yes, even a factory process often isn’t a chain because e.g. a workstation may take inputs from multiple previous workstations and combine them.
Do you have a specific limit in mind re non-linear systems? I’m not clear on what the problem is.
The limit is on the feasibility of mapping the metaphor to most real-world situations, and on the complexity of the calculation needed to determine how big a bottleneck something is and under what conditions.
I replied at https://www.lesswrong.com/posts/qEkX5Ffxw5pb3JKzD/comment-replies-for-chains-bottlenecks-and-optimization
I think that ideas can have a bottleneck effect, but that isn’t the only effect. Some ideas have disjunctive justifications. You might have genetic evidence, and embryological evidence, and studies of bacteria evolving in test tubes, and a fossil record, all pointing towards the theory of evolution. In this case, the chains are all in parallel, so you can add their strength. Even if the fossil record is completely discounted, you would still have good genetic evidence, and vice versa. If, conditional on the strongest line of evidence being totally discounted, you would still expect the other lines might hold, then the combination is stronger than the strongest.
If I have 10 arguments that X is true, and each argument is 50/50 (I assign equal probability to the argument being totally valid and to it being totally useless), and if these arguments are totally independent in their validity, then P(not X) < 1/1000. Note that total independence means that even when told with certainty that arguments 0 … 8 are nonsense, you still assign 50% probability to argument 9 being valid. This is a strong condition.
Disjunctive arguments are stronger than the strongest link.
On the other hand, conjunctive arguments are weaker than the weakest link. If I have an argument that has 100 steps, and relies on all the steps holding for the argument to hold, then even if each step is 99% certain, the whole argument is only 37% certain.
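A quick check of those two numbers, under the stated independence assumptions and treating a "valid" argument as one that conclusively establishes X:

```python
# Ten fully independent arguments for X, each 50% likely to be valid:
# X fails only if all ten arguments fail.
p_not_x = 0.5 ** 10
print(p_not_x)                       # 1/1024, a little under 1/1000

# A conjunctive argument with 100 steps, each 99% certain, where every step must hold:
p_whole_argument = 0.99 ** 100
print(round(p_whole_argument, 2))    # ~0.37, i.e. only about 37% certain
```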
For those of you looking for a direct numeric analogue to resistors, there isn’t one. They have similar structure, resistors with x+y and 1/x, probabilities with x*y and 1-x, but there isn’t a (nice) function from one to the other. There is no f such that f(x+y) = f(x)*f(y) and f(1/x) = 1-f(x).
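One way to see this, as a sketch under the added assumption that f should be continuous:

```latex
% Cauchy's exponential functional equation pins down f up to a base a:
f(x+y) = f(x)\,f(y) \ \text{for all } x,y \;\Longrightarrow\; f(x) = a^{x} \ \text{for some } a>0 \ (\text{or } f \equiv 0).
% Substituting into the second requirement:
f(1/x) = 1 - f(x) \;\Longrightarrow\; a^{1/x} = 1 - a^{x} \ \text{for all } x.
% x = 1 forces a = 1/2, but then x = 2 gives
\left(\tfrac{1}{2}\right)^{1/2} \approx 0.707 \;\neq\; 1 - \left(\tfrac{1}{2}\right)^{2} = 0.75,
% a contradiction, so no such (nice) f exists.
```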
I replied at https://www.lesswrong.com/posts/qEkX5Ffxw5pb3JKzD/comment-replies-for-chains-bottlenecks-and-optimization
There is another case which your argument neglects, which can make weakest-link reasoning highly inaccurate, and which is less of a special case than a tie in link-strength.
The way you are reasoning about systems of interconnected ideas is conjunctive: every individual thing needs to be true. But some things are disjunctive: some one thing needs to be true. (Of course there are even more exotic logical connectives, such as implication or XOR, which are also used in everyday reasoning. But for now it will do to consider only conjunction and disjunction.)
A conjunction of a number of statements is—at most—as strong as its weakest element, as you suggest. However, a disjunction of a number of statements is—at worst—as strong as its strongest element.
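In probability terms, the same bounds can be written as follows (a standard restatement, not from the original comment), along with the exact values a probabilistic account gives when the bounds aren't enough:

```latex
% Bounds: a conjunction is at most as probable as its weakest conjunct,
% a disjunction at least as probable as its strongest disjunct.
P(A \land B) \le \min\bigl(P(A), P(B)\bigr), \qquad P(A \lor B) \ge \max\bigl(P(A), P(B)\bigr)
% Exact values, when more accuracy than the bounds is needed:
P(A \land B) = P(A)\,P(B \mid A), \qquad P(A \lor B) = P(A) + P(B) - P(A \land B)
```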
Following the venerated method of multiple working hypotheses, then, we are well-advised to come up with as many hypotheses as we can to explain the data. Doing this has many advantages, but one worth mentioning is that it increases the probability that one of our ideas is right.
Since we don’t want to just believe everything, we then have to somehow weigh the different hypotheses against each other. Our method of weighing hypotheses should ideally have the property that, if the correct hypothesis is in our collection, we would eventually converge to it. (Having established that, we might ask to converge as quickly as possible; and we could also name many other desirable properties of such a weighting scheme.)
Concerning such weighting schemes, you comment:
I can agree that a weighted average is not what is called for in a conjunctive case. It is closer to correct in a disjunctive case. But let us focus on your claim that we should ignore non-bottlenecks. You already praised ideas for having excess capacity, i.e., being more accurate than needed for a given application. But now you are criticizing the practice of weighing evidence too accurately.
I previously agreed that a conjunction has strength at most that of its weakest element. But a probabilistic account gives us more accuracy than that, allowing us to compute a more precise strength (if we have need of it). And similarly, a disjunction has strength at least that of its strongest element—but a probabilistic account gives us more accuracy than that, if we need it. Quoting you:
Perhaps the excess accuracy in probability theory makes it more powerful than necessary to do its job? Perhaps this helps it deal with variance? Perhaps it helps the idea apply for other jobs than the one it was meant for?
I replied at https://www.lesswrong.com/posts/qEkX5Ffxw5pb3JKzD/comment-replies-for-chains-bottlenecks-and-optimization
What does that even mean? A claim or proposition cannot have a probability over 1.0. If “capacity” is not analogous to probability, what is it analogous to?
I just posted this, which I think will answer your questions. Let me know if not.
https://www.lesswrong.com/posts/Z66PxZnZJNSXzf8TL/bottleneck-examples