The post creates unnecessary confusion by lumping together “momentum”, “exponential growth”, “compound interest”, and “heavy-tailed distributions”. Conflating these concepts at the system-1 level into some vague, undifferentiated positive mess is likely harmful to anyone aspiring to think about systems clearly.
It seems like it’s pretty consistently talking about attachment-style effects; do you have an example of where it conflates that causal mechanism with something else? (I.e. it pretty consistently talks about the phenomenon of “having more of X gives you even more of X”, which can happen for a variety of reasons, but which seems like a common enough phenomenon to have a common abstraction for.)
I don’t know what you mean by attachment style, but some examples of the conflation...
Momentum is this: even if JK Rowling’s next book is total crap, it will still sell a lot of copies. Because people have beliefs, and because they enjoyed her previous books, they have a prior that they will also enjoy the next one. It would take several crap books for them to update.
Power laws are ubiquitous. This should be unsurprising—power laws are the simplest functional form in the logarithmic picture. If we use some sort of simplicity prior, we are guaranteed to find them. If we use the first terms of a Taylor expansion, we will find them. The log picture is as natural as the linear one. Someone should write a Meditation on Benford’s law—you get an asymptotically straight line, in the log-log picture, for the probability that a number starts with some given digits (in almost any real-life set of numerical values measured in units; you can see this must be the case because of invariance to unit rescaling).
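To make the unit-rescaling point concrete, here is a minimal sketch (purely synthetic log-uniform data standing in for “values spanning many orders of magnitude”): rescaling the units leaves the leading-digit frequencies essentially unchanged, and close to the Benford curve log10(1 + 1/d).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real-life values spanning many orders of magnitude
# (log-uniform over six decades; purely illustrative).
values = 10 ** rng.uniform(0, 6, size=100_000)

def leading_digit_freqs(x):
    """Frequency of each leading digit 1..9."""
    digits = (x / 10 ** np.floor(np.log10(x))).astype(int)
    return np.bincount(digits, minlength=10)[1:] / len(x)

benford = np.log10(1 + 1 / np.arange(1, 10))

for scale in (1.0, 3.7, 0.001):  # "changing the units"
    print(scale, np.round(leading_digit_freqs(values * scale), 3))
print("Benford:", np.round(benford, 3))
```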
This is maybe worth emphasizing: nobody should be surprised to find power laws. Nobody should propose a universal causal mechanism for power laws; it is as stupid as proposing one causal mechanism for straight lines in the linear picture.
They are often the result of other power-law-distributed quantities. To take one example from the OP… the initial distribution of masses for an initial population of new stars is a truncated power law. I don’t know why, but one proposed mechanism is, for example, turbulent fragmentation of the initial cloud, where the power law can come from the power spectrum of supersonic turbulence.
Sure, but where does the post talk about power laws? It only talks about momentum-like effects, and there is a large literature on how preferential attachment can give rise to power laws, so it has some relation to them, but I don’t see the article talking about power laws in isolation. I brought up power laws in the curation notice, because they frequently show up in situations with increasing marginal returns and multiplicative feedback loops, but I don’t see how the article encourages any kind of conflation in this area given that it makes practically no reference to them.
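(For what it’s worth, the preferential-attachment connection is easy to see in a toy simulation; a minimal sketch, with made-up sizes and step counts: each new unit attaches to an existing node with probability proportional to that node’s current size, and a heavy-tailed size distribution falls out.)

```python
import numpy as np

rng = np.random.default_rng(5)

# Minimal "rich get richer" / preferential-attachment sketch:
# each new unit attaches to an existing node with probability
# proportional to that node's current size (all sizes start at 1).
n_nodes = 50_000
owners = [0, 1]  # one entry per unit; entry i records which node owns that unit
for new_node in range(2, n_nodes):
    target = owners[rng.integers(len(owners))]  # size-proportional pick
    owners.append(target)    # the picked node grows by one unit
    owners.append(new_node)  # the newcomer starts with one unit of its own

sizes = np.bincount(owners)
top_share = np.sort(sizes)[-len(sizes) // 100:].sum() / sizes.sum()
print(f"top 1% of nodes hold {top_share:.0%} of all units")
```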
I also think you vastly overstate the case for power-laws. Most power-laws are much better fit by a log-normal distribution. See this paper for more details. I think the prior for log-normal should be higher than the prior for power-law, because of the multiplicative equivalent of the central limit theorem.
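(A minimal sketch of that multiplicative-CLT intuition, with arbitrary illustrative shocks: a product of many independent positive factors has a log that is a sum of i.i.d. terms, so the ordinary CLT pushes it towards log-normal.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# x = r_1 * r_2 * ... * r_T with independent positive shocks r_t
# (uniform shocks chosen arbitrarily for illustration).
n_steps, n_samples = 200, 20_000
shocks = rng.uniform(0.8, 1.25, size=(n_samples, n_steps))
x = shocks.prod(axis=1)

# log(x) is a sum of i.i.d. terms, so the ordinary CLT makes it roughly
# normal, i.e. x itself is roughly log-normal.
log_x = np.log(x)
print("skew of log(x):           ", round(float(stats.skew(log_x)), 3))
print("excess kurtosis of log(x):", round(float(stats.kurtosis(log_x)), 3))
```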
I’m still confused what you mean by momentum-like effects. Momentum is a very beautiful and crisp concept: the dual (canonical conjugate) of position, with all kinds of deep connections to everything. You can view the whole universe in the dual momentum space.
If the intention is to have a concept roughly in the shape of “all kinds of dynamics which can be rounded off to dx/dt = b·x”, I agree it may be valuable to have a word for that, but why overload “momentum”?
You asked for an example of where it conflates that causal mechanism with something else. I picked one example from this paragraph.
So, as I understand it, I gave you an example (the distribution of star masses) which quite likely does not have any useful connection to preferential attachment or exponential growth. After your last reply, I’m really confused about what the state of our disagreement on this is.
I’m actually scared to change the topic of the discussion to what simplicity means, but the argument is roughly this: if you have an arbitrary well-behaved function, in the linear picture you can approximate it locally by a straight line (the first term of the Taylor series, etc.). And yes, you get a better approximation by including more terms of the Taylor expansion, or by non-linear regression, etc. Now, if you translate this to the log-log picture, you will find that a power law is in some sense the simplest local approximation of anything. This is also the reason why people often mistakenly use power laws instead of lognormal and other distributions—if you truncate the lognormal and look at just part of the tail, you can fit it with a power law. Btw, you nicely demonstrate this effect yourself—preferential attachment often actually leads to a Yule–Simon distribution, not a power law … but as usual, you can approximate it.
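To make the tail-fitting trap concrete, a minimal sketch (synthetic log-normal data, illustrative parameters only): nothing power-law-like generates the data, yet a straight line in log-log coordinates fits its upper tail with a very high R².

```python
import numpy as np

rng = np.random.default_rng(2)

# Log-normal data: no power law anywhere in the generating process.
data = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

# Keep only the upper tail, as one often does when "finding" power laws.
tail = np.sort(data)[-5_000:]
ccdf = 1.0 - np.arange(len(tail)) / len(tail)  # empirical P(X > x) within the tail

# Fit a straight line in log-log coordinates: log P(X > x) ≈ c - alpha * log x.
slope, intercept = np.polyfit(np.log(tail), np.log(ccdf), 1)
resid = np.log(ccdf) - (slope * np.log(tail) + intercept)
r2 = 1 - np.sum(resid**2) / np.sum((np.log(ccdf) - np.log(ccdf).mean())**2)

print("apparent power-law exponent:", round(-slope, 2))
print("R^2 of the straight-line fit:", round(r2, 4))
```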
Oh, I assumed the author was referring to this explanation for the distribution of star masses:
Here we propose a new approach exploiting the techniques from the field of network science. We represent a system of dense cores accreting gas from the surrounding diffuse interstellar medium (ISM) as a spatial network growing by preferential attachment and assume that the ISM density has a self-similar fractal distribution following the Kolmogorov turbulence theory. We effectively combine gravoturbulent and competitive accretion approaches and predict the accretion rate to be proportional to the dense core mass: dM/dt∝M. Then we describe the dense core growth and demonstrate that the power-law core mass function emerges independently of the initial distribution of density fluctuations by mass. Our model yields a power law solely defined by the fractal dimensionalities of the ISM and accreting gas. With a proper choice of the low-mass cut-off, it reproduces observations over three decades in mass. We also rule out a low-mass star dominated “bottom-heavy” IMF in a single star-forming region.
I agree that if this is indeed the case, the author should provide a direct link to this theory, and ideally mention it explicitly as a theory among many.
I actually think the theory linked above is likely to be wrong, but don’t have any similar senses for all the other links provided in the same paragraph, which seem to me to pretty robustly be systems in which preferential attachment plays a large role.
I think the case of the star distribution is more likely to be an honest error, where the author heard about the preferential-attachment theory for the distribution of star sizes somewhere else and used it as an example here even though it probably isn’t the most elegant explanation of the phenomenon, which I do think should be pointed out.
I agree that if the author wanted to imply that “all power-law distributions are the result of momentum as defined in this article” then that would be bad, but I think the author overwhelmingly used examples that point towards a much narrower set of phenomena that also happen to produce things that look like power-laws (which I agree with you should appear for a lot of different reasons and should not be thought to be much evidence for any specific underlying causal model).
1. Going through two of the adjacent links in the same paragraph:
With the trees, I only skimmed it, but if I understand it correctly, the linked article proposes this new hypothesis: “Together these pieces of evidence point to a new hypothesis: Small-scale, gap-generating disturbances maintain power-function size structure whereas later-successional forest patches are responsible for deviations in the high tail.”
and, also from the paper:
“Current theories explaining the consistency of tropical forest size structure are controversial. Explanations based on scaling up individual metabolic rates are criticized for ignoring the importance of asymmetric competition for light in causing variation in dynamic rates. Other theories, which embrace competition and scale individual tree vital rates through an assumption of demographic equilibrium, are criticized for lacking parsimony, because predictions rely on site-level, size-specific parameterization.”
(I also recommend looking at the plots with the “power law”, which are of the usual type: something more complex approximated by a straight line over some interval.)
So, what we actually have here: different researchers proposing different hypotheses to explain the observed power-law-like data. It is far from conclusive what the actual reason is. Since something like positive feedback loops is an obvious part of the hypothesis space whenever you see power-law-like data, you are almost guaranteed to find a paper which proposes something in that direction. However, note that the article actually criticizes previous explanations based more on the “Matthew effect”, and proposes disturbances as a critical part of the explanation.
(Btw, I do not claim any dishonesty from the author, or anything like that.)
Something similar can be said about the Cambrian explosion, which is the next link.
Halo and horn effects are likely evolutionarily adaptive, tracking something real (traits like “having an ugly face” and “having a higher probability of ending up in trouble” are likely correlated—the common cause can be mutation load / parasite load; you have things like the positive manifold).
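As a toy sketch of the common-cause point (a purely made-up linear model with arbitrary coefficients): two traits that share a latent cause end up correlated even with no direct link between them, so a heuristic that uses one to predict the other is tracking something real.

```python
import numpy as np

rng = np.random.default_rng(4)

# Latent common cause (e.g. mutation/parasite load); coefficients are made up.
load = rng.normal(size=100_000)
trait_a = 0.4 * load + rng.normal(size=load.shape)  # e.g. physical appearance
trait_b = 0.4 * load + rng.normal(size=load.shape)  # e.g. proneness to trouble

# No direct causal arrow between trait_a and trait_b, yet they correlate.
print("correlation:", round(np.corrcoef(trait_a, trait_b)[0, 1], 2))
```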
And so on.
Sorry, but I will not dissect every paragraph of the article in this way. (It also seems a bit futile: if I dig into specific examples, it will be interpreted as nit-picking.)
2. A last attempt to gesture toward what’s wrong with this as a whole. The best approximation of the cluster of phenomena the article is pointing toward is not “preferential attachment” (as you propose), but something broader—“systems with feedback loops which can, in some approximation, be described by the differential equation dx/dt = b·x”.
You can start to see systems like that everywhere, and get a sense of something deep, explaining life, the universe, and everything.
One problem with this: if you have a system described by a differential equation of the form dx/dt = f(x, …), and the function f is reasonable, you can approximate it by its Taylor series f(x) = a + b·x + c·x² + … Obviously, the first-order term is b·x. Unfortunately (?), you can say this even before looking at the system.
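As a minimal sketch of that (the function f below is arbitrary, chosen only for illustration): whatever smooth growth law you write down, its local linearization has the a + b·x form, so spotting a b·x-like term locally is guaranteed rather than informative.

```python
import numpy as np

def f(x):
    """Some arbitrary smooth growth law; the details don't matter here."""
    return np.sin(x) + 0.3 * x**2 / (1 + x)

def local_linearization(f, x0, h=1e-5):
    """First-order Taylor coefficients around x0: f(x) ≈ a + b*(x - x0)."""
    a = f(x0)
    b = (f(x0 + h) - f(x0 - h)) / (2 * h)  # numerical derivative
    return a, b

x0 = 1.0
a, b = local_linearization(f, x0)
print(f"near x0={x0}: f(x) ≈ {a:.3f} + {b:.3f}*(x - x0)")

# The linear term exists for any smooth f, so finding it is not evidence
# for any particular causal mechanism.
for x in (0.9, 1.0, 1.1):
    print(x, round(float(f(x)), 4), round(float(a + b * (x - x0)), 4))
```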
So, vaguely speaking, when you start thinking in this way, my intuition is that it puts you in great danger of conflating something about how you do approximations with causal explanations. (I guess it may still be a good deal for many people who don’t have system-1 intuitions for Taylor series or even the log() function.)
I actually had some similar alarm bells go off about conflation of concepts in the OP, especially because the post specifically gestures at one concept and doesn’t give explanations of the different examples where this might come up.
However, on second thought I think I do like the concept this builds. To phrase it in your formal terms, I think it’s very useful to notice all the systems in which the Taylor series for f has b > 0, ESPECIALLY when it’s comparably easy to control f via the b·x term rather than just a.
In this light, you can view momentum, exponential growth, heavy tails, etc., as all cases where a main component of controlling or predicting future x is paying attention to the b·x term, and I claim this is an important revelation to have at a variety of levels.
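A minimal sketch of what I mean by that (all numbers made up): in a system of the form dx/dt = a + b·x, a forecast that ignores the b·x feedback term goes badly wrong exactly when that term dominates.

```python
# Toy system: dx/dt = a + b*x, integrated with a crude Euler step.
# All parameter values are made up for illustration.
a, b, x0, dt, steps = 1.0, 0.05, 10.0, 1.0, 100

x = x0
for _ in range(steps):
    x += (a + b * x) * dt  # the b*x feedback term compounds over time

# Forecast that ignores the feedback term: x grows only by a*dt per step.
naive = x0 + a * dt * steps

print("with the b*x term:    ", round(x, 1))
print("ignoring the b*x term:", round(naive, 1))
```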
Perhaps more relevant to your actual crux: I also get shudders when people overload physics terms with other meanings, but before they were physics terms they were concepts for intuitive things. Given that we view the world through physical metaphors, I think it’s quite important for us to use the best-fitting words for concepts. Then we can remind people of the different variants when they run into conflationary trouble. If we start off by naming things with poor associations, we hold ourselves back more. If you have an alternative name to “momentum” for this that you think also has good connotations, though, I’d love to hear it.
The second thing first: “...but before they were physics terms they were concepts for intuitive things” is actually not true in this case: momentum did not mean anything before being coined in physics. Then it became used in a metaphorical way, but mostly congruently with the original physics concept, as something like “mass” × “velocity”. It seems to me easy to imagine vivid pictures based on this metaphor, like an advancing army conquering mile after mile of enemy territory having momentum, or a scholar working through page after page of a difficult text. However, this concept is not tied to the b·x term (which is one of my cruxes).
To me, the original metaphorical meaning of momentum makes a lot of sense: there are a lot of systems where you have something like mass (closely connected to inertia: you need a great force to get something massive moving) and something like velocity—the direction and speed in which the system is heading. I would expect most people to have this on some level.
Now, the first thing second: I agree that it may be useful to notice all the systems in which the Taylor series for f has b > 0, ESPECIALLY when it’s comparably easy to control f via the b·x term rather than just a. However, some of the examples in the original post do not match this pattern: some could just be systems where, for example, you feed a heavy-tailed distribution in on the input and get a heavy-tailed distribution out on the output, or systems where the a term is what you should control, or systems where you should actually understand more about f(x) than the fact that it has a positive first derivative at some point.
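A minimal sketch of the “heavy tail in, heavy tail out” case (distribution and coefficients chosen arbitrarily): a purely static, feedback-free transformation of a heavy-tailed input still produces a heavy-tailed output, with no b·x term anywhere.

```python
import numpy as np

rng = np.random.default_rng(3)

# Heavy-tailed input: Pareto-distributed "resource" (illustrative choice).
resource = rng.pareto(a=1.5, size=100_000) + 1.0

# A purely static, feedback-free transformation: a noisy multiple of the input.
output = 0.7 * resource * rng.uniform(0.5, 1.5, size=resource.shape)

for name, x in (("input ", resource), ("output", output)):
    # Crude heavy-tail indicator: the share of the total held by the top 1%.
    top_share = np.sort(x)[-len(x) // 100:].sum() / x.sum()
    print(f"{name}: top 1% holds {top_share:.0%} of the total")
```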
What a good name for b·x > 0 would be, I don’t know; some random prosaic ideas are snowballing, compounding, faenus (from the Latin for interest on money, gains, profit, advantage), compound interest. But likely there is some more poetic name, similar to Moloch.