I used to consider it a mystery that math was so unreasonably effective in the natural sciences, but changed my mind after reading this essay by Eric S. Raymond (who’s here on the forum, hi and thanks Eric), in particular this part, which is as good a question dissolution as any I’ve seen:
The relationship between mathematical models and phenomenal prediction is complicated, not just in practice but in principle. Much more complicated because, as we now know, there are mutually exclusive ways to axiomatize mathematics! It can be diagrammed as follows (thanks to Jesse Perry for supplying the original of this chart):
(it’s a shame this chart isn’t rendering properly for some reason, since without it the rest of Eric’s quote is ~incomprehensible)
The key transactions for our purposes are C and D—the translations between a predictive model and a mathematical formalism. What mystified Einstein is how often D leads to new insights.
We begin to get some handle on the problem if we phrase it more precisely; that is, “Why does a good choice of C so often yield new knowledge via D?”
The simplest answer is to invert the question and treat it as a definition. A “good choice of C” is one which leads to new predictions. The choice of C is not one that can be made a-priori; one has to choose, empirically, a mapping between real and mathematical objects, then evaluate that mapping by seeing if it predicts well.
One can argue that it only makes sense to marvel at the utility of mathematics if one assumes that C for any phenomenal system is an a-priori given. But we’ve seen that it is not. A physicist who marvels at the applicability of mathematics has forgotten or ignored the complexity of C; he is really being puzzled at the human ability to choose appropriate mathematical models empirically.
By reformulating the question this way, we’ve slain half the dragon. Human beings are clever, persistent apes who like to play with ideas. If a mathematical formalism can be found to fit a phenomenal system, some human will eventually find it. And the discovery will come to look “inevitable” because those who tried and failed will generally be forgotten.
But there is a deeper question behind this: why do good choices of mathematical model exist at all? That is, why is there any mathematical formalism for, say, quantum mechanics which is so productive that it actually predicts the discovery of observable new particles?
The way to “answer” this question is by observing that it, too, properly serves as a kind of definition. There are many phenomenal systems for which no such exact predictive formalism has been found, nor for which one seems likely. Poets like to mumble about the human heart, but more mundane examples are available. The weather, or the behavior of any economy larger than village size, for example—systems so chaotically interdependent that exact prediction is effectively impossible (not just in fact but in principle).
There are many things for which mathematical modeling leads at best to fuzzy, contingent, statistical results and never successfully predicts ‘new entities’ at all. In fact, such systems are the rule, not the exception. So the proper answer to the question “Why is mathematics so marvelously applicable to my science?” is simply “Because that’s the kind of science you’ve chosen to study!”

I also think I was intuition-pumped to buy Eric’s argument by Julie Moronuki’s beautiful meandering essay The Unreasonable Effectiveness of Metaphor.
Interesting. This reminds me of a related thought I had: Why do models with differential equations work so often in physics but so rarely in other empirical sciences? Perhaps physics simply is “the differential equation science”.
This is also related to the frequently expressed opinion that philosophy makes little progress because everything that gets developed enough to make significant progress splits off from philosophy; perhaps philosophy simply is “the study of ill-defined and intractable problems”.
Not saying that I think these views are accurate, though they do have some plausibility.

(To be honest, to first approximation my guess mirrors yours.)
The weather, or the behavior of any economy larger than village size, for example—systems so chaotically interdependent that exact prediction is effectively impossible (not just in fact but in principle).
Flagging that those two examples seem false. The weather is chaotic, yes, and there’s a sense in which the economy is anti-inductive, but modeling methods are advancing, and will likely find more loopholes in chaos theory.

For example, in thermodynamics, temperature is non-chaotic even though the precise kinetic energies and locations of the individual particles are. A reasonable candidate for an analogous non-chaotic feature of weather is hurricanes (see the toy sketch below).

Similarly, as our understanding of the economy advances, it will get more efficient, which means it will be easier to model; e.g. this paper (note: I’ve only skimmed it). And large economies are definitely even more predictable than small villages; talk about not having a competitive market!
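Here’s a minimal toy sketch of the temperature point (my own illustration, with the logistic map standing in for the chaotic microdynamics; it is not a weather or thermodynamics model): individual trajectories are wildly sensitive to initial conditions, but an ensemble average over many trajectories is stable from run to run.

```python
import random

def logistic(x, r=4.0):
    # Logistic map at r=4: a textbook example of deterministic chaos.
    return r * x * (1.0 - x)

def run(x0, steps=50):
    x = x0
    for _ in range(steps):
        x = logistic(x)
    return x

random.seed(0)

# Two nearly identical initial conditions end up far apart (chaos)...
a, b = 0.3, 0.3 + 1e-9
print("trajectory a:", run(a))
print("trajectory b:", run(b))

# ...but the mean over a large ensemble of random initial conditions
# barely moves between independent trials: the aggregate is stable
# even though every underlying trajectory is chaotic.
for trial in range(3):
    ensemble = [run(random.random()) for _ in range(100_000)]
    print("ensemble mean, trial", trial, "=", round(sum(ensemble) / len(ensemble), 4))
```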
Thanks for the pointer to that paper, the abstract makes me think there’s a sort of slow-acting self-reinforcing feedback loop between predictive error minimisation via improving modelling and via improving the economy itself.
re: weather, I’m thinking of the chart below showing how little gain we get in MAE vs compute, plus my guess that compute can’t keep growing far enough to get MAE < 3 °F a year out (say). I don’t know anything about advancements in weather modelling methods though; maybe effective compute (incorporating modelling advancements) can grow indefinitely in terms of the chart.
I didn’t say anything about temperature prediction, and I’d also like to see any other method (intuition-based or otherwise) do better than the current best mathematical models here. It seems unlikely to me that the trends in that graph will continue arbitrarily far.

Yeah, that was my claim.
I would also comment that, if the environment was so chaotic that roughly everything important to life could not be modeled—if general-purpose modeling ability was basically useless—then life would not have evolved that ability, and “intelligent life” probably wouldn’t exist.
The two concepts I thought were missing from Eliezer’s technical explanation of technical explanation, which would have simplified some of the explanation, are compression and degrees of freedom. Degrees of freedom seems very relevant here in terms of how we map between different representations. Why are representations so important for humans? Because they have different computational properties/traversal costs, while humans are very computationally limited.

Can you say more about what you mean? Your comment reminded me of Thomas Griffiths’ paper Understanding Human Intelligence through Human Limitations, but you may have meant something else entirely.
Griffiths argued that the aspects we associate with human intelligence – rapid learning from small data, the ability to break down problems into parts, and the capacity for cumulative cultural evolution – arose from the 3 fundamental limitations all humans share: limited time, limited computation, and limited communication. (The constraints imposed by these characteristics cascade: limited time magnifies the effect of limited computation, and limited communication makes it harder to draw upon more computation.) In particular, limited computation leads to problem decomposition, hence modular solutions; relieving the computation constraint enables solutions that can be objectively better along some axis while also being incomprehensible to humans.
Thanks for the link. I mean that predictions are outputs of a process that includes a representation, so part of what’s getting passed back and forth in the diagram are better and worse fit representations. The degrees of freedom point is that we choose very flexible representations, whittle them down with the actual data available, then get surprised that that representation yields other good predictions. But we should expect this if Nature shares any modular structure with our perception at all, which it would if there were both structural reasons (literally the same substrate) and evolutionary pressure for representations with good computational properties, i.e. simple isomorphisms and compressions.
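Here’s a toy sketch of that “flexible representation whittled down by data” point (purely illustrative; the quadratic “law” and the degree-6 polynomial are arbitrary choices on my part): an over-flexible model fitted to a handful of noisy samples still predicts points it was never shown, provided the underlying regularity is simple.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Nature": a simple underlying regularity, observed with a little noise.
def nature(x):
    return 3.0 * x**2 - 2.0 * x + 1.0

x_train = rng.uniform(-1, 1, size=20)
y_train = nature(x_train) + rng.normal(0, 0.05, size=20)

# Flexible representation: a degree-6 polynomial has more degrees of
# freedom than the data strictly needs; the data whittles them down.
coeffs = np.polyfit(x_train, y_train, deg=6)

# The representation then yields good predictions on points it never saw.
x_test = np.linspace(-1, 1, 5)
for x, p in zip(x_test, np.polyval(coeffs, x_test)):
    print(f"x={x:+.2f}  predicted={p:+.3f}  true={nature(x):+.3f}")
```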
Matt Leifer, who works in quantum foundations, espouses a view that’s probably more extreme than Eric Raymond’s above to argue why the effectiveness of math in the natural sciences isn’t just reasonable but expected-by-construction. In his 2015 FQXi essay Mathematics is Physics Matt argued that
… mathematics is a natural science—just like physics, chemistry, or biology—and that this can explain the alleged “unreasonable” effectiveness of mathematics in the physical sciences.
The main challenge for this view is to explain how mathematical theories can become increasingly abstract and develop their own internal structure, whilst still maintaining an appropriate empirical tether that can explain their later use in physics. In order to address this, I offer a theory of mathematical theory-building based on the idea that human knowledge has the structure of a scale-free network and that abstract mathematical theories arise from a repeated process of replacing strong analogies with new hubs in this network.
This allows mathematics to be seen as the study of regularities, within regularities, within . . . , within regularities of the natural world. Since mathematical theories are derived from the natural world, albeit at a much higher level of abstraction than most other scientific theories, it should come as no surprise that they so often show up in physics.
… mathematical objects do not refer directly to things that exist in the physical universe. As the formalists suggest, mathematical theories are just abstract formal systems, but not all formal systems are mathematics. Instead, mathematical theories are those formal systems that maintain a tether to empirical reality through a process of abstraction and generalization from more empirically grounded theories, aimed at achieving a pragmatically useful representation of regularities that exist in nature.
(Matt notes as an aside that he’s arguing for precisely the opposite of Tegmark’s MUH.)
Why “scale-free network”?
It is common to view the structure of human knowledge as hierarchical… The various attempts to reduce all of mathematics to logic or arithmetic reflect a desire to view mathematical knowledge as hanging hierarchically from a common foundation. However, the fact that mathematics now has multiple competing foundations, in terms of logic, set theory or category theory, indicates that something is wrong with this view.
Instead of a hierarchy, we are going to attempt to characterize the structure of human knowledge in terms of a network consisting of nodes with links between them… Roughly speaking, the nodes are supposed to represent different fields of study. This could be done at various levels of detail. … Next, a link should be drawn between two nodes if there is a strong connection between the things they represent. Again, I do not want to be too precise about what this connection should be, but examples would include an idea being part of a wider theory, that one thing can be derived from the other, or that there exists a strong direct analogy between the two nodes. Essentially, if it has occurred to a human being that the two things are strongly related, e.g. if it has been thought interesting enough to do something like publish an academic paper on the connection, and the connection has not yet been explained in terms of some intermediary theory, then there should be a link between the corresponding nodes in the network.
If we imagine drawing this network for all of human knowledge then it is plausible that it would have the structure of a scale-free network. Without going into technical details, scale-free networks have a small number of hubs, which are nodes that are linked to a much larger number of nodes than the average. This is a bit like the 1% of billionaires who are much richer than the rest of the human population. If the knowledge network is scale-free then this would explain why it seems so plausible that knowledge is hierarchical. In a university degree one typically learns a great deal about one of the hubs, e.g. the hub representing fundamental physics, and a little about some of the more specialized subjects that hang from it. As we get ever more specialized, we typically move away from our starting hub towards more obscure nodes, which are nonetheless still much closer to the starting hub than to any other hub. The local part of the network that we know about looks much like a hierarchy, and so it is not surprising that physicists end up thinking that everything boils down to physics whereas sociologists end up thinking that everything is a social construct. In reality, neither of these views is right because the global structure of the network is not a hierarchy.
As a naturalist, I should provide empirical evidence that human knowledge is indeed structured as a scale-free network. The best evidence that I can offer is that the structure of pages and links on the World Wide Web and the network of citations to academic papers are both scale free [13]. These are, at best, approximations of the true knowledge network. … However, I think that these examples provide evidence that the information structures generated by a social network of finite beings are typically scale-free networks, and the knowledge network is an example of such a structure.
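To make the “small number of hubs” claim concrete, here’s a minimal preferential-attachment sketch (my own illustration, not Leifer’s; the network size and attachment rule are arbitrary choices): nodes that are already well connected attract a disproportionate share of new links, and a handful of hubs end up far above the average degree.

```python
import random
from collections import Counter

random.seed(0)

def preferential_attachment(n_nodes, links_per_node=2):
    # Barabasi-Albert-style growth: each new node attaches to existing
    # nodes with probability proportional to their current degree.
    targets = [0, 1]                 # start with two linked nodes
    degree = Counter({0: 1, 1: 1})
    # "targets" holds one entry per link endpoint, so a uniform draw
    # from it picks nodes in proportion to their degree.
    for new in range(2, n_nodes):
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(random.choice(targets))
        for t in chosen:
            degree[new] += 1
            degree[t] += 1
            targets.extend([new, t])
    return degree

deg = preferential_attachment(10_000)
ranked = sorted(deg.values(), reverse=True)
print("average degree:", round(sum(ranked) / len(ranked), 2))
print("top 5 hub degrees:", ranked[:5])   # a few hubs, far above the average
```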
As an aside, Matt’s theory of theory-building explains (so he claims) what mathematical intuition is about: “intuition for efficient knowledge structure, rather than intuition about an abstract mathematical world”.
So what? How does this view pay rent?
Firstly, in network language, the concept of a “theory of everything” corresponds to a network with one enormous hub, from which all other human knowledge hangs via links that mean “can be derived from”. This represents a hierarchical view of knowledge, which seems unlikely to be true if the structure of human knowledge is generated by a social process. It is not impossible for a scale-free network to have a hierarchical structure like a branching tree, but it seems unlikely that the process of knowledge growth would lead uniquely to such a structure. It seems more likely that we will always have several competing large hubs and that some aspects of human experience, such as consciousness and why we experience a unique present moment of time, will be forever outside the scope of physics.
Nonetheless, my theory suggests that the project of finding higher level connections that encompass more of human knowledge is still a fruitful one. It prevents our network from having an unwieldy number of direct links, allows us to share more common vocabulary between fields, and allows an individual to understand more of the world with fewer theories. Thus, the search for a theory of everything is not fruitless; I just do not expect it to ever terminate.
Secondly, my theory predicts that the mathematical representation of fundamental physical theories will continue to become increasingly abstract. The more phenomena we try to encompass in our fundamental theories, the further the resulting hubs will be from the nodes representing our direct sensory experience. Thus, we should not expect future theories of physics to become less mathematical, as they are generated by the same process of generalization and abstraction as mathematics itself.
In Against Fundamentalism, another FQXi essay published in 2018, Matt further develops the argument that, because the structure of human knowledge is networked rather than hierarchical, the idea that there is a most fundamental discipline, or level of reality, is mistaken.