Throughout my academic and research experience in the social sciences and economic forecasting, it’s become clear that more complex models, whether that means more variables, richer dynamics, or nonlinearity, rarely perform well. In the vast majority of forecasting situations, it’s incredibly hard to beat a random walk or a first-order autoregression (AR(1)).
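A toy version of that comparison, with simulated data (purely illustrative; everything below is made up), looks like this:

```python
# Toy comparison on simulated data: one-step-ahead forecasts from a random walk,
# an AR(1), and an over-parameterized AR(12). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 300
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.95 * y[t - 1] + rng.normal()          # a persistent series

split = 200
train, test = y[:split], y[split:]

def fit_ar(series, p):
    """Least-squares AR(p): constant plus p lags."""
    X = np.column_stack(
        [np.ones(len(series) - p)]
        + [series[p - k - 1 : len(series) - k - 1] for k in range(p)]
    )
    coefs, *_ = np.linalg.lstsq(X, series[p:], rcond=None)
    return coefs

def oos_rmse(p):
    """RMSE of one-step forecasts over the test window, coefficients fit on train."""
    coefs = fit_ar(train, p)
    preds = [coefs[0] + coefs[1:] @ y[split + t - p : split + t][::-1]
             for t in range(len(test))]
    return np.sqrt(np.mean((np.array(preds) - test) ** 2))

rw_rmse = np.sqrt(np.mean((test - y[split - 1 : -1]) ** 2))  # forecast = last value
print("random walk RMSE:", round(rw_rmse, 3))
print("AR(1) RMSE:      ", round(oos_rmse(1), 3))
print("AR(12) RMSE:     ", round(oos_rmse(12), 3))
```

The extra eleven lags typically buy nothing out of sample; they mostly add estimation noise.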
There is no proof or explanation of why in any academic textbook; you just pick it up over time. Notable exceptions define entire subfields. The U.S. term structure of debt is best modeled by using a set of ODEs to fit the cross-section and stochastic dynamics to fit the time series. The complexity there can grow enormously and leads to a lot of dense financial-math research, which actually does improve predictive accuracy (still not by much, but it does so consistently).
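To make the cross-sectional side concrete, one standard example (my pick purely for illustration) is the Nelson-Siegel curve: fix its decay parameter and it becomes linear in three factors, so a single day’s yields can be fit by least squares.

```python
# One standard cross-sectional model, fit to made-up yields (illustrative only).
# Nelson-Siegel: y(tau) = b0 + b1*(1 - e^(-lam*tau))/(lam*tau)
#                            + b2*((1 - e^(-lam*tau))/(lam*tau) - e^(-lam*tau))
# With the decay parameter lam fixed, the model is linear in (b0, b1, b2).
import numpy as np

def ns_loadings(tau, lam=0.6):
    """Nelson-Siegel factor loadings at maturities tau (in years)."""
    x = lam * tau
    slope = (1 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curvature])

maturities = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30])          # hypothetical
yields = np.array([4.8, 4.7, 4.5, 4.2, 4.1, 4.0, 4.1, 4.2, 4.5, 4.6])  # made up

X = ns_loadings(maturities)
betas, *_ = np.linalg.lstsq(X, yields, rcond=None)   # level, slope, curvature factors
print("factors:", np.round(betas, 3))
print("max fit error (pct points):", round(float(np.max(np.abs(X @ betas - yields))), 3))
```

The dense financial math mostly enters at the next step, when those fitted factors are given stochastic dynamics over time.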
We see the same thing in economic analysis done with words. While it’s often shakier than economists would like, describing monopolistic dynamics in an essay seems to be a reasonable approximation of reality in terms of predictive performance. I know this isn’t new to the LW crowd, but I always think of words as simply painting reality with nonlinear dynamics, in the way the human brain evolved to process information. That’s why neural networks, which learn these dynamics, work best for processing language (I think).
It turns out that words, like nonlinear equations, are great at fitting data. If you find a subset of reality where you truly can use nonlinear models, words, or both to classify what’s going on, you’re in a great spot for predictive accuracy. Empirically, though, in the collective experience of my field, that’s really hard to do. If your model diverges radically from a reduced-form, random-walk, or otherwise basic model, you need to be able to prove that it wins.
Unfortunately, our brains do not seem to be good at detecting overfitting. The way I think about it, which like all evolutionary reasoning is questionable, is that we evolved to learn nonlinear dynamics as we navigate the world, hunt, form relationships, and live in tribes. The complexity handled by a self-driving car is only a small subset of how we perceive reality. So, to us, it feels natural to use these words to paint nonlinear stories of reality, of the Holy Ghost, of Marxist theory, and all these advanced, nonsensical ideas.
Our thoughts suck because we overfit. If someone showed you a regression they had fit, where they added a hundred transformations of the series of interest (squared, logged, cubed, etc.), and their R² was equal to 1, you’d tell them they were misguided. What’s the problem space of Marx? “I fit a series of nonlinear dynamics, using words, to centuries of human interaction, and will use it to forecast human interaction forever.” Well, actually, you can do that. And it could be true. But it also might be garbage: nonsense.
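Here is that toy regression made concrete (synthetic data, purely illustrative): pile polynomial transformations of one predictor onto a linear-plus-noise series, and the in-sample R² climbs toward 1 while the out-of-sample R² collapses.

```python
# Synthetic illustration: more polynomial transformations of one predictor push
# the in-sample R2 toward 1 while the out-of-sample R2 falls apart.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = 0.5 * x + 0.5 * rng.normal(size=60)            # true relation: linear plus noise
x_tr, y_tr, x_te, y_te = x[:30], y[:30], x[30:], y[30:]

def r2(y_true, y_pred):
    resid = np.sum((y_true - y_pred) ** 2)
    total = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - resid / total

for degree in (1, 5, 29):
    # design matrix: 1, x, x^2, ..., x^degree (the pile of transformations)
    X_tr = np.vander(x_tr, degree + 1, increasing=True)
    X_te = np.vander(x_te, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    print(f"degree {degree:2d}: in-sample R2 = {r2(y_tr, X_tr @ beta):6.3f}, "
          f"out-of-sample R2 = {r2(y_te, X_te @ beta):9.3f}")
```

Once there are about as many transformations as data points, the in-sample fit is essentially perfect and the out-of-sample fit is essentially worthless.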
Thanks for sharing the essay.
I like your framing that thoughts represent attempts to “fit” the nonlinear dynamics of reality. This might actually be a more clarifying phrasing than the more general term “mapping” that I commonly see used. It makes the failure modes more obvious to imagine the brain as a highly intertwined group of neural networks attempting to find some highly compressive, very high-R² “fit” to the data of the world.
“Classification” is a task we canonically use neural networks for, and it’s not surprising that classification is both fundamental to human thought and potentially highly pathological. Perusing Stove’s list of 40 wrong statements through the lens of “if this statement were the output of an artificial neural network, what would the neural network be doing wrong?”, I feel like a lot of them are indeed classification errors.
“Three” is a label that is activated by classification circuitry. The neural classification circuitry abstracts “three-ness” from the data stream as a useful compression. I myself have trained a neural network to accurately count the number of balls in a video stream; that neural network has a concept of three-ness. Unlike that particular neural network, humans then introspect on three-ness and get confused about what it is. We get further confused because “three-ness” has other innate properties in the context of mathematics, unlike, say, “duck-ness”. We feel like it must be explained beyond being a useful compression filter. “Three is a real object.” “There is no real number three.” “Three is an essence.” “There is an ideal three which transcends actual triples of objects.” Almost every statement of the form “Three is … ” falls into this trap of overinterpreting a classification scheme.
All of the above can probably be corrected by consistently replacing the symbol with the substance and tabooing words, rather than playing syntactic games with symbols.