Learn the mathematical structure, not the conceptual structure
I’ve recently been learning about transformers and noticed a failure mode of my learning that has occurred throughout my life: trying to learn a subject from material that deals with the high-level conceptual structure of something instead of learning the mathematical structure more directly. I do not mean to suggest that one needs to focus on hardcore formalizations for everything, but there is a difference between learning the conceptual structure of a subject, and learning the conceptual structure of the mathematical framework of a subject.
The most salient example to me of this phenomenon occurred when I was trying to teach myself quantum mechanics at the end of high school. I voraciously read many popular accounts of QM, watched interviews with physicists, etc. These sources would emphasize the wave-particle duality, Schrodinger’s cat, the double-slit experiment, and the uncertainty principle. I could certainly recite these concepts back in conversation, but at no point did I feel like I understood quantum mechanics.
That is, until I read the Wikipedia entry on the mathematical formalism of quantum mechanics (or some similar type of reference, I don’t remember exactly). There I found an explanation not of the physics of QM, but of the mathematical structure of QM. What I learned was that QM is a game with rules. The state of the system is given as an arrow (a vector); the dynamics of that arrow are given by a pretty straightforward linear differential equation; “measurements” are associated with linear operators (matrices); and the rule of measurement is that the state of the system “collapses” to an eigenvector of the operator, with probabilities given by the squared magnitudes of the dot products of the current state with the eigenvectors.
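Those rules can be written compactly in standard textbook notation (this is the conventional formulation, not anything specific to the reference I read; H is the Hamiltonian):

```latex
% The state is a unit vector |psi(t)>; its dynamics are linear:
i\hbar \,\frac{d}{dt}\,\lvert \psi(t) \rangle = H \,\lvert \psi(t) \rangle
% Measuring an observable A, with eigenvectors A|v_i> = a_i |v_i>,
% yields outcome a_i with probability (the Born rule):
P(a_i) = \bigl\lvert \langle v_i \mid \psi \rangle \bigr\rvert^{2}
% after which the state collapses to |v_i>.
```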
This was mind-blowing. All that time I took reading about Schrodinger’s cat I could have instead simply learned that everything comes from a vector moving according to a linear diffy-Q plus some straightforward rules about eigenvectors and linear operators.
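To see how little machinery the measurement rule actually involves, here is a minimal NumPy sketch (the particular state and observable are invented for illustration): diagonalize the operator, then take squared magnitudes of the dot products of the state with its eigenvectors.

```python
import numpy as np

# A quantum state is just a unit vector ("arrow") of complex numbers.
state = np.array([1.0, 1.0 + 0.0j]) / np.sqrt(2)

# An observable is a Hermitian matrix; here, the Pauli-Z operator.
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Its eigenvectors (columns of `eigenvectors`) are the possible
# post-measurement states.
eigenvalues, eigenvectors = np.linalg.eigh(Z)

# Born rule: the probability of each outcome is the squared magnitude
# of the dot product of the current state with that eigenvector.
probabilities = np.abs(eigenvectors.conj().T @ state) ** 2

print(probabilities)  # both outcomes equally likely: [0.5, 0.5]
```

That is the whole measurement rule: linear algebra plus one probability formula, with no cats required.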
I am no mathematician; I want to stress that I don’t mean that one should focus on highly-formalized mathematics when dealing with any subject, but that often when I find myself struggling to understand something, or when I find myself having the same conversations over and over again, it pays to try to focus on finding an explanation, even an abstract conceptual explanation, not of the subject, but instead of the mathematical structure.
I think one often sees this failure mode in action in the types of subjects that lend themselves to abstracted, metaphysical, and widely applicable thinking. Some examples include predictive coding and category theory.
Take predictive coding and active inference, for example. It often feels that there is an enormous amount of back-and-forth discussion on topics like these at an abstracted conceptual level, when the discussion could instead be made much more concrete by talking about the actual mathematical structure of these things. I get the sense (I am very much guilty of this) that many people talk about these subjects without putting ample effort into really understanding the structure underlying these ideas. What ends up happening is that the subjects get over-applied to many different situations, and a lot of wheel-spinning happens with no useful work being produced.
Of course, this lesson can be overly applied, and there is much to be said for being able to explore ideas without caring too much about formalism and mathematics—but often when I am stuck and I feel like I haven’t really grokked something despite putting in effort, it helps to remember this failure mode exists, and to seek out a different sort of explanation.
Thanks for pointing this out. A good spike in my programming skills happened when I started directly reading the code instead of the documentation. The situation seems very similar to the one described in the post.
The general lesson is that understanding the thing directly is better than understanding someone else’s explanation of the thing.
Yes—but from the post author’s perspective, it’s not super nice to put in one sentence what he took eight paragraphs to express. So you should think about that as well...
The original post has much more value than the one-sentence summary, but having a one-sentence explanation of the commonality between the mathematical example and the programming example can be useful.
I would say it is perhaps not nice to provide that sort of summary but it is kind.
I thought it was a great way to put it and I appreciated it a lot! I’m not even sure the post has more value than the summary; at the very least that one sentence adds a lot of explanatory power imho.
This looks like “lies to kids”, but from the point of view of an adult realizing they have been lied to.
And “lies to kids” is pretty much how everything is taught: you can’t just go “U(1)...”, you start out with “light...”, and then maybe eventually, when you have told enough lies, you can say “ok, that was all a lie, here is how it is” and then tell more lies. Do that for long enough and you hit ground truth.[1]
So what do you do?
Balance your lies when you teach others, maybe even say things like “ok, so this is not exactly true, but for now you will have to accept it, and eventually we can go deeper”.
And the other way around: if you read something or someone teaches you something, you should be cognizant that it is unlikely to be the true nature of whatever you read / are taught.
A) Be careful when you use your knowledge to synthesize ideas / solutions / insights.
B) Be curious, go down rabbit holes, get as much ground “truth” as possible.
That’s the compressed version of what I do.
Not really, unless we are talking about mathematics.
Related: https://xkcd.com/435/
Yes, this. A lot of people talking about AI Alignment and similar topics have never touched, or even read, a line of code implementing part of an ML system. Yes, it follows the usual “don’t burn the timeline” mantra, but it also means that a lot of what they talk about doesn’t make any sense, because they don’t know what they are talking about. And the “white noise” created as a result is good neither for AI nor for AI Alignment research.
I wonder if some (a lot?) of the people on this forum do not suffer from what I would call a sausage maker problem. Being too close to the actual, practical design and engineering of these systems, knowing too much about the way they are made, they cannot fully appreciate their potential for humanlike characteristics, including consciousness, independent volition etc., just like the sausage maker cannot fully appreciate the indisputable deliciousness of sausages, or the lawmaker the inherent righteousness of the law. I even thought of doing a post like that—just to see how many downvotes it would get…
Well—at least I followed the guidelines and made a prediction, regarding downvotes. That my model of the world works regarding this forum has therefore been established, certainly and without a doubt.
Also—I personally think there is something intellectually lazy about downvoting without bothering to express in a sentence or two the nature of the disagreement—but that’s admittedly more of a personal appreciation.
(So my prediction here is: if I were to engage one of these no-justification downvoters in an ad rem debate, I would find him or her to be intellectually lacking. Not sure if it’s a testable hypothesis, in practice, but it sure would be interesting if it were.)
I find the common downvoting-instead-of-arguing mentality frustrating and immature. If I don’t have the energy for a counterargument, I simply don’t react at all. Just doing downvotes is intellectually worthless booing. As feedback it’s worse than useless.
Strong upvote!
I think many people’s default philosophical assumption (mine, certainly) is that mathematics is a discourse about the truth, a way to describe it, but not, fundamentally, the truth itself. Thus, in the popularization efforts of professional quantum physicists (those who care to popularize), it is relatively common to find the admission that while they understand the math of it well enough (I mean… hopefully, being professionals), they couldn’t say with any confidence that they understood the truth of it, that they understood, at an intimate level, the nature of what is going on. And I don’t think it’s simply playing cute or false modesty (although of course there will always be a bit of that, too). Now of course you could say, which would solve many problems, that there is no such thing as the “truth of it”, no “nature of what is going on”, that the mathematical formalism is really the alpha and omega, the totality of the knowable and the meaningful as it relates to it. That position can certainly be argued with some semblance of reason, but it does feel like a defeat for the human mind.
Math refers to both a formalised language and a formalised mode of thought that are continuous with common language and modes of thought. What else could there be to learn about the truth of the matter for humans? Or even for other hypothetical minds (with their analogous ‘math’)? It seems like reifying the idea of “truth” into something you don’t even know what it looks like, or even whether it’s a coherent or real idea, when you have very good reasons to think it’s not.
Math is what we use to create the best mental models of reality (any mental model will be formalised in the way of something we can reasonably call ‘math’), there’s nothing to comprehend outside of our models.
Yeah… as they say: there’s often a big gap between smart and wise.
Smart people are usually good at math. Which means they have a strong emotional incentive to believe that math can explain everything.
Wise people are aware of the emotional incentives that fashion their beliefs, and they know to distrust them.
Ideally—one would be both: smart and wise.
I’m using ‘math’ here to mean the mode of thought, not the representation of mathematical objects or the act of doing calculations.
But what is there to comprehend other than math? Math is not a special way of thinking limited by ‘made-up’ mathy concepts; it’s just our thinking formalised. You can have a better or worse intuition about the meaning of mathematical objects, but the intuition is math.
Sure, math is limited, but the limitations of it are our limitations. There are no limitations inherent to math.
I mean: I just look at the world as it is, right, without preconceived notions, and it seems relatively evident to me that no: it cannot be fully explained and understood through math. Please describe to me, in mathematical terms, the differences between Spanish and Italian culture? Please explain to me, in mathematical terms, the role and function of sheriffs in medieval England. I could go on and on and on…
I mean, I was always referring to the point that you presented in your first comment. Your first comment was explicitly about how physicists are not “playing cute” when they say they don’t know if they understand “the truth of it, the nature of what is going on”. My point was that there’s nothing to understand outside of the math because there’s nothing to understand outside of your model of reality (which is math). And understanding the math is understanding what the math means (how reality appears to work to us) not just how to manipulate the mathematical objects.
About what you are saying now: how do you distinguish what is math and what isn’t for this:
Philosophically: no. When you look at the planet Jupiter you don’t say: “Hum, oh: there’s nothing to understand about this physical object beyond math, because my model of it, which is sufficient for a full understanding of its reality, is mathematical.” Or maybe you do—but then I think our differences might be too deep to bridge. If you don’t—why don’t you with Jupiter, but would with an electron or a photon?
mmm, but the deepest intuition about the reasons behind the phenomenological properties of Jupiter (like its retrograde movement in the sky, or its colors) comes from intuition about the extrinsic meaning and intrinsic properties of mathematical models about Jupiter. How else?
Sure, it’s the perspective of observers, not reality in-and-of-itself, but that’s a fundamental limitation of any observer (regardless if they use math or don’t), and the model can be epistemically wrong, but that’s not the point (that’s not exclusively a property of math).
Just to be clear, I’ve always been speaking epistemically not ontologically.
What is to be understood outside your model of reality is reality. The model is an attempt to understand it.
Yeah but your intuition of how reality works is within the model that you’ve built from empirical results.
..in an attempt to understand reality.
Of course, but ‘intuition’ and ‘understanding’ are within the map, not within the territory.
Which still doesn’t show “there’s nothing to understand outside of your model of reality”
Yes it does, what you understand (the intuition about how things work) is your model of the world, the model of the world is a representation of reality made based on empirical tests. You don’t have epistemic access to reality itself. That’s what I mean by saying that “understanding is within the map, not within the territory”.
What do you understand about reality outside of your model of reality?
...except via
Except that empirical tests don’t allow you to have direct epistemic access to ‘reality itself’ (direct knowledge about reality), because you need to interpret those tests to derive the models (the knowledge), and it’s not always determined wh. ‘Direct epistemic access to reality’ is the idealisation of knowing what reality would be like independently of any model (of course, not anything that is actually possible).
‘Understanding’ is a cognitive process and only exists within your cognition; the way you think your model represents reality is still part of the model itself. That doesn’t mean it’s wrong (although it predictably could be, and at the very least it is a simplification): it still explains what you observe, after all, which is the thing you are directly acquainted with. The underlying reality can be completely unlike what it feels like from the inside. Imagine that the substrate of ‘a’ reality is the transistors that simulate it, or a hologram, or ‘physical’ things, and someone inside it can’t tell which it is because the underlying physics could be any of those.
Yes, but “direct” is the crux… you didn’t mention it before.
I said ‘reality itself’, though, as in ‘reality independently of any observer’. Whatever, it’s fine; it’s an understandable confusion between us.
Have you read Kant or are you reinventing him?
No I haven’t, but this is basic stuff(?). Like, this type of skepticism has existed since Antiquity in some schools of thought, and this is no different from Cartesian doubt.
Descartes, of course, summons God as a solution to the problem. He chickened out >->
Forms of scepticism based on the directness of perception as revealed by the scientific world view are modern, in the sense of only going back a few centuries.
Ok, I thought that some Buddhist schools and the Pyrrhonists had taught that type of skepticism, but it doesn’t seem to be so.
My girlfriend has entire topics where she refuses to engage with articles or papers without mathematical equations, for essentially this reason. Basically that without those, you cannot truly get it. And the more I am getting into the nitty-gritty of consciousness, the more I am inclined to agree that this degree of precision is necessary to make applicable progress beyond a certain point.
This insight can be reversed: If you can’t understand the mathematical details of a theory (which will be true for many of us, math is often hard), don’t waste undue time on understanding the high-level features. Luckily, many interesting theories outside physics have much simpler math than quantum mechanics.