Hold On To The Curiosity
I.
Recently, an excited friend was telling me the story behind why we care about the mean, median and mode.
They explained that a straightforward idea for what you might want in an ‘average’ number is something that minimises how far it is from all the other numbers in the dataset. So if your numbers are 1, 2 and 3, you want a number x such that the sum of the distances to each datapoint is as small as possible. It turns out this number is 2.
However, if your numbers are 1, 2 and 4, the number that minimises the total distance from all of them is also 2.
Huh?
When my friend told me this, the two other people I was with sort of said “Okay”. I said “What? No! I don’t believe you! It has to change when the data does. It’s a linear sum, so it has to change! It’s like you’re saying the sum of 1, 2 and 3 is the same as the sum of 1, 2 and 4. This is just wrong.” Suffice it to say, my friend’s claim wasn’t predicted by my understanding of math.
Now, did I really not believe my friend? The other two people with us were certainly fine with it. Isn’t this just Bayesianism? That’s how the old joke goes:
Math teacher: Now I’m going to prove to you that X is true.
Bayesian: You just did.
Actually, no. You taught me a detail to memorise, but my models didn’t improve. I won’t be able to improve how I use averages, because I don’t understand how it fits in with everything else I understand—it doesn’t fit with the models I use everywhere else in math.
I mean, I could’ve nodded along. It’s only one fact, after all. But if I’m going to remember it in the long term, it should connect to my other models and be reinforced. The alternative is for it to be stored in the brain with all those other memorised facts that students learn for exams and forget immediately after.
If you’re trying to build new models of a domain, it’s important to choose to speak from the confusion, not from the rest of yourself. Don’t have conversations about whether you believe a thing. Instead talk about whether you understand it.
(The problem above was the definition of the median, and an explanation of the math for the curious can be found in this comment.)
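For anyone who wants to poke at the claim numerically before reading the explanation, here is a minimal Python sketch. It brute-forces candidate averages on a grid and picks the one with the smallest total distance. The function names `total_distance` and `best_average`, and the grid step of 0.25, are illustrative choices and not anything from the original conversation.

```python
def total_distance(x, data):
    """Sum of absolute distances from x to every point in data."""
    return sum(abs(x - d) for d in data)


def best_average(data, step=0.25):
    """Brute-force search: try evenly spaced candidates between the
    smallest and largest datapoint and return the one with the
    smallest total distance. The step size is an arbitrary assumption."""
    lo, hi = min(data), max(data)
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda x: total_distance(x, data))


# The two datasets from the story above.
for data in ([1, 2, 3], [1, 2, 4]):
    x = best_average(data)
    print(data, "-> best x:", x, "total distance:", total_distance(x, data))
```

Both datasets should come out at 2. One way to see why: nudging x slightly to the right adds the same small amount to its distance from every point on its left and removes that amount from every point on its right, so what matters is how many points sit on each side, not how far away they are. Replacing 3 with 4 leaves one point on each side of 2, so the minimiser stays put.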
II.
It can be really hard to feel your models. Qiaochu Yuan’s method of learning involves ramping feeling-his-models up to 11. I recall him telling me about trying to learn what fire was once, where his first step was to just really feel his confusion:
What the hell is this orange stuff? How on earth does it get here? Why is it flickering? WHAT IS FIRE?!
After feeling the confusion, Qiaochu holds onto his frustration (which he finds easier to hold), and tries throwing ideas and possible explanations at it until all the parts finally fit together—that feeling when you say “Ohhhhhhh” and the models finally compute, and your beliefs predict the experience you have. Be frustrated with reality.
Tim Urban (of WaitButWhy) tells a similar story, where he can only write essays about things he doesn’t currently understand. As he digs through all the facts and pieces things together, he writes down the things that made sense to him, the things that would successfully get the models across to an earlier version of Tim Urban.
I used to think this made no sense and he must just be bad at introspecting—shouldn’t you have to build an excellent model of other people to write so compellingly for so many tens of thousands of them?
Yet it’s actually really rare for authors to be strongly connected to their own models—when a teacher explains something for the hundredth time, they likely can’t remember what it was like to learn it for the first. And so Tim’s explanations can be clearer than most.
In the opening example where I was surprised by the definition of the median, if you had offered me a bet I would’ve bet on the side that this was the definition of a median. But it was not a useful thought for me in that moment, to set aside my confusion and say “On reflection I believe you”. It can be correct in conversation, when your goal is understanding, to hold onto the confusion, the frustration, and let your models do the speaking.
III.
I often feel people try to move a conversation toward whether I believe the claim, rather than discussing and sharing what we each understand.
“Do you believe me when I say picking an average by minimising the distance to all the points is the same as the median?”
“Hmm, can you tell me why that’s the case? I have a model of arithmetic that says it shouldn’t be…”
A phrase I often use: “You may have changed my betting odds but you haven’t changed my models!”
We’re all in the game of trying to build models. Whether you’re trying to understand the field of science you’re attempting to add knowledge to, the product your startup is building, or the architecture of the AGI you’re trying to align, you need good models to leverage reality for whatever you care about.
One of the most important skills in life is the ability to hold onto your confusion and let your models do the talking, so they can interface with reality more directly. Choosing to notice and hold on to your confusion is hard, and it’s so easy to lose sight of it.
To put it another way, here are some perfectly acceptable noises to make when your goal is understanding:
What? No! I don’t believe you! That can’t be true!
I expect that some but not all of this post is surprisingly Ben-specific. My thanks to Alex Zhu (zhukeepa) and Jacob Lagerros (jacobjacob) for reading drafts.