“Civil disobedience” is no more than a way for the overdog to say to the underdog: I am so strong that you cannot enforce your “laws” upon me.
--Mencius Moldbug, here
I can’t overemphasise how much I agree with this quote as a heuristic.
As I noted in my other comment, he redefined the terms underdog/overdog to be based on posteriors, not priors, effectively rendering them redundant (and useless as a heuristic).
I consider this an uncharitable reading; I’ve read the article twice and I still understood him much as Konkvistador and Athrelon have.
Most of the time, priors and posteriors match. If you expect the posterior to differ from your prior in a specific direction, then change your prior.
And thus, you should expect 99% of underdogs to lose and 99% of overdogs to win. If all you know is that a dog won, you should be 99% confident the dog was an overdog. If the standard narrative reports the underdog winning, that doesn’t make the narrative impossible, but puts a burden of implausibility on it.
The second statement assumes that the base rate of underdogs and overdogs is the same. In practice I would expect there to be far more underdogs than overdogs.
Good point. I was thinking of underdog and overdog as relative, binary terms—in any contest, one of two dogs is the underdog, and the other is the overdog. If that’s not the case, we can expect to see underdogs beating other underdogs, for instance, or an overdog being up against ten underdogs and losing to one of them.
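To put numbers on the base-rate point, here is a minimal sketch (the 99%/1% win rates echo the comment above; the ten-underdogs-per-overdog ratio and the function name are assumptions purely for illustration):

```python
# Hypothetical overdog/underdog contest model: overdogs win 99% of the time,
# underdogs 1%, and we vary the base rate of overdogs among contestants.
def p_overdog_given_win(p_overdog, p_win_if_over=0.99, p_win_if_under=0.01):
    """Bayes' rule: P(overdog | this dog won)."""
    p_win = p_overdog * p_win_if_over + (1 - p_overdog) * p_win_if_under
    return p_overdog * p_win_if_over / p_win

print(p_overdog_given_win(0.5))     # equal base rates: 0.99, as claimed above
print(p_overdog_given_win(1 / 11))  # ~10 underdogs per overdog: ~0.91
```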
How should I change my prior if I expect it to move in one of two specific directions, either up or down, but not stay the same?
Fat-tailed distributions make the rockin’ world go round.
They don’t even have to be fat-tailed; in very simple examples you can know that on the next observation, your posterior estimate will move either up or down, but will not stay the same.
Here’s an example: flipping a coin of unknown bias, modeling the bias with a beta distribution starting from a uniform prior, and trying to infer the bias/frequency. Obviously, when I flip the coin, I will get either heads or tails, so I know that after my first flip my posterior will favor either heads or tails, but will not remain unchanged! There is no landing-on-its-edge intermediate 0.5 coin. Indeed, I know in advance I will be able to rule out one of two hypotheses: 100% heads or 100% tails.
But this isn’t just true of the first observation. Suppose I flip twice, and get heads then tails; the single most likely frequency is then 1⁄2, since that’s what I have observed to date. But now we’re back to the same situation as in the beginning: we’ve managed to accumulate evidence against the most extreme biases like 99% heads, so we have learned something from the 2 flips, but we’re back in a situation where we expect the posterior to differ from the prior in 2 specific directions yet cannot update the prior: after the next flip I will have observed either 2⁄3 or 1⁄3 heads. Hence, I can tell you—even before flipping—that 1⁄2 must be dethroned in favor of 1⁄3 or 2⁄3!
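A minimal sketch of the update being described, assuming the uniform Beta(1, 1) prior and the heads-then-tails data from the example above (the helper names are mine):

```python
# Beta-binomial updating for the coin example above, assuming a uniform
# Beta(1, 1) prior over the heads-frequency.
def posterior_params(prior=(1, 1), heads=0, tails=0):
    """Conjugate update: Beta(a, b) prior + data -> Beta(a + heads, b + tails)."""
    a, b = prior
    return a + heads, b + tails

def mode(a, b):
    """Mode of Beta(a, b), defined for a, b > 1."""
    return (a - 1) / (a + b - 2)

# After one flip the posterior is Beta(2, 1) or Beta(1, 2), never Beta(1, 1):
print(posterior_params(heads=1), posterior_params(tails=1))

# After heads-then-tails the posterior is Beta(2, 2), with mode 1/2...
a, b = posterior_params(heads=1, tails=1)
print(mode(a, b))                                 # 0.5

# ...and the third flip must push the mode to 2/3 or 1/3, never leave it at 1/2.
print(mode(*posterior_params((a, b), heads=1)))   # 2/3
print(mode(*posterior_params((a, b), tails=1)))   # 1/3
```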
And yet if you add those two posterior distributions, weighted by your current probability of ending up with each, you get your prior back. Magic!
(Witch burners don’t get their prior back when they do this because they expect to update in the direction of “she’s a witch” in either case, so when they sum over probable posteriors, they get back their real prior which says “I already know that she’s a witch”, the implication being “the trial has low value of information, let’s just burn her now”.)
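For anyone who wants to see the “magic” numerically, here is a minimal check that the outcome-weighted average of the two possible posteriors reproduces the prior; the Beta(2, 2) belief state and the grid discretization are assumptions chosen only for illustration:

```python
# Conservation of expected evidence, checked numerically on a grid.
# Current belief about the heads-frequency f: Beta(2, 2), density proportional to f*(1-f).

def normalize(ws):
    total = sum(ws)
    return [w / total for w in ws]

n = 1000
grid = [(i + 0.5) / n for i in range(n)]
prior = normalize([f * (1 - f) for f in grid])          # discretized Beta(2, 2)

p_heads = sum(p * f for p, f in zip(prior, grid))       # predictive P(next flip = heads)

post_heads = normalize([p * f for p, f in zip(prior, grid)])        # posterior if heads
post_tails = normalize([p * (1 - f) for p, f in zip(prior, grid)])  # posterior if tails

# Average the two possible posteriors, weighted by how likely each outcome is:
mixture = [p_heads * h + (1 - p_heads) * t for h, t in zip(post_heads, post_tails)]

print(max(abs(m - p) for m, p in zip(mixture, prior)))  # ~1e-16: the prior comes back
```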
Yup, sure does. Which is a step toward the right idea Kindly was gesturing at.
For the coin-bias estimate, as for most other things, the self-consistent updating procedure follows maximum likelihood.
Maximum likelihood tells you which hypothesis is most likely, which is mostly meaningless without further assumptions. For example, if you wanted to bet on what the next flip would be, a maximum-likelihood method won’t give you the right probability.
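A minimal illustration of the betting point, assuming a single observed heads and the uniform prior from the coin example above:

```python
# Betting on the next flip after seeing one heads and no tails,
# assuming the uniform Beta(1, 1) prior from the coin example.
heads, tails = 1, 0

# Maximum likelihood: P(next = heads) = observed frequency = 1.0,
# i.e. it would have you bet as if tails were impossible after one flip.
mle = heads / (heads + tails)

# Posterior predictive under the uniform prior (Laplace's rule of succession):
# P(next = heads) = (heads + 1) / (heads + tails + 2)
predictive = (heads + 1) / (heads + tails + 2)

print(mle, predictive)  # 1.0 vs ~0.667
```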
Yes.
OTOH, the expected value of the beta distribution with parameters a and b happens to equal the mode of the beta distribution with parameters a + 1 and b + 1 (both are a⁄(a+b)), so maximum likelihood does give the right answer (i.e. the expected value of the posterior) if you start from the improper prior B(0, 0).
(IIRC, the same thing happens with other types of distributions, if you pick the ‘right’ improper prior (i.e. the one Jaynes argues for in conditions of total ignorance for totally unrelated reasons) for each. I wonder if this has some particular relevance.)
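A quick numeric check of that relation; the counts h = 3 heads and t = 1 tails are arbitrary assumptions for illustration:

```python
# Checking the relation above with example counts h heads, t tails.
h, t = 3, 1

mle = h / (h + t)                       # maximum-likelihood estimate of the bias

# Posterior mean starting from the improper prior B(0, 0): posterior is Beta(h, t).
mean_haldane = h / (h + t)

# Posterior mode starting from the uniform prior B(1, 1): posterior is Beta(h+1, t+1),
# whose mode is ((h+1) - 1) / ((h+1) + (t+1) - 2) = h / (h + t).
mode_uniform = ((h + 1) - 1) / ((h + 1) + (t + 1) - 2)

print(mle, mean_haldane, mode_uniform)  # all 0.75
```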
I suppose this is a hilariously obvious thing to say, but I wonder how much leftism Marcion Mugwump has actually read. We’re completely honest about the whole power-seizing thing. It’s not some secret truth.
(Okay, some non-Marxist traditions like anarchism have that whole “people vs. power” thing. But they’re confused.)
Ehm… what?
Yes but as a friend reminded me recently, saying obvious things can be necessary.
The heuristic is great, but that article is horrible, even for Moldbug.
I agree. For example:

“Civil disobedience” is no more than a way for the overdog to say to the underdog: I am so strong that you cannot enforce your “laws” upon me.
This statement is obviously true. But it sure would be useful to have a theory that predicted (or even explained) when a putative civil disobedience would and wouldn’t work that way.
Obviously, willingness to use overwhelming violence usually defeats civil disobedience. But not every protest wins, and it is worth trying to figure out why—if for no other reason than to figure out whether we could win if we protested something.
I see no way to interpret it that would make it true. Civil disobedience serves to provoke a response that will—alone among crises that we know about—decrease people’s attitudes of obedience or submission to “traditional” authority. In the obvious Just-So Story, leaders who will use violence against people who pose no threat might also kill you.
We would expect this Gandhi trick to fail if the authorities get little-to-none of their power from the attitude in question. The nature of their response must matter as well. (Meanwhile, as you imply, I don’t know how Moldbug wants us to detect strength. My first guess would be that he wants his chosen ‘enemies’ to appear strong so that he can play underdog.)
I don’t think we are disagreeing on substance. “Underdog” and similar labels are narrative labels, not predictive labels. I interpreted Moldbug as saying that treating narrative labels as predictive labels is likely to lead one to make mistaken predictions and/or engage in hindsight bias. This is a true statement, but not a particularly useful one—it’s a good first step, but not a complete analysis.
Thus, to the extent that Moldbug treats the statement as a complete analysis, he is in error.
How is it great? How would you use this “heuristic”?
I hadn’t read your comment before I posted this. I assumed it meant what the terms usually mean, and lacked moldbuggerian semantics. In that sense, it would be a warning against rooting for the (usual) underdog, which is certainly a bias I’ve found myself wandering into in the past.
In retrospect I was somewhat silly for assuming Moldbug would use a word to mean what it actually means.
I have read his comment and the article. Knowing Moldbug’s style, I agree with GLaDOS on the interpretation. I may be wrong, in which case interpret the quote in line with my interpretation rather than the original meaning.