I’m not sure I understand why they’re against point estimates. As long as the points match the mean of our estimates for the variables, then the points multiplied should match the expected value of the distribution.
Because people draw incorrect conclusions from the point estimates. You can have a high expected value for the distribution (e.g. “millions of civilizations”) while at the same time having a big part of the probability mass on outcomes with just one civilization, or a few civilizations far away.
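To make that concrete, here is a minimal sketch in Python. The setup is an assumption chosen purely for illustration and is not taken from the paper or from anyone in this thread: N is the product of three independent factors, each log-uniform between 1e-3 and 1e2, and "N < 1" stands in for "we are probably alone".

```python
import random

# Hypothetical setup (not from the thread or the paper): N is the product of
# three independent factors, each log-uniform between 1e-3 and 1e2.
LOG10_LOW, LOG10_HIGH = -3.0, 2.0
NUM_FACTORS = 3


def sample_n(rng):
    """Draw one value of N as a product of independent log-uniform factors."""
    n = 1.0
    for _ in range(NUM_FACTORS):
        n *= 10.0 ** rng.uniform(LOG10_LOW, LOG10_HIGH)
    return n


def simulate(num_samples=100_000, seed=0):
    """Return (sample mean of N, fraction of samples with N < 1)."""
    rng = random.Random(seed)
    samples = [sample_n(rng) for _ in range(num_samples)]
    mean_n = sum(samples) / num_samples
    frac_below_one = sum(1 for n in samples if n < 1.0) / num_samples
    return mean_n, frac_below_one


mean_n, frac_below_one = simulate()
print(f"mean of N: {mean_n:.1f}")
print(f"fraction of samples with N < 1: {frac_below_one:.3f}")
```

Under these assumed ranges the exact mean works out to about 655, yet roughly 72% of the probability mass sits on N < 1. The point estimate of the mean is correct as a mean; it just says very little about where most of the mass is.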
I think the real point here (as I’ve commented elsewhere) isn’t that using point estimates is inherently a mistake, it’s that the expected value is not what we care about. They’re valid for that, but not for the thing we actually care about, which is P(N=0).
I’m skeptical that anyone ever made that mistake. Can you point to an example?
The paper doesn’t claim anyone did, does it?
Made what mistake, exactly?
What do you mean by “real point”? Don’t you mean that the point of the paper is that someone makes a particular mistake?
I mean the mistake of computing expected number rather than probability. I guess the people in the 60s, like Drake and Sagan, probably qualify. They computed an expected number of planets, because that’s what they were interested in, but were confused because they mixed it up with probability. But after Hart (1975) emphasized the possibility that there is no life out there, people have asked the right question. Most of them say things like “Maybe I was wrong about the probability of life.” That’s not the same as doing a full Bayesian update, but surely it counts as not making this mistake.
It’s true that Patrick asserts this mistake. And maybe the people making vague statements of the form “maybe I was wrong” are confused, but not confused enough to make qualitatively wrong inferences.
Huh, interesting. I have to admit I’m not really familiar with the literature on this; I just inferred this from the use of point estimates. So you’re saying people recognized that the quantity to focus on was P(N>0) but used point estimates anyway? I guess what I’m saying is, if you ask “why would they do that”, I would imagine the answer to be, “because they were still thinking of the Drake equation, even though it was developed for a different purpose”. But I guess that’s not necessarily so; it could just have been out of mathematical convenience...
Definitely mathematical convenience. In many contexts people do sensitivity analysis instead of Bayesian updates. It is good to phrase things as Bayesian updates, if only as a different point of view, but even when that is the better thing to do (which in this case I do not believe), trumpeting it as right and the other method as wrong is the worst kind of mathematical triumphalism that has destroyed modern science.
Not quite. Expected value is linear but doesn’t commute with multiplication. Since the Drake equation is pure multiplication, you could use point estimates of the means in log space and sum those to get the mean in log space of the result, but even then you’d *only* have the mean of the result, whereas what would really be a “paradox” is if P(N=1|N≠0) turned out to be tiny.
The authors grant Drake’s assumption that everything is uncorrelated, though.
You don’t need any correlation between X and Y to have E[XY]≠E[X]E[Y]. Suppose both variables are 1 with probability .5 and 2 with probability .5; then their mean is 1.5, but the mean of their product is 2.25.
Indeed, each has a mean of 1.5; so the product of their means is 2.25, which equals the mean of their product. We do in fact have E[XY]=E[X]E[Y] in this case. More generally we have this iff X and Y are uncorrelated, because, well, that’s just how “uncorrelated” in the technical sense is defined. I mean if you really want to get into fundamentals, E[XY]-E[X]E[Y] is not really the most fundamental definition of covariance, I’d say, but it’s easily seen to be equivalent. And then of course either way you have to show that independent implies uncorrelated. (And then I guess you have to do the analogues for more than two, but...)
Gah, of course you’re correct. I can’t imagine how I got so confused but thank you for the correction.
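For anyone who wants to check the two-valued example above by hand, here is a small sketch that enumerates it exactly with Python's `fractions` module. The helper names are made up for illustration, and the first calculation assumes X and Y are independent; the second takes the fully dependent case Y = X for contrast.

```python
from fractions import Fraction
from itertools import product

# Each variable is 1 with probability 1/2 and 2 with probability 1/2.
DIST = {1: Fraction(1, 2), 2: Fraction(1, 2)}


def mean(dist):
    """Exact expected value of a finite distribution {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def mean_of_independent_product(dist_x, dist_y):
    """Exact E[XY], assuming X and Y are independent."""
    return sum(
        x * y * px * py
        for (x, px), (y, py) in product(dist_x.items(), dist_y.items())
    )


e_x = mean(DIST)
e_xy_independent = mean_of_independent_product(DIST, DIST)

# Contrast: if Y is the same variable as X, then XY = X^2.
e_xy_dependent = sum(x * x * p for x, p in DIST.items())

print(e_x * e_x, e_xy_independent, e_xy_dependent)
```

In the independent case the mean of the product and the product of the means are both 9/4, as the correction says. With Y = X the mean of the product is 5/2, so the gap only appears once the variables are correlated.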