Do you disagree that the presence in a small sample of two instances of very rare species constitutes strong prima facie evidence against the “coincidence” hypothesis?
I’ve already pointed out that under a reasonable interpretation of the imaginary data, the observed frequencies are literally the most likely outcome. Would your procedure make any sense if run on, say, lottery tickets?
I don’t know what you mean by the above, despite doing my best to understand. My intuition is that “the most likely outcome” is one in which our 9-project sample will contain no project in either of the “very rare” categories, or at best will have a project in one of them. (If you deal me nine poker hands, I do not expect to see three-of-a-kind in two of them.)
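To put a rough number on that intuition, here is a minimal sketch. The 2% rate for each "very rare" category is an assumption chosen for illustration, not a figure taken from either data set, and the function name is hypothetical. It uses inclusion-exclusion to get the chance that a random sample of 9 contains at least one project from each of two rare categories.

```python
def prob_both_rare_present(n: int, p_a: float, p_b: float) -> float:
    """Probability that a random sample of n items contains at least one
    item from category A and at least one from category B, where p_a and
    p_b are the (assumed) population rates of two disjoint categories."""
    none_a = (1 - p_a) ** n             # no item from A in the sample
    none_b = (1 - p_b) ** n             # no item from B in the sample
    none_either = (1 - p_a - p_b) ** n  # no item from A or from B
    # Inclusion-exclusion: 1 - P(no A or no B)
    return 1 - none_a - none_b + none_either


if __name__ == "__main__":
    n = 9                # sample size discussed in the thread
    p_rare = 0.02        # ASSUMED rate for each rare category
    print(f"P(both rare categories appear) = {prob_both_rare_present(n, p_rare, p_rare):.4f}")
    print(f"P(neither appears)             = {(1 - 2 * p_rare) ** n:.4f}")
```

Under these assumed rates, seeing neither rare category is far more likely than seeing both. The result depends entirely on the rates and on the random-sampling assumption, both of which are contested in this thread.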
I didn’t understand your earlier example using chi-squared, which is what I take you to mean by “already pointed out”. You made up some data, and “proved” that chi-squared failed to reject the null when you asked it about the made-up data. You assumed a sample size of 100, when the implausibility of the coincidence hypothesis comes precisely from the much smaller sample size (plus the existence of “rare” categories and the overall number of categories).
a perfect example of motivated reasoning
I’m experiencing it as the opposite: I already have plenty of reasons to conclude that the 1995 data set doesn’t exist, and I’m trying to give it the maximum benefit of the doubt by assuming that it does exist and evaluating its fit with the 1979 data purely on probabilistic merits.
(ETA: what I’m saying is, forget the simulation, on which I’m willing to cop to charges of “intellectual masturbation”. Instead, focus on the basic intuition. If I’m wrong about that, then I’m wrong enough that I’m looking forward to having learned something important.)
(ETA2: the fine print on the chi-square test reads “for the chi-square approximation to be valid, the expected frequency should be at least 5”—so in this case the test may not apply.)
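The fine print can be checked directly, since the expected count in a cell is just the sample size times the category's proportion. The sketch below is illustrative only: the category proportions are made up, and the helper names are hypothetical.

```python
def expected_counts(n, proportions):
    """Expected cell counts for a sample of size n under the given proportions."""
    return [n * p for p in proportions]


def cells_below_threshold(n, proportions, threshold=5.0):
    """Expected counts that fall below the rule-of-thumb threshold."""
    return [e for e in expected_counts(n, proportions) if e < threshold]


if __name__ == "__main__":
    # MADE-UP proportions for five categories, two of them "very rare".
    proportions = [0.02, 0.03, 0.20, 0.30, 0.45]
    for n in (9, 100):
        low = cells_below_threshold(n, proportions)
        print(f"n={n}: {len(low)} of {len(proportions)} cells have expected count < 5")
```

With n = 9, a category needs a proportion of at least 5/9 (about 56%) to reach an expected count of 5, so at most one category could ever satisfy the rule of thumb at that sample size.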
Do you disagree that the presence in a small sample of two instances of very rare species constitutes strong prima facie evidence against the “coincidence” hypothesis?
Why is coincidence a live hypothesis here? Surely we might expect there to be some connection: the numbers are ostensibly about the same government in the same country in different time periods. This is another example of what I mean when I say that you are making a ton of assumptions and have not defined what parameters, distributions, or sets of models you are working with. This is simply not a well-defined problem so far.
I didn’t understand your earlier example using chi-squared, which is what I take you to mean by “already pointed out”. You made up some data, and “proved” that chi-squared failed to reject the null when you asked it about the made-up data. You assumed a sample size of 100, when the implausibility of the coincidence hypothesis comes precisely from the much smaller sample size (plus the existence of “rare” categories and the overall number of categories).
And as I mentioned, I could do no other because the percentages simply cannot work as frequencies appropriate for any discrete tests with a specific sample of 9. I had to inflate to a sample size of 100 so I could interpret something like 2% as meaning anything at all.
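A quick way to see the problem, assuming the percentages are meant as shares of a project count (an assumption, since they could be weighted some other way): with 9 projects, every share has to be a multiple of 1/9. The function name below is hypothetical.

```python
def achievable_percentages(n):
    """Percentages (rounded to one decimal) obtainable as k out of n items."""
    return [round(100 * k / n, 1) for k in range(n + 1)]


if __name__ == "__main__":
    shares = achievable_percentages(9)
    print(shares)
    # The smallest nonzero share of 9 items is 1/9, roughly 11.1%.
    print("2% achievable with n=9?", 2.0 in shares)
```

The smallest nonzero share is about 11.1%, so a figure like 2% cannot correspond to a whole number of projects out of 9.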
What I mean by “coincidence” is “the 1979 data was obtained by picking at random from the same kind of population as the 1995 data, and the close fit of numbers results from nothing more sinister than an honest sampling procedure”.
You still haven’t answered a direct question I’ve asked three times—I wish you would shit or get off the pot.
(ETA: the 1979 document actually says that the selection wasn’t random: “We identified and analyzed nine cases where software development was contracted for with Federal funds. Some were brought to our attention because they were problem cases.” So that sample would have been biased toward projects that turned “bad”. But this is one of the complications I’m choosing to ignore, because it weighs on the side where my priors already lie: that the 1995 frequencies can’t possibly match the 1979 ones that closely without the later data being a textual copy of the earlier. I’m trying to be careful that all the assumptions I make, when I find I have to make them, work against the conclusion I suspect is true.)
What I mean by “coincidence” is “the 1979 data was obtained by picking at random from the same kind of population as the 1995 data,
What population is that?
You still haven’t answered a direct question I’ve asked three times—I wish you would shit or get off the pot.
You are not asking meaningful questions, you are not setting up your assumptions clearly. You are asking me, directly, “Is bleen more furfle than blaz, if we assume that quux>baz with a standard deviation of approximately quark and also I haven’t mentioned other assumptions I have made?” Well, I can answer that quite easily: I have no fucking idea, but good luck finding an answer.
While we are complaining about not answering, you have not answered my questions about coin flipping or about lotteries.
you have not answered my questions about coin flipping or about lotteries.
(You didn’t ask a question about coin flipping. The one about lotteries I answered: “I don’t know what you mean”. Just tying up any loose ends that might be interpreted as logical rudeness.)
Answered already—if the 1995 data set exists, then it pretty much has to be a survey of the entire spend of the US Department of Defense on software projects; a census, if you will. (Whether that is plausible or not is a separate question.)
You are not asking meaningful questions
Okay, let me try another one then. Suppose we entered this one into PredictionBook: “At some point before 2020, someone will turn up evidence such as a full-text paper, indicating that the 1995 Jarzombek data set exists, was collected independently of the 1979 GAO data set, and independently found the same frequencies.”
What probability would you assign to that statement?
I’m not trying to set up any assumptions; I’m just trying to assess how plausible the claim is that the 1995 data set genuinely exists, as opposed to its being a memetic copy of the 1979 study. (Independently even of whether this was fraud, plagiarism, an honest mistake, or whatever.)
What probability would you assign to that statement?
Very low. You’re the only one who cares, and government archives are vast. In the past, I’ve failed to find versions of many papers and citations I would have liked to have.