It’s not just averaging; it’s the problem of making valid inferences in general: reasoning from observations to generalized conclusions.
In fact, the data in my original post on the cost of fixing defects wasn’t even much of an “average” to start with—that is, it wasn’t really obtained by sampling a population, measuring some variable of interest, and generalizing from the expected value of that variable in the sample to the expected value of that variable in the population.
The “sample” wasn’t really a sample but various samples examined at various times, of varying sizes. The “measure” wasn’t really a single measure (“cost to fix”) but a mix of several operationalizations: some looking at engineers’ reports on timesheets, others at stopwatch measurements in experimental settings, others at dollar costs from accounting data, and so on. The “variable” wasn’t really a variable—there isn’t widespread agreement on what counts as the cost of fixing a defect, as the thread illustrated in a few places. And so on. So it’s no wonder that the conclusions are not credible—“averaging” as an operation has little to do with why.
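A toy sketch makes the point concrete. All the numbers below are invented for illustration (they are not from any actual study): once “cost to fix” figures come from incommensurable operationalizations—hours, minutes, dollars—pooling and averaging them still produces a number, just not a meaningful one.

```python
# Hypothetical "cost to fix" figures from three different
# operationalizations. All values are made up for illustration.
timesheet_hours = [4.0, 6.5, 3.0]        # engineer-reported hours
stopwatch_minutes = [95, 140, 60]        # lab measurements, in minutes
accounting_dollars = [1200, 800, 2500]   # dollar costs from accounting

# Pooling them into one "sample" and averaging runs without error,
# but the result has no coherent unit or referent:
pooled = timesheet_hours + stopwatch_minutes + accounting_dollars
meaningless_mean = sum(pooled) / len(pooled)
print(meaningless_mean)  # a number, but a number of *what*?
```

The arithmetic is valid; the inference is not, because the pooled values were never measurements of one variable on one population.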
I have a further post on software engineering mostly written—I’ve been sitting on it for a few weeks now because I haven’t found the time to finalize the diagrams—which shows that a lot of writing on software engineering has suffered from egregious mistakes in reasoning about causality.