I think the appropriate way to study that system is a two-dimensional distribution: (bug made, bug detected) -> cost.
Using that (together with the frequency of the individual bins), it is possible to generate both graphs out of the same source.
I know that this means more work ;).
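Here is a minimal sketch of what I mean, assuming the bins are development phases; all phase names and figures are invented purely for illustration, not taken from any study:

```python
# Sketch of a (phase introduced, phase detected) -> (frequency, cost) table.
# All phase names and numbers below are invented for illustration only.
from collections import defaultdict

# Each cell: how many defects fell into this (introduced, detected) bin,
# and the average cost (say, in person-hours) of dealing with one of them.
cells = {
    ("requirements", "testing"):   {"count": 12, "avg_cost": 30.0},
    ("requirements", "operation"): {"count": 3,  "avg_cost": 90.0},
    ("design",       "testing"):   {"count": 20, "avg_cost": 12.0},
    ("coding",       "coding"):    {"count": 45, "avg_cost": 0.5},
    ("coding",       "operation"): {"count": 7,  "avg_cost": 25.0},
}

def average_cost_by(axis):
    """Collapse the table onto one axis (0 = phase introduced, 1 = phase
    detected), weighting each cell's average cost by its frequency."""
    cost = defaultdict(float)
    count = defaultdict(int)
    for key, cell in cells.items():
        phase = key[axis]
        cost[phase] += cell["count"] * cell["avg_cost"]
        count[phase] += cell["count"]
    return {phase: cost[phase] / count[phase] for phase in cost}

# Both of the usual graphs come out of the same underlying table:
print("average cost by phase introduced:", average_cost_by(0))
print("average cost by phase detected:  ", average_cost_by(1))
```

Plotting either marginal against the phases gives one of the two familiar curves, while the full table keeps the information that the marginals throw away.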
See this paper (PDF), page 54, for one study that did this type of analysis, the only such study I’m currently aware of.
If you’re interested, you might want to graph the numerical results found there: you’ll find that they totally fail to match up with the standard exponential curve.
And, again, it’s worth pausing for a moment to think about the processes that generate the data: you generally don’t know that a bug has been introduced at the moment it’s introduced (otherwise you’d fix it straightaway), so there is a lot of opportunity for measurement bias there.
Similarly you don’t always know how much a bug has cost, because there are many activities that make up the economic cost of defects: lost customers, support calls, figuring out the bug, figuring out the fix, changing the code, documenting things, training developers to avoid the bug in future… Which of these you do and don’t count is rarely reported in the literature.
It’s not even clear that you can always tell unambiguously what counts as a “bug”. The language in the industry is woefully imprecise.
Thanks for the link. The table shows another problem: bugs introduced in different phases are different kinds of thing. How do you compare “1 bug” in the preliminary design with “1 bug” of the “if(a=b)” style?
Plotting a column as a graph can be interesting (is that what was done in the original paper?), but plotting a row looks nearly pointless.
And one thing that bugs me about these “studies” is precisely that they are all too vague as to exactly how (or even whether) they traced each defect back to the point where it was introduced.
It’s difficult enough tracing faults back to the source code responsible for them: as every programmer knows, a fault may have multiple “sufficient causes”, several places in the code that you could change to fix the behavior. It’s not always the case that the line of code you change to fix the problematic behavior is the same line of code that introduced the defect. There is often a “bubble under the wallpaper” effect, a conservation of total bugginess, such that a “fix” in one place causes some other functional behavior to break.
It’s even more difficult to trace a defect in code back to a design decision or a design document: the causal pathways pass through one or more human brains (unless you’re using code generation from design documents, in which case the latter count as high-level “code” in my view). Human brains are notoriously non-auditable: they are both extremely tolerant of ambiguity and extremely vulnerable to it, a mixture which causes no end of frustration in software development.
The same arguments apply to “bugs in requirements”, in spades. What exactly does a bug in requirements consist of, and is it even the same beast as a bug in the code? Requirements documents have no behavior—there is no way to “test” them, and in one sense requirements are always correct. How could they be incorrect—if we define “requirements” as “whatever the actual code is going to be judged against”? If we don’t define them that way, what is it that serves as a reference against which the requirements document will be judged correct or not, and why don’t we call that the requirements?
Any study of these phenomena needs to put forth what assumptions it operates under, and I have yet to see one empirical study which even attempted such a discussion.
You don’t know that a bug has been introduced at the moment it’s introduced, but once you fix it you can trace back to see when that happened.
You can often trace back to see when it was introduced prior to fixing it but after identifying the symptom. That’s actually a useful debugging tactic.
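For example, assuming the history is in git and the symptom can be reproduced mechanically, git bisect can do that tracing for you; the script below is only a sketch, and the tag name and test path are placeholders:

```python
#!/usr/bin/env python3
# repro.py -- hypothetical symptom-reproduction script, used as:
#
#   git bisect start
#   git bisect bad HEAD
#   git bisect good v1.2      # placeholder: an older revision without the symptom
#   git bisect run python repro.py
#
# git bisect treats exit code 0 as "this revision is good" and 1-127
# (except 125, which means "cannot test this revision") as "bad".
import subprocess
import sys

# Run only the test that exhibits the symptom (the test path is a placeholder).
result = subprocess.run(
    [sys.executable, "-m", "pytest", "tests/test_rounding.py", "-q"]
)

# Passing test -> good commit; failing test -> bad commit.
sys.exit(0 if result.returncode == 0 else 1)
```

The commit that git bisect ends up blaming is only where the symptom first appeared, which, for the reasons above, is not necessarily where the mistake was made.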