By definition, no cheap experiment can give meaningful data about high-cost bugs.
That sounds intuitively appealing, but I’m not quite convinced that it actually follows.
You can try to find people who produce such an experiment as a side effect, but in that case you don’t get to specify the parameters (which may or may not lead to a failure to control some variable).
The overall cost of the experiment for all involved parties will still not be low, though (although the marginal cost of the experiment, relative to just doing business as usual, can probably be reduced).
A “high-cost bug” seems to imply tens of hours spent overall on fixing it. Otherwise, it is not clear how to measure the cost: in my experience, quite similar bugs can take anywhere from 5 minutes to a couple of hours to locate and fix, without clear signs of which it will be. Exploration depends on what shape you are in, after all. On the other hand, the bug should be a relatively small part of the entire project; otherwise it is not so much a bug as the entire project goal (which skews the data about both locating the bug and the cost of integrating the fix).
If 10-20 hours (and how could you predict in advance how high-cost a bug will be?) are a small part of a project, you are talking about at least hundreds of man-hours (not a good measure of project complexity, but an estimate of cost). And then you need repetitions, and you need to try alternative strategies, to get more data on early detection, on late detection, and so on.
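For a rough sense of scale, here is a back-of-the-envelope sketch; the 15-hour bug, the 10% share, and the repetition counts are made-up illustrative assumptions, not measurements:

    # Back-of-the-envelope estimate of what such an experiment would cost.
    # Every input here is an illustrative assumption, not a measured value.
    hours_per_high_cost_bug = 15   # "tens of hours" to locate and fix
    bug_share_of_project = 0.10    # the bug should stay a small part of the project
    strategies = 2                 # e.g. early-detection vs. late-detection practices
    repetitions = 10               # repeats needed to get more than an anecdote

    hours_per_project = hours_per_high_cost_bug / bug_share_of_project
    total_hours = hours_per_project * strategies * repetitions
    print(f"~{hours_per_project:.0f} man-hours per project,"
          f" ~{total_hours:.0f} man-hours for the whole experiment")
    # -> ~150 man-hours per project, ~3000 man-hours for the whole experiment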
It can be that you have access to some resource that you can spend on this (I dunno, a hundred students with a few hours per week for a year, dedicated to some programming practice where you have relative freedom?) but not on anything better; or it may be that you can influence the set of measurements taken on some real projects. But the experiment will only be cheap if someone else covers the main cost (probably for a good, unrelated reason).
Also notice that if you cannot influence how things are done, only how they are measured, you need to specify what is measured much better than the cited papers do. What counts as the moment a bug is introduced? What counts as the cost of fixing it? Note that fixing a high-cost bug may include making improvements that had been put off before; that deferral could have been a reasoned decision, or simply irrational. It would be nice if someone proposed a methodology for measuring enough control variables in such a project: not because it would let us run this experiment, but because it would be a very useful piece of research on software project costs in general.
A high-cost bug can also be one that reduces the benefit of having the program by a large amount.
For instance, suppose the “program” is a profitable web service that makes $200/hour of revenue when it is up, and costs $100/hour to operate (in hosting fees, ISP fees, sysadmin time, etc.), thus turning a tidy profit of $100/hour. When the service is down, it still costs $100/hour but makes no revenue.
Bug A is a crashing bug that causes data corruption that takes time to recover from; it strikes once, and causes the service to be down for 24 hours, which time is spent fixing it. This has a revenue impact of $200 · 24 = $4,800.
Bug B is a small algorithmic inefficiency; fixing it takes an eight-hour code audit and brings the operational cost of the service down from $100/hour to $99/hour. Leaving it in place costs $1 · 24 · 365 = $8,760/year.
Bug C is a user interface design flaw that makes the service unusable to the 5% of the population who are colorblind. It takes five minutes of CSS editing to fix. Colorblind people spend as much money as everyone else, if they can; so fixing it increases the service’s revenue by 4.8%, to $209.50/hour. This has a revenue impact of $9.50 · 24 · 365 = $83,220/year.
Which bug is the highest-cost? Seems clear to me.
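Here is a minimal sketch of that arithmetic, assuming a one-year horizon for the bugs left unfixed (all figures are the example’s own illustrative numbers, not real data):

    # The three hypothetical bugs above, compared over a one-year horizon.
    # Every figure is the example's illustrative number, not real data.
    HOURS_PER_YEAR = 24 * 365

    revenue_per_hour = 200.00        # revenue while the service is up
    fixed_revenue_per_hour = 209.50  # revenue once the colorblind-unfriendly UI is fixed

    bug_a = revenue_per_hour * 24                                 # one 24-hour outage
    bug_b = 1.00 * HOURS_PER_YEAR                                 # $1/hour of excess cost
    bug_c = (fixed_revenue_per_hour - revenue_per_hour) * HOURS_PER_YEAR  # revenue forgone

    print(f"Bug A: ${bug_a:,.0f} (one-off)")
    print(f"Bug B: ${bug_b:,.0f} per year while unfixed")
    print(f"Bug C: ${bug_c:,.0f} per year while unfixed")
    # -> Bug A: $4,800; Bug B: $8,760; Bug C: $83,220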
The definition of cost you use (damage-if-unfixed-by-release) is distinct from all the previous definitions of cost (cost-to-fix-when-found). Neither is easy to measure. The articles actually cited discuss the latter definition.
I asked for the original description of the values plotted in the article to be included, but it is not there yet.
Of course, the existence of a high-cost bug in your definition implies that the project is not just a cheap experiment.
Furthermore, following your example turns the claim that the article contests as a plausible story without facts behind it into a matter of simple arithmetic (the longer a bug lives, the higher the time multiplier on its cost). On the other hand, given that many bugs become irrelevant because of some upgrade or rewrite before they are found, it is even harder to estimate the number of bugs, let alone the cost of each one. Also, how an inefficiency affects operating costs can be hard enough to estimate that nobody knows whether it is better to fix a cost-increaser or to add a new feature that increases revenue.
Is that a request addressed to me? :)
If so, all I can say is that what is being measured is very rarely operationalized in the cited articles: for instance, the Grady 1999 “paper” isn’t really a paper in the usual sense, it’s a PowerPoint, with absolutely no accompanying text. The Grady 1989 article I quote even states that these costs weren’t accurately measured.
The older literature, such as Boehm’s 1976 article “Software Engineering”, does talk about cost to fix, not total cost of the consequences. He doesn’t say what he means by “fixing”. Other papers mention “development cost required to detect and resolve a software bug” or “cost of reworking errors in programs”—those point more strongly to excluding the economic consequences other than programmer labor.
Of course. My point is that you focused a bit too much on the mis-citation instead of going for the quick kill and saying that they measure something underspecified.
Also, if you think that their main transgression is citing things wrong, the exact labels from the graphs you show seem a natural thing to include. I don’t expect you to tell us what they measured; I expect you to quote them precisely on that.
The main issue is that people just aren’t paying attention. My focus on citation stems from observing that a pair of parentheses, a name and a year seem to function, for a large number of people in my field, as a powerful narcotic suspending their critical reason.
If this is a tu quoque argument, it is spectacularly mis-aimed.
The distinction I made is about the level of suspension. It looks like people suspend their reasoning about whether statements have a well-defined meaning, not just their reasoning about the mere truth of the facts presented. I find the former far worse than the latter.
It is not about you; sorry for putting it slightly wrong. I thought about the unfortunate implications but found no good way to avoid them. I needed to contrast “copy” and “explain”.
I had no intention of saying you were being hypocritical, but the discussion had started to depend on a piece of data that was highly relevant (from my point of view), objectively short, and that you had but did not include. I actually was wrong about one of my assumptions about the original labels…
No offence taken.
As to your other question: I suspect that the first author to mis-cite Grady was Karl Wiegers in his requirements book (from 2003 or 2004); he’s also the author of the Serena paper listed above. A very nice person, by the way: he kindly sent me an electronic copy of the Grady presentation. At least he’s read it. I’m pretty damn sure that secondary citations afterwards are from people who haven’t.
Well, if he has read the Grady paper and still cited it wrong, most likely he got his nice graph from somewhere… I wonder who first published this graph, and why.
About references—well, what discipline is not diseased like that? We are talking about something that people (rightly or wrongly) equate with common sense in the field. People want to cite some widely accepted statement, which agrees with their perceived experience. And the deadline is nigh. If they find an article with such a result, they are happy. If they find a couple of articles referencing this result, they steal the citation. After all, who cares what to cite, everybody knows this, right?
I am not sure that the situation is significantly better even in maths. There are fresher results where you understand how to find a paper to reference, there are older results that can be found in university textbooks, and there is a middle ground where you either find something that looks like a good enough reference or have to include a sketch of the proof. (I have done the latter for some relatively simple result in a maths article.)
Or to put that another way, there can’t be any low-hanging fruit, otherwise someone would have plucked it already.