Interesting read, to begin with. Nice analogy. I do support the idea that claims made (in any field) should have data to back them up.
At this point I do wonder, though: even though there is no ‘hard scientific data’ to back it, don’t we have enough experience to know that once software is in operation, bugs found there cost more to fix than they would have initially?
(Bugs, in my opinion, also include features that do not meet expectations.)
Even though the chart may be taken out of context, and perhaps pushed a bit too far, I don’t think it belongs with infamous claims like “you only use 10% of your brain”. That claim, by the way, is easier to “prove” wrong: you could measure brain activity and calculate what percentage of the whole is used. Software, however, is much more complex.
It is much harder to prove whether defects actually cost more to fix later than to fix early. I don’t think the bugs themselves are actually more costly. Sure, some bugs will be more costly because of the increased complexity (compared to the not-yet-released version), but most of the cost will come from missed opportunities. A concrete example would be an e-commerce website only supporting Visa cards, while the customer expected it to support Visa cards but also Mastercard and other credit card vendors. Clearly the website will miss income, and the cost of this ‘defect’, in missed opportunity, will be much greater than the cost of actually implementing the support. (Yes, you need to back this up with numbers, but you get the point :)).
Kudos for pointing out this ‘flaw’; it takes some balls to do so ;)
ISTM that you’re making a great argument that the defects claim is in the same category as the “10% of the brain” claim. Let me explain.
To a layman not well versed in neuroanatomy, the 10% claim has surface plausibility because of the association between brain size and intelligence (smaller-brained animals are dumber, in general), and because of the observed fact that some humans are massively smarter than others (e.g. Einstein, the paradigmatic case). Therefore, someone with the same size brain who’s only “normal” in IQ compared to Einstein must not be using all of that grey matter.
Of course, as soon as you learn more of what we actually know about how the brain works, for instance the results on modularity, the way simulated neural networks perform their functions, and so on—then the claim loses its plausibility, as you start asking which 90% we’re supposed not to be using, and so on.
Similarly, someone with a poor understanding of “defects” assumes that they are essentially physical in nature: they are like a crack in cement, and software seems like layer upon layer of cement, so that if you need to reach back to repair a crack after it’s been laid over, that’s obviously harder to fix.
But software defects are nothing like defects in physical materials. The layers of which software is built are all equally accessible, and software doesn’t crack or wear out. The problem is a lot more like writing a novel in which a heroine is dark-haired, complete with lots of subtle allusions or maybe puns referencing that hair color, and then deciding that she is blonde after all.
As you observe, the cost of fixing a defect is not a single category, but in fact decomposes into many costs with fuzzy boundaries:
the cost of observing the erroneous behaviour in the first place (i.e. testing, whether a tester does it or a user)
the cost of locating the mistake in the code
the cost of devising an appropriate modification
the cost of changing the rest of the software to reflect the modification
the economic consequences of having released the defect to the field
the economic consequences of needing a new release
all other costs (I’m sure I’m missing some)
These costs are going to vary greatly according to the particular context. The cost of testing depends on the type of testing, and each type of testing catches different types of bugs. The cost of releasing new versions is very high for embedded software, very low for Web sites. The cost of poor quality is generally low in things like games, because nobody’s going to ask for their money back if Lara Croft’s guns pass through a wall or two; but it can be very high in automated trading software (I’ve personally touched software that had cost its owners millions in bug-caused bad trades). Some huge security defects go undetected for a long time, causing zero damage until they are found (look up the 2008 Debian bug).
The one cost that we know (or strongly suspect) from experience to increase monotonically as we add more code is “the cost of changing the rest of the software to reflect the modification”. This increase applies whatever change is being made, which is why the “cost of change curve” is plausible. (The funny part of the story is that there never was a “cost of change curve”; it’s all a misunderstanding, and the ebook tells the whole story.)
Of course, someone who is a) sophisticated enough to understand the decomposition and b) educated enough to have read about the claim is likely to be a programmer, which means that by the availability heuristic they’re likely to think that the cost they know best is what dominates the entire economic impact of defects.
In fact, this is very unlikely to be the case in general.
And in fact, in the one case where I have seen a somewhat credible study with detailed data (the Hughes Aircraft study), the data ran counter to the standard exponential curve: it was expensive to fix a defect during the coding phase, but the (average per-defect) cost then went down.
But software defects are nothing like defects in physical materials. The layers of which software is built are all equally accessible
I don’t think this is quite true. For instance, a few years ago, I traced a bug in my application down to an issue in how the Java Virtual Machine does JIT compiling, which caused subtle differences in a numerical algorithm between when the application started up, and when it had warmed up enough that certain functions were JIT compiled. Almost certainly, the correct fix would have been to correct the JVM so that the results were exactly the same in all cases.
But, of course, the JVM was nowhere near as “accessible” as the library containing the bug—almost everyone relies on a prebuilt version of the JVM, and it is rather difficult to build. Also, it’s written in a different and less friendly language: C++. Of course, this assumes that we are using a free/open source JVM (as it happens, we are); the situation would be even worse if we had to rely on a proprietary VM. And it assumes that all of our users would have been willing to use a custom JVM until a fixed version of the mainline JVM were released.
Another possibility would have been to add a compile-time option to the library containing that algorithm, so that that particular function would either always be JIT-compiled, or would never be. That’s pretty straightforward—as it happens, a different division of my company employs some of that library’s authors. But the authors didn’t consider it a worthwhile thing to do. So now we could maintain a fork of the library forever, or we could fix the bug somewhere else. Again, of course, this relied on the library being open; with a typical proprietary library, there would have been no recourse.
Needless to say, the top layer, the application, was the easiest thing to change, and so that’s what changed.
Neither lower-level fix would have negatively impacted other library users (well, maybe turning off JIT on this function might have, but turning it always-on wouldn’t). So I do think there is, in some sense, a difference in accessibility between layers which is not just caused by the interdependence problem. We genuinely do treat lower layers as foundational, because doing so makes it easier to develop, distribute, and collaborate on software. So I’m not sure that a construction analogy is entirely inappropriate here.
So I’m not sure that a construction analogy is entirely inappropriate here.
Good observations—but note that these criteria for “accessible” (and the consequences you discuss) are socio-political in nature, rather than physical: the JVM is the result of someone else’s analysis, design, programming, testing, etc., and your decision to use it is not part of the software life-cycle depicted in the diagram.
A theory which attempted to account for such differences would find its ontology invaded with notions of copyright, software licensing, organizational divisions and the like—the SDLC would no longer be sufficient.
Some of them are socio-political, but I think others are intellectual. That is, I understand the internals of my program well, the library somewhat, and the JVM barely at all. And this would probably be close to accurate even if I had written the whole stack myself, since I would have written the JVM the longest time ago, the library more recently, and the program last. Buildings are built of stuff; programs are built of thoughts. That some information is more accessible because you have used it more recently is a fact about the brain rather than about software. But brains (and organizations) are all we have to build software with. So I think any methodology that does not account for these things must be incomplete.
Buildings are built of stuff; programs are built of thoughts. That some information is more accessible because you have used it more recently is a fact about the brain rather than about software.
There you have, in a nutshell, the problem with software engineering as a formal discipline: its stubborn refusal to admit the above, in the face of determined pushes to do so from the likes of Bill Curtis (who’s been refused a Wikipedia entry because he’s less notable than any number of porn stars) or Jerry Weinberg.
Dijkstra’s view was that the limitations of the human mind are precisely the reason that software must be treated as mathematics and developed with mathematical rigour.
Note that this is the exact opposite of using the native architecture.
That depends on where the mathematics is done. Dijkstra’s and Hoare’s vision of programmers proving their own code correct with pencil and paper is unworkable. People cannot reliably do any sort of formal manipulation on paper, not even something as simple as making an exact copy of a document. The method can be exhibited on paper for toy examples, but everything works for toy examples. So what to do?
Compare the method of writing machine code by developing a method of translating a higher-level language into machine code. This can be exhibited on paper for toy examples, but of course that is only for didactic purposes, and one writes a compiling program to actually do that work in production. This reduces the work of writing programs in machine code to the task of writing just one program in machine code, the compiler, and by bootstrapping techniques that can be reduced even further. The amount of mathematics carried out on the human side of the interface is greatly reduced.
Similarly, proving things about programs has to be done automatically, or there’s no point. We have to prove things about programs, because computing hardware and software is mathematics, whether we wish it to be or not. Software engineering is precisely the problem of how human beings, who cannot reliably do mathematics, can reliably instruct a mathematical machine to do what we want it to with mathematical reliability.
I don’t have any solutions; it just seems to me that this is how to describe the problem. How do we interface thought-stuff with machine-stuff?
Wow. An actual exception to “The compiler is never wrong!”
In this case, it’s not clear that the compiler was really wrong. The results of a floating point calculation differed by a tiny amount, and it’s possible that either was acceptable (I don’t know how strict Java is about its floating point rules). The problem was that I was using the result as a hash key.
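For concreteness, here is a minimal sketch of the kind of fragility involved (illustrative Java only, not the actual code from that incident): two mathematically equivalent computations can round to different bit patterns, and a HashMap treats them as different keys.

    import java.util.HashMap;
    import java.util.Map;

    public class FloatHashKeySketch {
        public static void main(String[] args) {
            Map<Double, String> cache = new HashMap<>();

            // Two mathematically equivalent values that round differently in IEEE 754 doubles.
            double a = 0.1 + 0.2;   // 0.30000000000000004
            double b = 0.3;         // a different bit pattern

            cache.put(a, "cached result");

            // HashMap compares Double keys by exact bit pattern, so the second lookup misses.
            System.out.println("lookup with a: " + cache.get(a));   // cached result
            System.out.println("lookup with b: " + cache.get(b));   // null
        }
    }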
But later, I was able to make the JVM reliably dump core (in different memory locations each time). Unfortunately, it was only under extremely heavy load, and I was never able to build a reduced test case.
Compilers do get things wrong. You may be interested in John Regehr’s blog; he’s essentially throwing cleverly-chosen “random” input at C compilers (“fuzz-testing”). The results are similar to those for other programs that have never been fuzzed, i.e. embarrassing.
And yet, in practice, when something is wrong with your code, it’s always your own fault.
Well, your prior should be pretty high that it’s your fault, unless you also wrote the compiler :)
If you can do experiments to prove that there’s a compiler bug, you learn something. If you jump straight to the compiler bug explanation instead of looking for the bug in your own code, you are resisting education, and the probability that all you are doing is delaying the lesson is the probability that the compiler is working correctly. This should be >>50% of the time or you need a better compiler!
The difference here is not so much in where you guess the bug is, as in whether you do the experiment.
A very effective experiment is to take your program and chop out everything irrelevant until you have a short piece of code which demonstrates the bug. At this point, if it is a compiler bug, you have dense information to hand the compiler author; if it isn’t a compiler bug, you’re in a much better position to understand what’s wrong with your code.
However, one is often reluctant to apply this technique until one suspects a compiler bug, because it seems like a lot of work. And it is — but often less work than continuing to examine the bug with less radical tools, given that you’re in the position where the notion of compiler bugs crosses your mind.
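For the JIT story above, a reduced test case might look something like this hypothetical sketch (the method name and formula are stand-ins, and warm-up behaviour depends on the JVM): compare the result of the suspect method while it is still interpreted with the result after enough calls for the JIT to have compiled it.

    public class JitReproSketch {
        // Stand-in for the suspect numerical computation, kept as small as possible.
        static double suspect(double x) {
            return Math.exp(x) * Math.sin(x) / (1.0 + x * x);
        }

        public static void main(String[] args) {
            double cold = suspect(0.7312);        // first call: almost certainly interpreted
            double warm = cold;
            for (int i = 0; i < 1_000_000; i++) { // enough calls that the JIT is likely to compile suspect()
                warm = suspect(0.7312);
            }
            System.out.println("cold = " + Double.toHexString(cold));
            System.out.println("warm = " + Double.toHexString(warm));
            System.out.println(cold == warm ? "bit-identical" : "results differ after warm-up");
        }
    }

If the two hex strings ever differ, that output alone is dense information to hand to the JVM maintainers; if they never differ, the bug is probably somewhere in your own code after all.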
The problem is a lot more like writing a novel in which a heroine is dark-haired, complete with lots of subtle allusions or maybe puns referencing that hair color, and then deciding that she is blonde after all.
This is the second time in this thread that the analogy of software design as fiction writing has appeared, and I really quite like it. If it’s not already popular, maybe it should be.
In my experience most user requirements documents are works of fantasy. It’s our job as programmers to drag the genre closer to science fiction. Software testing is more like gritty hard-boiled detective fiction.
It’s at least somewhat popular. I know that Paul Graham has often drawn comparisons between the two ideas.
You make me curious about your book; perhaps I’ll read it. Thanks for the extensive answer. Couldn’t agree more with what you’re saying. I can see why this ‘cost of change curve’ might actually not exist at all.
It made me wonder: I recently found a graph by Sommerville telling exactly this cost-of-change story. I wonder what the source for that graph is .. ;)
I’m interested in your source for that graph.
Googling a bit for stuff by Sommerville, I come across a pie chart for “distribution of maintenance effort” which has all the hallmarks of a software engineering meme: old study, derived from a survey (such self-reports are often unreliable owing to selection bias and measurement bias), but still held to be current and generally applicable and cited in many books even though more recent research casts doubt on it.
Here’s a neat quote from the linked paper (LST is the old study):
(Possibly) participants in the survey from which LST was derived simply did not have adequate data to respond to the survey. The participating software maintenance managers were asked whether their response to each question was based on reasonably accurate data, minimal data, or no data. In the case of the LST question, 49.3% stated that their answer was based on reasonably accurate data, 37.7% on minimal data, and 8.7% on no data. In fact, we seriously question whether any respondents had ‘‘reasonably accurate data’’ regarding the percentage of effort devoted to the categories of maintenance included in the survey, and most of them may not have had even ‘‘minimal data.’’
I love it that 10% of managers can provide a survey response based on “no data”. :)
I’ve read the paper you refer to; very interesting data indeed. The quote is one of five possible explanations of why the results differ so much, but it certainly is a good possibility.
This post has dialled up my interest/doubt knob for now. I will question more ‘facts’ in the SE world from now on.
About Sommerville: his website is http://www.comp.lancs.ac.uk/computing/resources/IanS/
The book I refer to: http://www.comp.lancs.ac.uk/computing/resources/IanS/SE8/index.html
You can download presentations of his chapters here: http://www.comp.lancs.ac.uk/computing/resources/IanS/SE8/Presentations/index.html
I have based my findings on the presentations for now, since I haven’t got the book nearby. You can look them up yourself (download the chapters from the links above).
Chapter 7 says:
Requirements error costs are high so validation is very important
• Fixing a requirements error after delivery may cost up to 100 times the cost of fixing an implementation error.
Chapter 21, which covers Software Maintenance, claims (might need to verify this as well? ;)):
[Maintenance costs are] Usually greater than development costs (2 to 100 times, depending on the application).
Because I don’t have the book nearby I cannot tell for certain where it was stated, but I’m pretty certain it was in that book.
Far more than 10% of managers do that routinely. The interesting thing is that as many as 10% admitted it.