It has been said that physicists stand on one another’s shoulders. If this is the case, then programmers stand on one another’s toes, and software engineers dig each other’s graves.
Is it really valid to conclude that software engineering is diseased based on one propagating mistake? Could you provide other examples of flawed scholarship in the field? (I’m not saying I disagree, but I don’t think your argument is particularly convincing.)
Can you comment on Making Software by Andy Oram and Greg Wilson (Eds.)? What do you think of Jorge Aranda and Greg Wilson’s blog, It Will Never Work in Theory?
To anyone interested in the subject, I recommend Greg Wilson’s talk, which you can view here.
I’m a regular reader of Jorge and Greg’s blog, and have even made a very modest contribution there. It’s a wonderful effort.
“Making Software” is well worth reading overall, and I applaud the intention, but it’s not the Bible. When you read it with a critical mind, you notice that parts of it are horrible, for instance the chapter on “10x software developers”.
Reading that chapter was in fact largely responsible for my starting (about a year ago) to really dig into some of the most-cited studies in our field and gradually realizing that the field is permeated with poorly supported folklore.
In 2009, Greg Wilson wrote that nearly all of the 10x “studies” would most likely be rejected if submitted to an academic publication. The 10x notion is another example of the propagation of extremely unconvincing claims that have nevertheless had a large influence in shaping the discipline’s underlying assumptions.
But when he became an editor of “Making Software”, Greg had no problem including the 10x chapter, which rests mostly on these studies. As you can see from Greg’s frosty tone in that link, we don’t see eye to eye on this issue. I’m partially to blame for that, insofar as one of my early articles on the topic carelessly implied that the author of the chapter in question had “cheated” by using these citations. (I do think the citations are bogus, but I now believe they were used in good faith, which worries me even more than cheating would.)
Another interesting example is the “Making Software” chapter on the NASA Software Engineering Laboratory (SEL), which trumpets the lab as “a vibrant testbed for empirical research”. Vibrant? In fact, by the author’s own admission elsewhere, the SEL had been all but shut down by NASA eight years earlier, yet this isn’t mentioned in the chapter at all.
It’s well known on Less Wrong that I’m not a fan of “status” and “signaling” explanations for human behaviour, but this is one case where it’s tempting. The book is one instance of a recent trend in the discipline: people want to be seen calling for better empirical support for claimed findings, and at least pay overt homage to “evidence based software engineering”. The problem is that actual behaviours belie this intention, as when the chapter omits information about the NASA SEL that any reader would recognize as significant. The administrative failure of the SEL is definitely evidence of some kind, and Basili thought it significant enough to write an article about its “rise and fall”.
Other examples of the propagation of poorly supported claims, which I discuss in my ebook, are the “Cone of Uncertainty” and the “Software Crisis”. I regularly stumble across more; it’s sometimes a strange feeling to discover that so much of what I thought was solid history or solid fact is in fact so much thin air. I sincerely hope that I’ll eventually find some part of the discipline’s foundations that doesn’t feel like quicksand; a place to stand.
What would you suggest if asked for a major, well-supported result in software engineering?
Thanks for taking the time to reply thoughtfully. That was some good reading, especially for a non-expert like me. Here are my thoughts after reading through all of those comment threads and your original blog post. I’ll admit that I haven’t read the original McConnell chapter yet, so keep that in mind. Also, keep in mind that I’m trying to improve the quality of this discussion, not spark an argument we’ll regret. This is a topic dear to my heart and I’m really glad it ended up on LW.
Based on Steve McConnell’s blog post (and the ensuing comment thread), I think the order-of-magnitude claim is reasonably well-supported: there are a handful of mediocre studies that triangulate to a reasonable belief in it. Nowhere in those comment threads do you present a solid argument that the claim is not well-supported. Instead, you mostly argue that the citations were sloppy and “unfair.” You seem to be somewhat correct, which Steve acknowledged, but I think you’re overreaching with your conclusions.
You write that the book is one instance of a recent trend in which people want to be seen calling for better empirical support while paying only overt homage to “evidence based software engineering”. We could look at your own arguments in the same light. In all those long comment threads, you failed to engage in a useful discussion, relying on argumentative “cover fire” that distracted from the main point (i.e., instead of burying your claims in citations, you buried them in unrelated claims). You claim that “The software profession has a problem, widely recognized but which nobody seems willing to do anything about,” even though you acknowledge that Wilson et al. are indeed doing quite a bit about it. This looks a lot like narrative/confirmation bias, where you’re a detective unearthing a juicy conspiracy. Many of your points are valid, and I’m really, really glad for your more productive contributions to the discussion, but you must admit that you are being stubborn about the McConnell piece, no?
Regarding Greg Wilson’s frosty tone: I don’t think it has anything to do with your disagreeing about what constitutes good evidence. He’s very clearly annoyed that your article accuses Steve McConnell of “pulling a fast one.” The real disagreement, though, is about your rather extreme take on what evidence we can consider.
Considering how consistently you complained about academic paywalls, it’s strange that you’re putting the substance of your piece behind your own paywall; it makes this post read like a thinly veiled advert for a book.
I’m not disagreeing altogether or trying to attack you, but I do think your conclusions are pretty extreme. Your views of the “10x” chapter and the “SEL” chapter are not enough to conclude that the broad discipline is “diseased.” I think your suggestion that Making Software only pays “overt homage” to real scholarly discipline is somewhat silly, and the two reasons you cite aren’t enough to damn it so thoroughly. Moreover, your criticism should (and, perhaps unintentionally, does) augment and refine Making Software; it doesn’t justify throwing the book away completely over a few tedious criticisms.
I acknowledge the mind-killing potential of the 10x claim, and cannot rule out that I’m the one being mind-killed.
This sub-thread is largely about discussions that took place at other times in other venues: I prefer not to reply here, but email me if you’re interested in continuing this specific discussion. I disagree, for the most part, with your conclusions.
The productivity of programmers in an institution varies by a factor of infinity. It is hard to ensure that a programmer is in fact doing any useful work, and people in general are clever enough to come up with ways to avoid work while retaining the pay. Consequently there are people who do nothing, or do the absolute bare minimum, which is often counterproductive. The very difficulty of measuring productivity, combined with humans trying to conserve their effort, inevitably leads to immense variability in productivity.
The result is an average of, say, 3 lines of code per day, and a majority of projects failing. We can all agree that for all but the most analysis- and verification-heavy software, 3 lines of code per day per programmer is crap (and that count includes lines such as a lone “{”), and there is no counterargument that it in fact works, given the huge (unseen) fraction of projects that fail to deliver anything and so have a productivity of 0 useful lines of code. Lines of code are an awful metric, but they are still good enough to reveal issues spanning several orders of magnitude.
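A toy sketch of the arithmetic (the numbers below are made up purely for illustration, not taken from any study):

    # Hypothetical useful lines of code per developer per day.
    useful_loc_per_day = [0.0, 0.0, 0.5, 2.0, 5.0, 10.0]

    # A mediocre-looking team average can hide order-of-magnitude differences.
    team_average = sum(useful_loc_per_day) / len(useful_loc_per_day)
    producers = [x for x in useful_loc_per_day if x > 0]
    spread = max(producers) / min(producers)

    print(f"team average: {team_average:.1f} useful LOC/day")          # ~2.9
    print(f"spread among those who produce anything: {spread:.0f}x")   # 20x
    # Against a developer whose useful output is zero, the ratio is unbounded,
    # which is the sense in which productivity "varies by a factor of infinity".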
This also makes it hard for talented new people to get started; yesterday I read a very interesting experimental economics paper on this that used oDesk: http://www.onlinelabor.blogspot.com/2012/02/economics-of-cold-start-problem-in.html
How about basing it on the fact that the discipline relies on propagating results rather than reproducing them?
If there is something that your data confirms, you still want to reference somebody else as a source, so as to be seen fighting this problem.
See the ebook referenced at the end of the post, for starters. Will say more later.