It’s not that costly if you do it with university students:
Get two groups of 4 university students. One group is told to “test early and often”; the other is told to “test after the code is integrated”. For every bug they fix, measure the effort required to fix it (by having them clock their time on every task they do). Then analyze when each bug was introduced (this seems easy once the bug is fixed, especially if they use something like Trac and SVN). All it takes is a month-long project that a group of 4 software engineering students can do; any university with a software engineering department could run it as the coursework for a single course. Seems to me it’s under $50K to fund?
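(For the “when was the bug introduced” step: a minimal sketch of the idea, roughly the SZZ heuristic from software-repository mining. It assumes SVN; the function name, path, revision numbers, and line range below are hypothetical illustrations, not anything specified above.)

```python
# Sketch of dating a bug's introduction from SVN history: blame the
# lines touched by the fix, as of the revision just before the fix.
# Names and revisions here are hypothetical.
import subprocess

def introducing_revisions(path, fix_rev, buggy_lines):
    """Revisions that last modified `buggy_lines` of `path`
    (line numbers as of the revision just before the fix)."""
    blame = subprocess.run(
        ["svn", "blame", "-r", str(fix_rev - 1), path],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    # Each blame line starts with the revision that last changed it.
    return {int(blame[n - 1].split(None, 2)[0]) for n in buggy_lines}

# Hypothetical usage: the fix landed in r312 and touched lines 40-42.
# print(introducing_revisions("src/parser.py", 312, [40, 41, 42]))
```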
Yes, it would be nice to have such a study.

But it can’t really be done the way you envision it. Variance in developer quality is high, so getting a meaningful result would require far more than 8 developers. And very few research groups can afford to run an experiment of that size, particularly since the usual experience in science is that you have to try the study a few times before you have the procedure right.
That would be cheap and simple, but it wouldn’t give a meaningful answer for high-cost bugs, which don’t manifest in such small projects. Furthermore, with only eight people total, individual ability differences would swamp every other factor.
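(To put a rough number on the variance objection: the sketch below is a standard two-sample power calculation. The effect size d = 0.5, i.e. the testing policy shifting fix effort by half a standard deviation of developer-to-developer variation, is an assumed figure for illustration only.)

```python
# Back-of-the-envelope power analysis for the 4-vs-4 design.
# The effect size d = 0.5 is an assumed, illustrative number.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Developers per group needed to detect d = 0.5 with a two-sided
# t-test at alpha = 0.05 and 80% power:
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"needed per group: ~{n_per_group:.0f}")  # roughly 64, not 4

# Power of the proposed design (4 developers per group), same effect:
power = analysis.power(effect_size=0.5, nobs1=4, alpha=0.05)
print(f"power with 4 per group: {power:.2f}")   # far below the usual 0.8
```

Under that assumption the 4-vs-4 design has almost no chance of detecting the effect, which is the quantitative form of the objection above.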