My reasons for believing the 10x hypothesis are mostly anecdotal.
Do you see it as a testable hypothesis though, as opposed to an applause light calling out the programming profession as one where remarkable individuals are to be found?
I’m not sure that the tasks there are large enough … they are program maintenance tasks
You said earlier that a great programmer is good at all types of programming tasks, and program maintenance certainly is a programming task. Why the reversal?
Anyway, suppose you’re correct and there are some experimental conditions which make for a poor test of 10x. Then we need to list all such exclusion criteria prior to the experiment, not come up with them a posteriori—or we’ll be suspected of excluding the experimental results we don’t like.
My impression is that top programmers achieve their productivity mostly by being better at the design and debugging tasks … they design so that they need less code
Now this sounds as if you’re defining “productivity” in such a way that it has less to do with “rate of output”. You’ve just ruled out, a priori, any experimental setup in which you hand programmers a fixed design and measure the time taken to implement it, for instance.
At this point ISTM we still have made surprisingly little headway on the two questions at hand:
what kind of claim is the 10x claim—is it a testable hypothesis, and if not, how do we turn it into one?
what kind of experimental setup will give us a way to check whether 10x is indeed favored among credible alternatives?
I believe it can be turned into one. For example, as stated, it doesn’t take into account sample or population size. The reductio (N=2) is that it seems to claim the faster of two programmers will be 10x as fast as the slower. There is also a need to clarify and delimit what is meant by task.
You said earlier that a great programmer is good at all types of programming tasks, and program maintenance certainly is a programming task. Why the reversal?
Because you and I meant different things by task. (I meant different types of systems—compilers vs financial vs telephone switching systems, for example.) Typing and attending meetings are also programming tasks, but I wouldn’t single them out for measurement and exclude other, more significant tasks when trying to test the 10x hypothesis.
Now this sounds as if you’re defining “productivity” in such a way that it has less to do with “rate of output”. You’ve just ruled out, a priori, any experimental setup in which you hand programmers a fixed design and measure the time taken to implement it, for instance.
Yes, I have. And I think we are wasting time here. It is easy to refute a scientific hypothesis by uncharitably misinterpreting it so that it cannot possibly be true. So I’m sure you will succeed in doing so without my help.
It is easy to refute a scientific hypothesis by uncharitably misinterpreting it so that it cannot possibly be true.
Where specifically have I done that? (Is it the “applause light” part? Do you think it obviously false that the thesis serves as an applause light?)
And I think we are wasting time here.
Are you tapping out? This is frustrating as hell. Crocker’s Rules, dammit—feel free to call me an idiot, but please point out where I’m being one!
Without outside help I can certainly go on doubting—holding off on believing what others seem to believe. But I want something more—I want to form positive knowledge. (As one fictional rationalist would have it, “My bottom line is not yet written. I will figure out how to test the magical strength of Muggleborns, and the magical strength of purebloods. If my tests tell me that Muggleborns are weaker, I will believe they are weaker. If my tests tell me that Muggleborns are stronger, I will believe they are stronger. Knowing this and other truths, I will gain some measure of power.”)
For example, as stated, it doesn’t take into account sample or population size.
Yeah, good catch. The 10x ratio is supposed to hold for workgroup-sized samples (10 to 20). What the source population is, that’s less clearly defined. A 1983 quote from Mills refers to “programmers certified by their industrial position and pay”, and we could go with that: anyone who gets full time or better compensation for writing code and whose job description says “programmer” or a variation thereof.
We can add “how large is the programmer population” to our list of questions. A quick search turns up an estimate from Watts Humphrey of 3 million programmers in the US about ten years ago.
So let’s assume those parameters hold—population size of 3M and sample size of 10. Do we now have a testable hypothesis?
What is the math for finding out what distribution of “productivity” in the overall population gives rise to a typical 10x best-to-worst ratio when you take samples of that size? Is that even a useful line of inquiry?
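That last question lends itself to a quick Monte Carlo sketch. Everything here is illustrative: the lognormal shape of the productivity distribution is just a placeholder assumption, as are the spread values tried, but it shows how to ask which distributions make a 10x best-to-worst ratio typical in workgroups of 10.

```python
import random
import statistics

def typical_ratio(sigma, group_size=10, trials=10000, seed=0):
    """Median best-to-worst productivity ratio across many random
    workgroups, assuming lognormal individual productivity."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        group = [rng.lognormvariate(0.0, sigma) for _ in range(group_size)]
        ratios.append(max(group) / min(group))
    return statistics.median(ratios)

# Wider spreads (larger sigma) make larger best-to-worst ratios typical;
# a sigma around 0.75 puts the median ratio in the neighborhood of 10x.
for sigma in (0.25, 0.5, 0.75, 1.0):
    print(f"sigma={sigma}: median best-to-worst ratio ~{typical_ratio(sigma):.1f}")
```

Run the other way, a sketch like this would let us check whether the spread implied by observed 10x ratios is consistent with any directly measured productivity data; the lognormal choice is only a stand-in until we have such data.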
The misinterpretation that stood out to me was:
Now this sounds as if you’re defining “productivity” in such a way that it has less to do with “rate of output”. You’ve just ruled out, a priori, any experimental setup in which you hand programmers a fixed design and measure the time taken to implement it, for instance.
I’m not sure whether you meant “design” to refer to e.g. internal API or overall program behavior, but they’re both relevant in the same way:
The important metric of “rate of output” is how fast a programmer can solve real-world problems. Not how fast they can write lines of code—LOC is a cost, not an output. Design is not a constant. If Alice implements feature X using 1 day and 100 LOC, and Bob implements X using 10 days and 500 LOC, then Alice was 10x as productive as Bob, and she achieved that productivity by writing less code.
I would also expect that even having a fixed specification of what the program should do would somewhat compress the range of observed productivities compared to what actually happens in the wild. Because translating a problem into a desired program behavior is itself part of the task of programming, and is one of the opportunities for good programmers to distinguish themselves by finding a more efficient design. Although it’s harder to design an experiment to test this part of the hypothesis.
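The Alice-and-Bob arithmetic above also shows how the choice of metric changes the answer. A toy comparison, with the numbers taken from the example and everything else illustrative:

```python
def per_day(amount, days):
    """Rate of anything per day."""
    return amount / days

# Output view: features delivered per day. Alice ships feature X in
# 1 day; Bob ships the same feature in 10 days.
alice_features, bob_features = per_day(1, 1), per_day(1, 10)
print(alice_features / bob_features)  # 10.0 -> Alice is 10x as productive

# Cost-mistaken-for-output view: lines of code per day (100 vs 500 LOC).
alice_loc, bob_loc = per_day(100, 1), per_day(500, 10)
print(alice_loc / bob_loc)  # 2.0 -> the LOC metric shrinks the gap to 2x
```

Same two programmers, same feature: counting what was delivered gives 10x, while counting lines written compresses the gap to 2x, which is one way an experiment's choice of metric could mask the effect it is trying to measure.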
Yes.