TDD is a tactic against confirmation bias—it feels like that should go in there somewhere.
Also good for getting regression tests done, i.e. tactic against akrasia.
TDD is generally a good anti-akrasia hack—you spend more of your time in near mode doing one-more-thing-after-another (with a little squirt of pleasure on each GREEN), and less in far mode thinking about architecture (and what you’re going to have for lunch… and how messy the kitchen is… and…).
(And then, as if by an invisible hand, your architecture ends up being good anyway.)
I’m not quite sure where confirmation bias comes into it. Can you go into more detail about this?
Sure. Suppose you’ve just coded up a method. Your favored hypothesis is that your code is correct. Thus, you will find it harder to think of inputs likely to disconfirm this hypothesis than to think of inputs likely to confirm it.
Wason’s 2-4-6 test provides a good illustration of this. You can almost directly map the hypothesis most people come up with to a method that returns “true” if the numbers provided conform to a certain rule. A unit test is a sample input that should cause the method to return false, which you could then check against the actual output of the Wason “oracle” (the actual criterion for being an acceptable triple).
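For concreteness, here is roughly how I picture that mapping, as a Python sketch (the particular “favored hypothesis” is a stand-in of my own; the oracle is Wason’s well-known “any ascending triple” rule):

# Wason's actual rule -- the "oracle" -- accepts any strictly ascending triple.
def oracle(a, b, c):
    return a < b < c

# A typical favored hypothesis: "each number goes up by 2".
def my_rule(a, b, c):
    return b == a + 2 and c == b + 2

# A confirmatory test input conforms to the favored hypothesis, so it tells you nothing:
assert my_rule(2, 4, 6) == oracle(2, 4, 6)   # both say True

# A disconfirmatory test input is one the favored hypothesis rejects;
# checking it against the oracle is what exposes the difference:
assert my_rule(1, 2, 3) is False
assert oracle(1, 2, 3) is True               # the hypothesis was too narrow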
Most people think more readily of confirmatory tests, that is, input data which conforms to their favored hypothesis. This will apply if you have already written the method.
This was noticed in the late ’70s by testing authority Glenford Myers, who derived from it the (misleading and harmful) edict “developers should never test their own code”.
However, if you have to write the test first, you will avoid confirmation bias. You are less likely to harbor preconceptions as to the rule to be implemented; in fact, you are actively thinking about test inputs that will invalidate the implementation, whatever that happens to be.
Does that help?
I don’t see any theoretical reason or mechanism why writing the tests first encourages negative tests. Is this supposed to be convincing, or are you making an empirical claim?
DSimon proposes a mechanism:
You’re saying that false positive tests are weeded out in TDD because the implementation isn’t allowed to have any code to raise errors or return negative states until there’s first a test checking for those errors/states.
This is plausible, but I still don’t find it convincing. In fact, it seems close to the claim “it’s easier to learn to think of errors as positive features, requiring positive tests, than it is to learn to write negative tests,” which doesn’t really distinguish between writing tests before and after.
Let me propose a hypothesis: perhaps it is easier to learn to write negative tests when switching to TDD because it’s easier to adopt habits while making large overhauls to behavior.
Hmm... Going back to the original quote,
You are less likely to harbor preconceptions as to the rule to be implemented
is a good point, especially for, e.g., determining which are the edge cases. So I shouldn’t say no theory or mechanism, but I’m not convinced.
Let’s use a more concrete example. Since I’ve recently worked in that domain, say we’re implementing a template language like mustache.
We’re starting from a blank slate, so our first test might be a test for the basic capability of variable substitution:
input: blah{{$foo}}blah
context: foo=blah
expectation: blahblahblah
Is this format clear? It’s language-agnostic, so you could implement that test in Ruby or whatever.
We want to see this test fail. So we have to supply an implementation that is deliberately broken—a simple way to do that is to return an empty string, or perhaps return the exact same string that was passed as input—there are many ways to be broken.
At this point an experienced TDDer will notice this arbitrariness, think “we could have started with an even simpler test case”, and write the following:
We’ve narrowed it down to only one non-arbitrary way to make the test fail: return the empty string. And to make the test pass we’ll return the original input.
See how this works? Because I’m not yet thinking about my implementation, but thinking about how my tests pin down the correct implementation I’m free to come up with non-confirmatory examples. My tests are drawing a box around something, but I’m not yet concerned with the contents of the box.
Now we can go back to the simple variable substitution. An important part of TDD is that one failing test does not allow you, yet, to write the fully general code of your algorithm. You’re supposed to write the simplest possible code that causes all tests to pass, erring on the side of “simplistic” code.
So for instance you’d write a small method that did more or less this:
return "blahblahblah" if template.contains("{")
else return "blahblah"
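(If you want something runnable, here is a Python rendering of that; the content of the earlier, simpler test isn’t shown above, so the pass-through case below is my guess:)

def expand(template, context):
    # Deliberately simplistic: hard-coded answers, just enough to turn the tests so far green.
    return "blahblahblah" if "{" in template else "blahblah"

# The two tests so far, both GREEN against the hard-coded answers:
assert expand("blahblah", {}) == "blahblah"                            # pass-through, no variables (assumed)
assert expand("blah{{$foo}}blah", {"foo": "blah"}) == "blahblahblah"   # basic variable substitution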
Many of my colleagues make a long face when they realize this is what TDD entails. “Why would you write such a deliberately stupid implementation?” Precisely to keep thinking about better tests, and to hold off on thinking about the fully general implementation.
So now I may want to add the following:
And maybe this:
Which are important non-confirmatory test cases. And I want to see these tests fail, because they reveal important differences between a sophisticated enough implementation and my “naive” first attempt.
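(Those tests aren’t shown above; plausible examples, purely my guesses, would cover more than one variable and a variable missing from the context, assuming mustache-style behaviour where a missing variable expands to nothing:)

def expand(template, context):
    # The deliberately-simplistic implementation again, repeated so this snippet stands alone.
    return "blahblahblah" if "{" in template else "blahblah"

# Both of these fail (RED) against it, which is exactly what we want to see:
print(expand("{{$a}} and {{$b}}", {"a": "x", "b": "y"}))   # want "x and y", get "blahblahblah"
print(expand("blah{{$gone}}blah", {}))                      # want "blahblah", get "blahblahblah"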
I will also probably be thinking that even this “basic” capability is starting to look like a fairly complex bit, a complexity which wasn’t obvious until I started thinking about all these non-confirmatory test cases. At this point if I were coding this for real I would start breaking down the problem, and maybe narrow the problem down to tokenizing the template:
and
(Now I’m not testing the actual output of the program, of course, but an intermediate representation. That’s OK, since those are unit tests: they’re allowed to examine internals of the code.)
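(The tokenizer tests themselves are elided above; here is a guess at what that step and its intermediate-representation tests could look like, where the token names and the tokenize() helper are my inventions:)

import re

def tokenize(template):
    # Split the template into ("text", ...) and ("var", name) tokens.
    tokens = []
    for part in re.split(r"(\{\{\$.+?\}\})", template):
        if not part:
            continue
        m = re.fullmatch(r"\{\{\$(.+?)\}\}", part)
        tokens.append(("var", m.group(1)) if m else ("text", part))
    return tokens

# Unit tests against the intermediate representation rather than the final output:
assert tokenize("blah{{$foo}}blah") == [("text", "blah"), ("var", "foo"), ("text", "blah")]
assert tokenize("no variables here") == [("text", "no variables here")]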
At this point the only line of code I have written is a deliberately simplistic implementation of my expansion algorithm, and I have already gotten a half-dozen important tests out of my thinking.
This is a good explanation. I have one point of difference, though:
input: blah{{$foo}}blah
context: foo=blah
expectation: blahblahblah
return "blahblahblah" if template.contains("{") else return "blahblah"
This implementation has copy&pasted magic values from the test. I’ve usually thought of these kinds of intermediate implementations as being side tracks because AIUI they are necessarily weeded out right away by the refactor phase of each cycle.
So, my deliberately-stupid implementation might’ve been:
def substitute(input, context): return input.sub(/\{\{\$.+?\}\}/, context.values.first)
Which is more complex than the one you suggested, but still I think the least complex one that makes the test pass without copy & paste.
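(In Python, assuming the {{$name}} syntax from the example test, the same idea might read:)

import re

def substitute(template, context):
    # Replace the first {{$...}} occurrence with the first value in the context,
    # ignoring which variable name was actually asked for.
    return re.sub(r"\{\{\$.+?\}\}", list(context.values())[0], template, count=1)

assert substitute("blah{{$foo}}blah", {"foo": "blah"}) == "blahblahblah"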
Then, as with your example, this would’ve led to tests to make sure the right substitution variable is matched to the right key, tests in which more than one substitution variable is supplied, tests in which substitutions are requested for variables that aren’t in the context, and so on...
(By the way, how did you get that nice fixed-width font?)
but still I think the least complex one that makes the test pass without copy & paste.
He didn’t say “without copy and paste”.
Come to think of it, “simplest” varies from person to person. By one metric the “simplest that could work” would just be a huge switch statement mapping the input for a given test to the output for the same test...
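(As a toy sketch, that degenerate case is just a lookup of canned answers; the dictionary below is my own illustration, nothing from the thread:)

# A literal lookup from each test's input to that test's expected output.
CANNED = {
    ("blahblah", ()): "blahblah",
    ("blah{{$foo}}blah", (("foo", "blah"),)): "blahblahblah",
}

def expand(template, context):
    return CANNED[(template, tuple(sorted(context.items())))]

assert expand("blah{{$foo}}blah", {"foo": "blah"}) == "blahblahblah"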
http://wiki.lesswrong.com/wiki/Comment_formatting
Enclose with backticks for inline code.
Just copying the expected value from the test into the body of the implementation will make the test go green, but it’s completely un-DRY, so you’d have to rip it out and replace it with a non-c&p implementation during the necessary refactor phase anyways.
Wikipedia agrees with me on this, and they cite to “Test-Driven Development by Example” by Kent Beck, the original TDD guy.
So, TDD as I learned it discourages c&p from the test. However, Morendil, now you’ve got me interested in talking about the possible benefits of a c&p-permitted approach: for example, I can see how it might force the programmer to write more sophisticated tests. Though on the other hand, it might also force them to spend a lot more time on the tests but for only minor additional benefit.
On the other hand, if you write the code first and then the test, you’ll have a better idea of how to make the code break. If you can put yourself in a sufficiently ruthless frame of mind, I think this is better than writing the test first.
Okay, I think I see where you’re going, but let me double-check:
You’re saying that false positive tests are weeded out in TDD because the implementation isn’t allowed to have any code to raise errors or return negative states until there’s first a test checking for those errors/states.
So, if an everythingWorkingOkay() function always returns true, it wouldn’t pass the test that breaks things and then makes sure it returns false. We know that test exists because, ideally, for TDD, that test must be written before any code intended to return false can be added to the function at all.
Whereas, if the programmer writes the code first and the test second, they might well forget to test for negative output, since that possibility won’t come to mind as readily.
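(A toy version of that scenario; the function, the fake system argument, and the tests are all mine:)

def everything_working_okay(system):
    return True   # stub: no failure-handling code has been written yet

# Confirmatory test: the always-True stub passes it, so it proves little.
assert everything_working_okay({"disk": "ok"}) is True

# Negative test: under TDD this must exist (and be seen to fail) before any code
# that returns False may be written -- and it is what catches the always-True stub.
assert everything_working_okay({"disk": "full"}) is False   # RED against the stub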
See this other reply for more detail.