Morendil comments on Coding Rationally—Test Driven Development

Morendil Oct 3, 2010, 1:19 PM
3 points

these practices solve only the very easy problems in these worlds, but still require you to twist your development process so you no longer know how to solve the harder ones

I’m not following the argument here. Explain how TDD causes you to no longer know how to solve the harder problems in some of these domains?

Also, I’m not sure I buy the “sweet spot” theory. Some techniques have a broad range of applicability rather than a sweet spot: they fail only in some corner cases. I suspect that having lots of focused unit tests is one such technique. And, given that TDD is a more realistic way to end up with lots of unit tests than test-last, I’d be tempted to conclude that TDD also has a broad range of applicability—only slightly narrower than having lots of unit tests.

Of course one big issue with this kind of debate is the almost complete lack of empirical research in these techniques. Anecdotally, I’ve heard reports of people using TDD to beneficial effect in all the domains mentioned.
- cousin_it Oct 3, 2010, 1:33 PM
  9 points
  Parent
  
  I’m not following the argument here. Explain how TDD causes you to no longer know how to solve the harder problems in some of these domains?
  
  What I’m saying is really simple. If you follow TDD strictly, you can’t even begin writing Google Maps without first writing a huge test harness that simulates mouse events or something similarly stupid. And the more you allow yourself to deviate from TDD, the faster you go.
  What links here?
  - Morendil's comment on Link: “When Science Goes Psychic” by Tesseract (Jan 8, 2011, 6:12 PM; 0 points)
  - Morendil Oct 3, 2010, 4:37 PM
    6 points
    Parent
    We may have different understandings of “TDD”, so I suggest tabooing the term. Can you address your argument above to the description that follows?
    
    The process I know as TDD consists of specifyfing the actual values of a precondition and a postcondition, prior to writing the third (command) portion of a Hoare triple. The “rules” of this process are
    
    I am only allowed to write a Command (C) after writing its pre- and post-conditions (P+Q)
    I must observe a postcondition failure before writing C
    I must write the simplest, “wrongest” possible C that satisfies P+Q
    I must next write the simplest P+Q that shows a deficiency in my code
    my algorithm should arise as a sequence of such Cs satisfying P+Qs
    my application code (and ideally test code) should be well-factored at all times
    
    If you follow this process strictly, you can’t even begin writing a huge test harness. The process enforces interleaved writing of application code and test code. As I’ve argued elsewhere, it tends to counter confirmation bias in testing, and it produces a comprehensive suite of unit tests as a by-product. It encourages separation of concerns which is widely regarded as a cornerstone of appropriate design.
    
    Empirically, my observations are that this process reliably results in higher developer productivity, by decreasing across the board the time between introducing a new defect and detecting that defect, which has a huge impact on the total time spent fixing defects. The exceptions to this rule are when developers are too unskilled to produce code that works by intention to start with, i.e. developers of the “tweak it until it seems to work” variety.
    - cousin_it Oct 3, 2010, 9:47 PM
      10 points
      Parent
      What you’re saying is too abstract, I can’t understand any of it. What would be the “preconditions and postconditions” for Google Maps? “The tiles must join seamlessly at the edges”? “When the user clicks and drags, all tiles move along with the cursor”? How do you write automated tests for such things?
      
      In a child comment wnoise says that “every bug that is found should have a unit test written”. For the record, I don’t agree with that either. Take this bug: “In Opera 10.2 the mousewheel delta comes in with the wrong sign, which breaks zooming.” (I can’t vouch for the exact version, but I do remember that Opera pulled this trick once upon a minor version update.) It’s a very typical bug, I get hundreds of those; but how do you write a test for that?
      
      You could say web development is “special” this way. Well, it isn’t. Ask game developers what their typical bugs look like. (Ever try writing a 3D terrain engine test-first?) Ask a Windows developer fighting with version hell. Honestly I’m at a loss for words. What kind of apps have you seen developed with TDD start to finish? Anything interesting?
      
      Maybe related: Ron Jeffries (well-known Extreme Programming guru) tries to write a Sudoku solver using TDD which results in a slow motion trainwreck: 1, 2, 3, 4, 5. Compare with Peter Norvig’s attempt, become enlightened.
      - Morendil Oct 4, 2010, 6:26 AM
        5 points
        Parent
        
        What would be the “preconditions and postconditions” for Google Maps? “The tiles must join seamlessly at the edges”?
        
        OK, suppose you are writing Google Maps, from scratch. Is the above the first thing you’re going to worry about?
        
        No, presumably you’re going to apply the usual strategy to tackle a big hairy problem: break it down into more manageable chunks, tackle each chunk in turn, recursing if you have to. Maps has subareas, like a) vector drawing of maps, b) zoomable display of satellite pictures, c) mapping informally specified adresses to GPS coordinates.
        
        So, suppose you decide to start with a), vector draw. Now you feel ready to write some code, maybe something that takes two X,Y pairs and interprets them as a road segment, drawing the road segment to a canvas.
        
        The “precondition” is just that, the fact of having two X,Y pairs that are spatially separated. And the “postcondition” is that the canvas should receive drawing commands to display something in the right line style for a road segment, against a background of the right color, at the right scale.
        
        Well that’s perfectly testable, and in fact testable without a sophisticated testing harness.
        
        My point is that if you feel you know enough about a given problem to write a couple lines of code that start solving it, then you have narrowed it down enough to also write a unit test. And so the claim that “TDD requires you to first write a huge test harness” is baseless.
        
        Take this bug: “In Opera 10.2...
        
        The way you tell this, it’s a defect originating with the Opera developers, not on your part. You may still want to document your special-casing this version of Opera with a workaround, and a unit test is a good way to document that, but the point of your doing TDD is to help your code be bug-free. Other people’s bugs are out of scope for TDD as a process.
        
        More generally, “software development as a whole is a big hairy mess” is also not a very good reason to doubt the principle of TDD. Yes we’re starting from a mess, but that’s not a valid reason to give up on cleaning the mess.
        
        What kind of apps have you seen developed with TDD start to finish? Anything interesting?
        
        Things like a content management system or a trading backend, to start with my own experience. Or, that I’ve heard of, a tiny little IDE called Eclipse? Not sure if that warrants “interesting”. ;)
        
        Maybe related
        
        Dude, “Ron Jeffries once had a bad hair day” is a spectacularly lame argument from which to try and derive general conclusions about any software development technique. I expect better of you.
        cousin_it Oct 4, 2010, 10:27 AM
        1 point
        Parent
        
        OK, suppose you are writing Google Maps, from scratch. Is the above the first thing you’re going to worry about?
        
        Actually yes—you usually start with drawing a tiled raster map, it’s way easier than a vector one. A raster map is just a bunch of IMG tags side by side. But let’s go with your scenario of vector drawing, it will serve just fine and maybe I’ll learn something:
        
        And the “postcondition” is that the canvas should receive drawing commands to display something in the right line style for a road segment, against a background of the right color, at the right scale.
        
        So the test says “calling this code must result in this exact sequence of calls to the underlying API”? Hah. I have a method like this in one of my maps, but as far as I can remember, every time I tweaked it (e.g. to optimize the use of the different canvas abstractions in different browsers—SVG, VML, Canvas) or fixed bugs in it (like MSIE drawing the line off by one pixel when image dimensions have certain values) - I always ended up changing the sequence of API calls, so I’d need to edit the test every time which kinda defeats the purpose. Basically, this kind of postcondition is lousy. If I could express a postcondition in terms of what actually happens on the screen, that would be helpful, but I can’t. What does TDD give me here, apart from wasted effort?
        
        Or, that I’ve heard of, a tiny little IDE called Eclipse?
        
        Eclipse was developed test-first? I never heard of that and that would be very interesting. Do you have any references?
        Morendil Oct 4, 2010, 11:25 AM
        5 points
        Parent
        Gamma described the Eclipse “customized Agile” process in a 2005 keynote speech (pdf). He doesn’t explicitly call it test-first, but he emphasizes both the huge number of unit tests and their being written closely interleaved with the production code.
        Morendil Oct 4, 2010, 11:17 AM
        3 points
        Parent
        
        Eclipse was developed test-first? I never heard of that and that would be very interesting. Do you have any references?
        
        Look for write-ups of Erich Gamma’s work; he’s the coauthor with Kent Beck of the original JUnit and one of three surviving members of the Gang of Four. Excerpt from this interview:
        
        Erich Gamma was the original lead and visionary force behind Eclipse’s Java development environment (JDT). He still sits on the Project Management Committee for the Eclipse project. If you’ve never browsed the Eclipse Platform source code, you’re in for a real treat. Design patterns permeate the code, lending an elegant power to concepts like plug-ins and adapters. All of this is backed up by tens of thousands of unit tests. It’s a superb example of state of the art object oriented design, and Erich played a big part in its creation.
        
        Even with this kind of evidence I prefer to add a caveat here, I’m not entirely sure it’d be fair to say that Eclipse was written in TDD “start to finish”. It had a history spanning several previous incarnations before becoming what it is today, and I wasn’t involved closely enough to know how much of it was written in TDD. Large (application-sized) chunks of it apparently were.
        Morendil Oct 4, 2010, 11:05 AM
        3 points
        Parent
        
        So the test says “calling this code must result in this exact sequence of API calls”?
        
        That’s one way. It’s also possible to draw to an offscreen canvas and pixel-diff expected and actual images. Or if you’re targeting SVG you can compare the output XML to an expected value. Which method of expressing the postcondition you use is largely irrelevant.
        
        The salient point is that you’re eventually going to end up with a bunch of isolated tests each of which address a single concern, whereas your main vector drawing code, is, of necessity, a cohesive assemblage of sub-computations which is expected to handle a bunch of different cases.
        
        You only need to change the test if one such behavior itself changes in a substantial way: that’s more or less the same kind of thing you deal with if you document your code. (Test cases can make for good documentation, so some people value tests as a substitute for documentation which has the added bonus of detecting defects.)
        
        Without tests, what tends to happen is that a change or a tweak to fix an issue affecting one of these cases may very well have side-effects that break one or more of the other cases. This happens often enough that many coding shops have a formal or informal rule of “if the code works, don’t touch it” (aka “code freeze”).
        
        If your test suite detects one such side-effect, that would otherwise have gone undetected, the corresponding test will have more than paid for its upkeep. The cost to fix a defect you have just introduced is typically a few minutes; the cost to fix the same defects a few days, weeks or months later can be orders of magnitude bigger, rising fast with the magnitude of the delay.
        
        Those are benefits of having comprehensive unit tests; the (claimed) added benefit of TDD is that it tends to ensure the unit tests you get are the right ones.
        
        Again, this whole domain could and should be studied empirically, not treated as a matter of individual developers’ sovereign opinions. This thread serves as good evidence that empirical study requires first dispelling some misconceptions about the claims being investigated, such as the opinion you had going in that TDD requires first writing a huge test harness.
        cousin_it Oct 4, 2010, 11:44 AM
        2 points
        Parent
        Wha? I’m not even sure if you read my comment before replying! To restate: the only reason you ever modify the method of drawing a line segment is to change the sequence of emitted API calls (or output XML, or something). Therefore a unit test for that method that nails down the sequence is useless. Or is it me who’s missing your point?
        
        The cost to fix a defect you have just introduced is typically a few minutes; the cost to fix the same defects a few days, weeks or months later can be orders of magnitude bigger, rising fast with the magnitude of the delay.
        
        For the record, I don’t buy that either. I can fix a bug in our webapp in a minute after it’s found, and have done that many times. Why do you believe the cost rises, anyway? Maybe you’re living in a different “world” after all? :-)
        
        Thanks for the links about Eclipse, they don’t seem to prove your original point but they’re still interesting.
        thomblake Oct 5, 2010, 3:10 PM
        4 points
        Parent
        
        I can fix a bug in our webapp in a minute after it’s found
        
        It’s still relevant that “a minute after it’s found” might be months after it’s introduced, possibly after thousands of customers have silently turned away from your software.
        Morendil Sep 4, 2012, 9:07 AM
        3 points
        Parent
        
        For the record, I don’t buy that either. I can fix a bug in our webapp in a minute after it’s found, and have done that many times. Why do you believe the cost rises, anyway?
        
        For the record, cousin_it was entirely right to be wary of the rising-cost-of-defects claim. I believed it was well supported by evidence, but I’ve since changed my mind.
        Morendil Oct 4, 2010, 12:20 PM
        2 points
        Parent
        
        Or is it me who’s missing your point?
        
        You want behaviour to be nailed down. If you have to go back and change the test when you change the behaviour, that’s a good sign: your tests are pinning down what matters.
        
        What you don’t want is to change the test for a behaviour X when you are making code changes to an unrelated behaviour Y, or when you are making implementation changes which leave behaviour unaltered.
        
        If you’re special-casing IE9 so that your roads should render as one pixel thicker under some circumstances, say, then your original test will remain unchanged: its job is to ensure that for non-IE9 browsers you still render roads the same.
        
        Why do you believe the cost rises, anyway?
        
        It’s one of the few widely-agreed-on facts in software development. See Gilb, McConnell, Capers Jones.
        
        The mechanisms aren’t too hard to see: when you’ve just coded up a subtle defect, the context of your thinking (the set of assumptions you were making) is still in a local cache, you can easily access it again, see where you went off the rails.
        
        When you find a defect later, it’s usually “WTF was I thinking here”, and you must spend time reconstructing that context. Plus, by that time, it’s often the case that further code changes have been piled on top of the original defect.
        
        they don’t seem to prove your original point
        
        I wasn’t the one with a point originally. You made some assertions in a comment to the OP, and I asked you for a clarification of your argument. You turned out to have, not an argument, but some misconceptions.
        
        I’m happy to have cleared those up, and I’m now tapping out. Given the potential of this topic to induce affective death spirals, it’s best to let others step onto the mat now, if they still think this worth arguing.
        cousin_it Oct 4, 2010, 12:50 PM
        2 points
        Parent
        Well, this frustrates me, but I know the frustration is caused by a bug in my brain. I respect your decision to tap out. Thanks! Guess I’ll reread the discussion tomorrow and PM you if unresolved questions remain.
        wnoise Oct 4, 2010, 4:00 PM
        2 points
        Parent
        
        If I could express a postcondition in terms of what actually happens on the screen, that would be helpful, but I can’t.
        
        Why not? There are automated tools to take snapshots of the screen, or window contents.
        cousin_it Oct 4, 2010, 4:54 PM
        2 points
        Parent
        No. Just no. I’d guess that different minor versions of Firefox can give different screenshots of the same antialiased line. And that’s not counting all these other browsers.
      - Richard_Kennaway Oct 4, 2010, 1:23 PM
        3 points
        Parent
        
        Ron Jeffries (well-known Extreme Programming guru) tries to write a Sudoku solver using TDD which results in a slow motion trainwreck: 1, 2, 3, 4, 5. Compare with Peter Norvig’s attempt, become enlightened.
        
        The main difference I see between those is that Norvig knew how to solve Sudoku problems before he started writing a program, while Jeffries didn’t, and started writing code without any clear idea of what it was supposed to do. In fact, he carries on in that mode throughout the entire sorry story. No amount of doing everything else right is going to overcome that basic error. I also think Jeffries writes at unnecessarily great length, both in English and in code.
        cousin_it Oct 4, 2010, 1:51 PM
        7 points
        Parent
        The problem is, Extreme Programming is promoted as the approach to use when you don’t know what your final result will be like. “Embrace Change!” As far as I understand, Jeffries was not being stupid in that series of posts. He could have looked up the right algorithms at any time, like you or me. He was just trying to make an honest showcase for his own methodology which says you’re supposed to be comfortable not knowing exactly where you’re going. It was an experiment worth trying, and if it worked it would’ve helped convince me that TDD is widely useful. Like that famous presentation where Ocaml’s type system catches an infinite loop.
        randallsquared Oct 6, 2010, 4:09 AM
        1 point
        Parent
        
        The main difference I see between those is that Norvig knew how to solve Sudoku problems before he started writing a program, while Jeffries didn’t
        
        When you already know exactly how to do something, you’ve already written the program. After that, you’re transliterating the program. The real difficulty in any coding is figuring out how to solve the problem. In some cases, it’s appropriate to start writing code as a part of the process of learning how to solve the problem, and in those cases, writing tests first is not going to be especially useful, since you’re not sure exactly what the output should be, but it certainly is going to slow down the programming quite a lot.
        
        So, I’ll agree that Jeffries should have understood the problem space before writing many tests, but not that understanding the problem space is entirely a pre-coding activity.
    - wnoise Oct 3, 2010, 8:09 PM
      4 points
      Parent
      Thank you for the quite clear specification of what you mean by TDD.
      
      Personally, I love unit tests, and think having lots of them is wonderful. But this methodology is an excessive use of them. It’s quite common to both write overly complex code and to write overly general code when that generalization will never be needed. I understand why this method pushes against that, and approve. Never the less, dogmatically writing the “wrongest possible” code is generally a huge waste of time. Going through this sort of process can help one learn to see what’s vital and not, but once the lesson has been internalized, following the practice is sub-optimal.
      
      Every bug that is found should have a unit test written. There are common blind-spots that programmers make[*], and this catches any repeats. Often these are ones that even following your version of TDD (or any) won’t catch. You still need to think of the bad cases to catch them, either via tests, or just writing correct code in the first place.
      
      [*]: Boundary conditions are a common case, where code works just fine for everywhere in the middle of an expected range, but fails at the ends.
  - Vladimir_Nesov Oct 3, 2010, 1:36 PM
    1 point
    Parent
    I agree with this argument, but note that you could write some tests as instructions for human testers. If it’s the style of development that’s the more important output of TDD for you and not regression tests, you could run those human tests yourself, and discard them afterwards. The discipline is still useful.
    - Morendil Oct 3, 2010, 4:18 PM
      2 points
      Parent
      You’re both losing me, I’m afraid. I wasn’t parsing cousin_it’s argument as saying “it’s the style of development that’s the more important output of TDD”. How do you get that?
      
      I’ll agree that a style of development can be useful in and of itself. The style of development that tends to be useful is one where complex behaviour is composed out of smaller and simpler elements of behaviour, in such a way that it is easy to ascertain the correctness not only of the components but also of the whole.
      
      So we have one question here amenable to empirical inquiry: does the approach known as TDD in fact lead to the style of development outlined above?
      
      But it also seems to me that a suite of tests is a useful thing to have in many circumstances. The empirical question here is, does having a comprehensive test suite in fact yield a practical benefit, if so on which dimensions, and at what cost?
      
      If the tests are of the kind that can be automated, then I see little benefit in having human testers manually follow the same instructions. The outcome is the same—possibly detecting a new defect—but the cost is a million times the cost of computer execution. The main cost of automation is the cost of thinking about the tests, not the cost of typing them in so a computer can run them. So ex hypothesi that cost is incurred anyway.