Reasons to Stay Wrong
I've had a few test-heavy projects, and I'm on another now. I don't question the value of having lots and lots of tests. But in all of these projects there is a common complaint:
"That needs to be changed, but changing it will break too many tests."
This disturbs me. I don't have an answer to it. The tests are there to ensure agility, yet the sheer volume of tests stops us from refactoring and correcting code. Moreover, if you change the tests, you don't really know that you're free of regressions. You could also break a test so that it passes wrongly (such as adjusting the expected results to match the actual).
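Here's a minimal sketch of that failure mode, in JUnit with a hypothetical Discount class: the expectation is quietly updated to whatever the code now produces, so the test goes green again without anyone deciding the new behavior is correct.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical production class whose rule silently drifted
// from 10% off to 15% off.
class Discount {
    double apply(double price) {
        return price * 0.85;
    }
}

public class DiscountTest {
    @Test
    public void appliesStandardDiscount() {
        // Was: assertEquals(90.00, new Discount().apply(100.00), 0.001);
        // "Fixed" by pasting the actual result into the expectation.
        // The bar is green again, but the test no longer guards anything.
        assertEquals(85.00, new Discount().apply(100.00), 0.001);
    }
}
```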
This situation is an unwanted application of the StableDependencyPrinciple – the tests make the code stable by depending on it, creating afferent couplings on the production code at every point possible. It's good, and it's bad.
I don't know how real the problem is (yet). Are we afraid of the wrong thing? When we make a change that breaks the tests, is that a good thing? Is it instructive? Is it really that much work? Is it really that uncertain?
Is it a good excuse to not fix the things we know are wrong?
It's funny that people don't complain about the compiler. Compilation is another set of tests that we run, and when we make changes we often do break those tests temporarily.
I think that when we hear "that needs to be changed, but changing it will break too many tests" we have to compare and contrast it with: "that needs to be changed, but changing it will break the software."
It's a false choice. People look at the cost of changing the tests, but I don't think they are looking at the cost of changing the software without tests. They see the immediate problem, but forget that the solved problem was worse. That said, it's still a problem. It might be inevitable. Once we give our code a backbone, well, it has one.
Disclaimer: I'm on the above-mentioned project with Tim, so I have insider info.
IMAO, part of the problem is that even with TDD, test code itself still gets treated as a second-class citizen. We allow things there that we wouldn't allow in "production" code - e.g., creeping complexity and pathological coupling (see Dependency Injection is Only Mostly Good). The same standards and smell tests that we use in implementing "real" code should apply to our test coding.
Another issue: are our unit tests really unit tests? Given mocking tools like Rhino Mocks, we should be able to write unit tests in which the only "real" code is the specific class under test, and any necessary context is represented by mocks. Unfortunately, it has often been easier (especially in the absence of a good mocking framework) to appropriate large chunks of the real application code to set up the test. This alone could account for a lot of the fragility Tim is talking about.
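For contrast, a sketch of a test where the only real code is the class under test. I'm using Java and Mockito purely for illustration (the project in question is .NET with Rhino Mocks), and every name here is invented:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

// Hypothetical collaborator; in the real application this might
// drag a database and half the domain model into the test.
interface RateSource {
    double rateFor(String region);
}

// Hypothetical class under test.
class TaxCalculator {
    private final RateSource rates;
    TaxCalculator(RateSource rates) { this.rates = rates; }
    double taxOn(double amount, String region) {
        return amount * rates.rateFor(region);
    }
}

public class TaxCalculatorTest {
    @Test
    public void usesTheRegionalRate() {
        // The mock stands in for all the context the real
        // RateSource implementation would pull into the test.
        RateSource rates = mock(RateSource.class);
        when(rates.rateFor("WI")).thenReturn(0.05);

        assertEquals(5.0, new TaxCalculator(rates).taxOn(100.0, "WI"), 0.001);
    }
}
```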
A third problem we have is that Fitnesse test pages are not integrated into our IDE (Visual Studio with ReSharper), so that the impact of a refactoring or a functionality change on ATs is not instantly apparent: it requires an external manual search made harder by the loose binding of fixture and method names.
In the last case, the distance to the Fitnesse pages is a low-grade violation of Locality of Reference Documentation. I don't know how "wrong" that violation is, but it is inconvenient sometimes. It possibly needs to be, since the pages are supposed to be closer to the users than the developers, and in a language more closely approximating theirs than ours. But still, it's an annoyance.
It's funny, I do consider tests to be second-class citizens with regard to a couple things like visibility constraints, but I agree that in most other respects, they should be first-class code. It's hard to know where to draw the line.
Things are getting better, but I think that many teams treat their tests as "dark matter." They know that their tests have mass and inertia, and that's about it.
But Fitnesse pages aren't just documentation - they're what Appleton (on the LORD page Tim linked) calls "approach 3": the code is in the document. The test tables really are executable (/interpretable) code - that's what makes Fit such a cool implementation of Knuth's concept. The only thing that's missing is integration into the IDE.
People look at the cost of changing the tests, but I don't think they are looking at the cost of changing the software without tests. They see the immediate problem, but forget that the solved problem was worse.
Not the case here. With full appreciation for the existence of the tests, almost a sacred devotion, they still look at the costs of changing the tests and despair. There is never a preference for being without them. Or never one expressed, at least.
We're not talking about mis-assessment by noobs here. We're talking about a very natural (and SDP-predicted) problem that will arise in every application at some point. If you have a load of tests, and those tests are focused on code that must eventually change (i.e., is in some way wrong, or becomes in some way wrong), then you will have tests that try to force you to stay wrong.
That said, it's still a problem. It might be inevitable.
Once we give our code a backbone, well, it has one.
Exactly. This is the natural phenomenon that I'm interested in: the system can now be overtly obstinate.
I blogged about this some time ago, here: http://xcskiwinn.org/community/blogs/panmanphil/archive/2005/08/27/4356.aspx
The salient point is that the team feels less agile after having spent considerable effort learning how to write testable UI code. Some time has passed since that post, and I can also say that the tests didn't spare us much bug hunting. The bugs mostly happened at the integration level. While I'm sure we could have put more energy into testing every nook and cranny of the edge cases, that would accomplish just what your post says: our test suite would be the lead weight that keeps us from changing what should be the most flexible part of the code, the UI. I had a conversation with Brian Marick about this, and he wondered if we could be smarter about what to write and especially what not to write. I sure hope so. He followed that conversation with his own set of UI testing refactorings on his blog.
Sorry, I just don't buy it that the tests are 'forcing us to stay wrong'. There's always a cost to changing software. With a comprehensive set of tests, we pay almost all of the cost up front by changing the software and changing the tests. Without the tests, we pay a bigger cost in most cases, and most of it is paid in regular installments of production defects for a long time after the change.
I am only asking, not teaching here. I wonder if we reach a point of equilibrium where we spend as much time fighting the old tests as producing the new code.
Even if this is so, I'm not ready to give up testing. I think it's too important, but it looks like a mass v. agility thing here. I could be wrong. After all, I'm not yet test-driven.
> Once we give our code a backbone, well, it has one.
That's really great!
I don't have a 'simple catch-all solution' for this.
I'm no politician, but here are some hopefully inspirational thoughts:
1. If you have to change so many tests, there seem to be traces of over-responsibility. Maybe refactoring in small steps, breaking only a few tests at a time, might help to prepare the change.
2. Maybe there's lots of code duplication inside the tests. Try to refactor the tests and see if they get slimmer and easier to change (see the sketch after this list).
3. I use neither Fitnesse nor mock frameworks. I code acceptance, regression, and performance tests using xUnit too. It might be more work, but I have all the language and tool power (automated refactoring and the like), so I find it easier to make changes.
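For point 2, a minimal sketch (invented names) of the kind of refactoring I mean: the setup that used to be copy-pasted into every test now lives in one helper, so a construction change touches one place instead of many.

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

// Hypothetical class under test.
class Order {
    private double balance;
    private boolean paid;
    void addItem(String name, double price) { balance += price; }
    void markPaid() { paid = true; balance = 0.0; }
    double balance() { return balance; }
    boolean canShip() { return paid; }
}

public class OrderTest {
    // The setup that was duplicated across tests. When Order's
    // construction changes, only this helper changes.
    private Order paidOrder(double total) {
        Order order = new Order();
        order.addItem("widget", total);
        order.markPaid();
        return order;
    }

    @Test
    public void paidOrderHasNoBalance() {
        assertEquals(0.0, paidOrder(25.0).balance(), 0.001);
    }

    @Test
    public void paidOrderCanShip() {
        assertTrue(paidOrder(25.0).canShip());
    }
}
```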
Tom said: IMAO, part of the problem is that even with TDD, test code itself still gets treated as a second-class citizen.
This attitude needs to be corrected immediately! Tests are not second class citizens, and should be maintained to the same level of quality as production code.
I have a client that made it their stated policy to make tests second class. They freely broke all the rules of good quality code in their tests. Now they can't maintain their tests. The problem that Tim is describing in the original blog entry is magnified 10X by the sloppiness of their tests. Their only choice has been to throw away the tests that can't be maintained. In the end, you wind up with no tests! That's not an option for a professional team.
Tim said: We're talking about a very natural (and SDP-predicted) problem that will arise in every application at some point.
I agree that tests can violate the SDP. Tests are code after all. However, there is a solution for code that violates the SDP. Design the code so that it does not violate the SDP.
This gets back to the second-class citizen argument. If tests are truly first class citizens, then they must be imbued with first class design. In other words, if the tests violate the SDP, then we have to use the same design solutions we would use in production code to invert the dependencies and design a solution to keep the tests easy to change.
I have to be careful with "violation" v. "consequence" here. There's no violation of SDP. The tests are fluid and are dependent on implementation, making the implementation stable through increased afferent couplings. The violation is really ADP - we're depending on implementations in our tests. Of course, that's exactly what tests are for, isn't it? To make sure that the concrete bits are written correctly?
Mind you, I'm being very careful to say that we don't want to stop testing or throw away regression tests. But I am actively questioning whether the tests that make us agile in the start don't actually strip agility from us as the project continues. Ultimately, the original question will remain - what can we deduce from a system where the quantity and detail level of the tests make people afraid to change code? Is there possibly a new set of patterns or principles hiding in this rather dark shadow?
I've seen symptoms like you describe in a system with extensive mock-style tests. If the class-under-test had an implementation that called class methods x.a, y.b and z.c, then the unit test would set up mock x, y and z classes and assert that methods a, b and c were called. This was a very fragile testing approach because any change to the implementation would break the tests, even if the resulting system state was still the same. This experience drove me to use a more state-based approach to testing, which I find much less fragile. (I'll defer to those expert in mock testing to explain what we did wrong!)
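To illustrate the contrast with a hypothetical example (invented names; Mockito stands in for whatever mocking tool that project used):

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

interface Ledger {
    void credit(double amount);
    double total();
}

class Account {
    private final Ledger ledger;
    Account(Ledger ledger) { this.ledger = ledger; }
    void deposit(double amount) { ledger.credit(amount); }
}

// Simple real implementation used by the state-based test.
class InMemoryLedger implements Ledger {
    private double total;
    public void credit(double amount) { total += amount; }
    public double total() { return total; }
}

public class AccountTest {
    // Interaction-based: pinned to the exact calls the current
    // implementation makes. Batch or reorder those calls and this
    // test breaks even though the observable behavior is unchanged.
    @Test
    public void depositCreditsLedger_interactionStyle() {
        Ledger ledger = mock(Ledger.class);
        new Account(ledger).deposit(10.0);
        verify(ledger).credit(10.0);
    }

    // State-based: checks the resulting state, so the internals
    // can be refactored freely as long as the outcome holds.
    @Test
    public void depositIncreasesTotal_stateStyle() {
        Ledger ledger = new InMemoryLedger();
        new Account(ledger).deposit(10.0);
        assertEquals(10.0, ledger.total(), 0.001);
    }
}
```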
If you look at tests as your main product, and at code as just an implementation of the intent expressed by the tests, then you start looking at tests with greater respect and attention.
Tests deserve refactoring. They should be refined and honed so that each intent is expressed in exactly one place. Well, ideally...
Tests should read as a story, as an explanation, as examples.
But they shouldn't be boring, saying the same things over and over with little variations in the hope of catching inconsistencies by brute force.
Reading through the tests should be like reading through an explanation of the system, showing you how to build up layers of abstraction and sometimes going deep into a concept, exploring all its facets.
-- Chiaroscuro @ Liquid Development
I'm afraid that in the previous post I got lost :-)
What I wanted to say is that if things are said in only one place, changes to the tests should be somewhat limited and predictable.
Uncle Bob said: I'll defer to those expert in mock testing to explain what we did wrong!
Well, I'm no expert in mock testing, but my experience shows that mock-based tests which are too fragile usually point to a problem with the design. If a class has 3 collaborators (x, y, z), it might be a case of SRP violation, or even a case of the Middle Man code smell.
Usually these problems are solved by introducing a new level of abstraction: replacing some collaborators with a single, more abstract collaborator (let's call it w) that represents a higher-level implementation detail. This way, if the lower-level details need to change (and therefore also their tests), there would be far fewer tests that need to change (only those that check the interaction of w with the collaborators it replaced).
The real experts in mock testing have a lot more to say about it here:
http://www.mockobjects.com/MockObjectTestingPatterns.html
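A rough sketch of what introducing w might look like (all names hypothetical): the class under test depends on one higher-level collaborator, and the three low-level details hide behind it.

```java
// w: the single, more abstract collaborator.
interface Notifier {
    void notifyShipped(String orderId);
}

// The class under test now mocks only Notifier, not x, y, and z.
class ShippingService {
    private final Notifier notifier;
    ShippingService(Notifier notifier) { this.notifier = notifier; }
    void ship(String orderId) {
        // ... shipping logic ...
        notifier.notifyShipped(orderId);
    }
}

// The low-level collaborators (x, y, z) live behind w. Changing
// them breaks only CompositeNotifier's tests, not every
// ShippingService test.
class CompositeNotifier implements Notifier {
    private final Emailer email;    // x
    private final SmsGateway sms;   // y
    private final AuditLog audit;   // z
    CompositeNotifier(Emailer email, SmsGateway sms, AuditLog audit) {
        this.email = email;
        this.sms = sms;
        this.audit = audit;
    }
    public void notifyShipped(String orderId) {
        email.send(orderId);
        sms.send(orderId);
        audit.record(orderId);
    }
}

interface Emailer { void send(String orderId); }
interface SmsGateway { void send(String orderId); }
interface AuditLog { void record(String orderId); }
```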
Sagy brings up an interesting point. Most of these problems (I'm intimately familiar w/ the code base we're talking about here) relate to presenters in an MVP implementation. They are Middle Men by definition, but perhaps we've let that notion bully us into letting them run wild with multiple responsibilities.
Tests should cohere to the tested code, and decouple from each other. So the inability to change code is a design smell in the tests. Change it and beat the result flat.
People who grew up in test-free situations might not recognize potential refactors. For example, if That is broken but the tests keep it broken, then simply start writing ThatEx. Deprecation Refactor will allow you to groom the tests as you migrate from the old That to the new one. And you always have deliverable code.
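A minimal sketch of that migration, with hypothetical names and behavior: the old class keeps its tests and keeps shipping while the corrected replacement grows beside it.

```java
// The old class is wrong (no bulk discount), but its tests pin the
// wrong behavior in place. Mark it deprecated and leave it shipping.
@Deprecated
class That {
    int priceInCents(int quantity) {
        return quantity * 100;
    }
}

// The replacement is test-driven from scratch with the corrected
// behavior. Callers (and their tests) migrate one at a time; when
// nothing references That anymore, delete it and its tests.
class ThatEx {
    int priceInCents(int quantity) {
        int base = quantity * 100;
        return quantity >= 10 ? base * 90 / 100 : base;
    }
}
```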
I’m pretty new to TDD, so please forgive me if I’ve missed the point entirely... But, aren’t you thinking about this wrong? It sounds as if you are worried that changing implementation will break multitudes of tests in an uncontrolled manner. Shouldn’t you be changing your tests first towards your new target, breaking them yourself in a controlled manner, and then changing the implementation to make them pass? Or maybe rather than change the existing tests at all, you should write new ones, make them pass, and then review and remove the older tests that now break. After all, a test you cannot make pass without doing the wrong thing has lost its value, has it not?
You understand my point entirely.
However, systematically fixing these tests to fail correctly takes a lot of time.
Teams (agile or not) don't like taking a lot of time for anything.
Mass/inertia/whatever.
At a previous company, I worked on a (re)implementation of the Java Messaging Service interfaces. (That is, we were working on a product that implemented JMS.) Unfortunately, I began by sketching out a model of the internals, and doing TDD on all of those classes ... which had to be changed when I realized they didn't encompass some necessary functionality.
In this job, I patched up a number of unit tests that had been left to rot ... but I ended up chucking a bunch, or at least taking them out of the CI build (once we had one).
So the lessons I learned are:
1. Choose your tests wisely: the *really* important ones are those that exercise publicly accessible or published interfaces. The others can be chucked out if they're wrong. (How this principle applies to GUIs I don't know; I tend to believe you should test a complete object model underneath the GUI, and refactor the GUI so that it's simply issuing trivial commands and queries to that model ... but I haven't done GUIs in a while, so I can't attest to how practical that is.)
1.1. As a corollary, true TDD starts "outside" with the customer-level functionality to be implemented. The class and object diagrams are useful exercises, but start with the tests, not the model.
2. As a previous poster said, if tests make heavy use of mock objects, the design has too many dependencies. At the very least, come up with a generic event interface rather than an N^2 set of specialized interfaces (see the sketch after this list).
3. If a test breaks too often, or yields too many false positives (or negatives), chuck the test and probably the current design, and start over. If starting from scratch is too daunting, a useful exercise might be to chuck the tests, treat the code as "legacy code", and then write new tests to explore and fix the component.
4. If a legacy test stands in the way of refactoring an internal component, just chuck it. If it's mostly useless, incorporate the non-useless parts in integration tests, or tests of the public objects, and chuck the rest.
Tests are made for developers, not developers for tests.
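As a rough sketch of the generic event interface in lesson 2 (all names hypothetical): producers and consumers share one listener interface instead of a specialized interface for every kind of notification.

```java
// One generic event and one generic listener, instead of a
// specialized interface per producer/consumer pair.
interface AppEvent {
    String name();
}

interface AppEventListener {
    void on(AppEvent event);
}

// A producer depends only on the single generic interface.
class Checkout {
    private final AppEventListener listener;
    Checkout(AppEventListener listener) { this.listener = listener; }
    void complete() {
        // AppEvent has a single method, so a lambda serves as the event.
        listener.on(() -> "order-completed");
    }
}
```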
Thanks, Frank. It's good to see discussion of "Choose your tests wisely". I think that your lessons learned will be valuable to many people. I find little guidance about which tests to write, which NOT to write, and which to delete. Thanks for thoughtful treatment.
Too often I ask for insight, and get only advocacy. I'm not anti-TDD by any means, I just want to get past all the horse manure and find the horse. ;-)