Tests are a Language
Do we need a trellis?
When writing FitNesse tests, there are any number of ways that an author can write the same conditions:
get strings with ids | |
id | value |
1 | test value |
check that string | 1 | has value | test value | ||
check string | 1 | has value | test value | ||
check | string 1 | is | test value | is | true |
ensure string | 1 | is | test value | ||
verify | test value | in string | 1 | is true | |
verify | test value | in | string 1 | ||
string | 1 | should be | test value |
The choice of syntax is pretty arbitrary, isn't it? What should matter is that the test specifies the desired behavior. Testing is very much an organic task, and it is a goal of FitNesse to give our tests an organic feel: they should be readable by organic processors (people).
When a human being has to perform work in any textual system with too much surface area, and/or too much unorganized complexity, he finds a coping mechanism. The human mind being devoted mostly to pattern-matching, an obvious solution is to locate a text of similar expression and copy it. Once copied, the human may analyze and modify the text. Often this is not merely a cut-paste-modify operation, but one with multiple "cuts" and "merges" on the way to a satisfactory expression.
This is also a bit of a problem because the tester has to know some large set of existing test scripts in order to find the proper analogy (if one exists). Why is our tester having to search through bodies of tests? To what degree is it avoidable?
When we find this cut-n-paste practice in coding, we recognize the coping mechanism as a code smell or a design smell (or perhaps a deodorant smell, like a comment). Should we understand that this smell in our test language is likewise a sign of trouble?
Perhaps our organic lexicon is a bit too complex and convoluted.
In a well-considered system, there should be at least one obvious way to perform any routine task. In an ideal system I feel that there should be only one obvious way.
There is work being done by some colleagues and acquaintances of mine to make unit testing more expressive and organic-looking, by defining a workable, simple lexicon. See RSpec, for example. Maybe we need a similar effort for our test languages.
I suppose that there is a wrong way to solve this problem, namely doing a full analysis model and determining all of the fixture nouns, verbs, and adverbs in advance. This might be policed by the local jack-booted test review thugs, with offenders being subject to rework and possibly public humiliation. However, I think it would be arrogant to assume that you can know all of the ways that people may eventually want to phrase the tests.
Instead, maybe we need some ground rules that can be easily understood and can be used to guide the language growth like a trellis guides the growth of a vine in a garden. No coercion, no enforcement should be necessary. If the team follows the same rules, then they should always be very close to being right.
Suggestion for a Trellis
In a blog, you are allowed to speculate. Don't take this as a well-developed professional opinion. Instead, look at it as an invitation to a conversation. In the open source world, I understand that one way to kick-start a project is to get some code working that illustrates the idea, and hope that people get interested and participate. That's all I'm doing here.
That's enough disclaimer and intent; let's do something:
In the interest of productivity, we should consider a few metagrammatical rules (ignoring underlying fixture varieties entirely):
- Fewer cells are better, all else being equal. In the extreme:
check that string is hello world |
check that string is truly hello world |
check that string1 has value hello world |
compare string to hello world expecting an exact match |
It's a good starting point, but hardly sufficient. How can we tell where the data starts and ends?
- But a cell that holds variable data (constant values or references to data in the system) should do nothing else:
check | string | is | hello world | |
check that | string | is | hello world | true |
check that | string1 | has value | hello world | |
compare | string | to | hello world | expecting an exact match |
It appears that column fixtures and row fixtures already follow the rule. It's always nice to be backward-compatible when making new rules. But what of the "exact match"? It's not really a data element, but it's not really a command either; it's more of an adverb kind of thing. We need the same rule for keywords, apparently.
- A cell that contains an operation selector provides only an operation selector
check | string | is | hello world | ||
check that | string1 | has value | hello world | ||
check that | string | is | hello world | true | |
compare | string | to | hello world | expecting an | exact match |
That's better, but not really what we want yet: it's still saying the same thing three different ways. Some are more precise than others. In particular, "ensure" shows more intent than "check" and reads better (is more obvious). Also, what does "has" mean? That it has only that value, or that it contains the substring "hello world"? Maybe we need another rule.
- Eliminate less precise terms in favor of more precise terms.
ensure | string | is | hello world | |
ensure | string | contains | hello | |
ensure | string | is | hello world | true |
The line ending in 'true' is really not like a sentence, is it? It seems that this is the wrong form. Maybe booleans need to be expressed more naturally, inline.
- Express operations inline
ensure | string | is | hello world |
ensure | string | is not | hello world |
ensure | string | is precisely | hello world |
Now that last line is redundant. Do we ever expect anything that is not an exact match? Even if we do, we can already expect an exact match with the 'is' expression.
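The rules accumulated so far can be sketched as a tiny interpreter. This is a hypothetical illustration, not any real FitNesse code: each inline operation ("is", "is not", "contains") is implemented once and only once, and every other cell holds only data.

```python
# Hypothetical sketch of the trellis rules (not the real Fit/FitNesse API):
# one operation selector per row, data-only cells, each operation
# implemented once and only once.
OPERATIONS = {
    "is": lambda actual, expected: actual == expected,
    "is not": lambda actual, expected: actual != expected,
    "contains": lambda actual, expected: expected in actual,
}

def ensure(actual, operation, expected):
    """Evaluate one 'ensure | <value> | <operation> | <expected>' row."""
    return OPERATIONS[operation](actual, expected)

# Evaluating the three example rows above against the value "hello world":
results = [
    ensure("hello world", "is", "hello world"),
    ensure("hello world", "is not", "hello world"),
    ensure("hello world", "contains", "hello"),
]
```

Note that adding "is precisely" would mean a second entry whose lambda duplicates "is" exactly, which is precisely the redundancy the rule forbids.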
- Implement each expression Once And Only Once (OAOO)
Finally, we have to keep watch against various dialects growing inconsistently around different domain entities.
ensure | name | is | Tim |
check that | birth year | is | 1962 |
see if birth state | is | IN | |
compare | citizenship | USA |
This violates our OAOO rule for the lexicon, even though it won't necessarily look that way in code. In code, these are all unique, singularly-implemented functions, but the lexicon has clear duplication of concepts. How do we compare two things, grammatically? With "is". Do we check or ensure? We ensure. You may find other violations or duplications here.
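To make that duplication concrete, here is a hypothetical sketch (invented names, not the real Fit/FitNesse API): four surface phrasings, each needing its own lexicon entry, yet all four are synonyms for one comparison. The duplication lives in the language, not in the code.

```python
# Hypothetical sketch showing lexicon duplication around one domain entity.
class PersonFixture:
    def __init__(self):
        self.data = {"name": "Tim", "birth year": "1962",
                     "birth state": "IN", "citizenship": "USA"}

    def ensure_is(self, field, expected):
        """The single, canonical exact-match comparison."""
        return self.data.get(field) == expected

# Each dialect phrase is a separate entry in the test language, but every
# one of them resolves to the same ensure_is operation.
DIALECT = {
    "ensure ... is": PersonFixture.ensure_is,
    "check that ... is": PersonFixture.ensure_is,
    "see if ... is": PersonFixture.ensure_is,
    "compare": PersonFixture.ensure_is,
}

fixture = PersonFixture()
results = {phrase: op(fixture, "name", "Tim") for phrase, op in DIALECT.items()}
```

Under OAOO for the lexicon, the dialect table would shrink to the single "ensure ... is" entry.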
Remember that I'm talking about a testing discipline here, and testing should be done by the customer (role), and not the programmer. I think that it is completely unreasonable (and philosophically opposed to the whole goal of Fit/FitNesse) to expect the tester to look at the source code to determine what fixtures and fixture keywords are available. But I also think that it's unreasonable for developers to have to write every possible variation of the test language. I'm looking for this kind of compromise to be equally tedious for both. ;-)
Conclusions
There needs to be some system so that our testing language is both organic and managed. Hopefully, this will lead to a system where coping mechanisms like cut-and-paste are unnecessary (or less necessary). I don't know that I have the right rules for such a system. Mine are largely untested, are not free from personal aesthetics, and the only correctness guarantee I can give is "they seem right to me," which is to say "none at all." But it seems like there should be some rules.
Join in with me and see if we can't organically, in an open way, create a very small and very reasonable set of rules for growing a test language (or languages).
!commentForm
I like the first option (one cell) so much I've got a card to add it to Python FIT. I've had it ever since I looked at Inform 7.
RSpec isn't bad. In fact, I think it's brilliant. I've written a version in Python that will be showing up in the Python FIT 0.9 unit tests. However, that's a unit test harness, not an acceptance test harness.
Graham Nelson just raised the bar for using English as a DSL. I think it's worth looking at it before going off and writing the kind of badly crippled versions of "pseudo english" we're all too familiar with.
http://www.inform-fiction.org/
John Roth
John, please send me an email address (to tottinger at object mentor dot com) so I can contact you about this python thing you're doing.