Friday, May 29, 2015

Test-driving your code

In most established archival institutions, any given finding aid can represent decades of changing descriptive practice, all of which is reflected in the EAD files we generate. This diverse array of standards and local practice is what makes our job as data-wranglers interesting, but it also means that with any programmatic manipulation we make, there is always a long tail of edge cases and outliers that we need to account for, or we risk making unintentional and uncaught changes in places we aren't expecting.

When I first came on to the A-Space / Archivematica integration project, this prospect was terrifying - that an unaccounted-for side-effect in my code could stealthily change something unintended, and fall under the radar until it was too late to revert, or, worse, never be caught. After a few days of an almost paralytic fear, I decided to try a writing style known by many in the agile software-development world as Test-Driven Development, or TDD.

After the first day I had fallen in love. Using this methodology I have confidence that the code I am writing does exactly what I want it to, regardless of the task's complexity. Equally valuable, once these tests are written a third party can pick up the code I've written and know right away that any new functionality they are writing isn't breaking what is already there. One could even think of it as a kind of fixity check for code functionality - with the proper tests I can pick up the code years down the line and know immediately that everything is still as it should be.

In this post I will be sharing what TDD is, and how it can be practically used in an archival context. In the spirit of showing, not telling, I'll be giving a walkthrough of what this looks like in practice by building a hypothetical extent-statement parser.

The code detailed in this post is still in progress and has yet to be vetted, so the end result here is not production-ready, but I hope exposing the process in this way is helpful to any others who might be thinking about utilizing tests in their own archival coding.

To start off, some common questions:

What is a test?

A test is code you write to check that another piece of code you have written is doing what you expect it to be doing.

If I had some function called normalize_date that turned a date written by a human, say "Jan. 21, 1991" into a machine-readable format, like "1991-01-21", its test might look something like this:
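Here is a minimal sketch of such a test. The body of normalize_date is an assumption, included only so the example runs; it handles just this one date format and relies on English month abbreviations.

```python
from datetime import datetime


def normalize_date(date_text):
    """Turn a date like "Jan. 21, 1991" into ISO format."""
    # Assumed input format: abbreviated month, period, day, comma, year.
    parsed = datetime.strptime(date_text, "%b. %d, %Y")
    return parsed.strftime("%Y-%m-%d")


def test_normalize_date():
    normalized = normalize_date("Jan. 21, 1991")
    # The message after the comma is shown only if the assertion fails.
    assert normalized == "1991-01-21", "expected 1991-01-21, got " + normalized


test_normalize_date()
```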

This would fail if the normalized version did not match the expected outcome, leaving a helpful error message as to what went wrong and where.

So what is TDD?

Test-Driven Development is a methodology and philosophy for writing code first popularized and still very commonly used in the world of agile software design. At its most basic level it can be distilled into a three-step cyclic process: 1) write a failing test, 2) write the simplest code you can to make the test pass, and 3) refactor. Where one might naturally be inclined to write code then test it, TDD reverses this process, putting the tests above all else.

Doesn't writing tests just slow you down? What about the overhead?

This is a common argument, but it turns out in many cases tests actually save time, especially in cases where long-term maintainability is important. Say I have just taken on a new position and have responsibility to maintain and update code built before my time. If my predecessors hadn't written any tests I would have to look at every piece of code in the system before I could be confident that any new changes I'm making aren't breaking any current obscure functionality. If there were tests, I could go straight into making new changes without the worry that I might be breaking important things that I had no way to know about.

Ensuring accuracy over obscure edge cases is incredibly important in an institution like the Bentley. The library's EADs represent over 80 years of effort and countless hours of work on the part of the staff and students who were involved in their creation. The last thing we want to do while automating our XML normalizations is make an unintended change that nullifies their work. Since uncertainty is always a factor when working with messy data, it is remarkably easy for small innocuous code changes to have unintended side-effects, and if one mistake can potentially negate hundreds of hours of work, then the few hours it takes to write good tests are well worth the investment. From a long-term perspective, TDD saves time, money, and effort -- really there's no reason not to do it!

Learn by doing - building an extent parser in python with TDD

That's a lot of talk, but what does it look like in practice? As Max described in his most recent blog post, one of our current projects involves wrestling with verbose and varied extent statements, trying to coerce them into a format that ArchivesSpace can read properly. Since it's on our minds, let's see if we can use TDD to build a script for parsing a long combined extent statement into its component parts.

The remainder of this post will be pretty python heavy, but even if you're not familiar with programming languages, python is unusually readable, so follow along and I think you'll be surprised at how much it makes sense!

To begin, remember the TDD mantra: test first, code later. So, let's make a new file to hold all our test code and start with something simple:

Now run it and...

Ta-da! We have written our first failing test.

So now what? Now we find the path of least resistance - the easiest way we can think of to solve the given error. The console suggests that a "split_extents" function doesn't exist, so let's make one! Over in a new file, let's write
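The least we can do to make that error go away is define a function with the right name that does nothing at all. A sketch (the parameter name is an assumption):

```python
def split_extents(extent_text):
    # Deliberately empty for now: the only goal is to make the name exist.
    pass
```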

Function created! Before we can test it, our test script needs to know where to find the split_extents function, so let's import it at the top of the test file.

Now run the test again, and see where that leads us:

Our assert statement is failing, meaning that split_extent_text is not equal to our target output. This isn't surprising considering split_extents isn't actually returning anything yet. Let's fix the assert error as simply as we can:
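In sketch form, the simplest possible fix just hands back the answer the test is looking for. The list contents assume the hypothetical statement "4 linear feet and 1 oversize volume" as the test input:

```python
def split_extents(extent_text):
    # Ignores its input and returns the one list the test expects.
    return ["4 linear feet", "1 oversize volume"]
```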

There! It's the cheesiest of fixes (the code doesn't actually do anything with the input string, it just cheekily returns the list we want), but it really is important to do these small, path-of-least-resistance edits, especially as we are just learning the concept of TDD. Small iterative steps keep code manageable and easy to conceptualize as you build it -- it can be all too easy to get carried away and add a whole suite of functionality in one rushed clump, only to have the code fail at runtime and not have any idea where the problem lies.

So now we have a completely working test! Normally at this point we would take a step back to refactor what we have written, but there really isn't much there, and the code doesn't do anything remotely useful. We can easily break it again by adding another simple test case over in the test file:

This test fails, so we have code to write! Writing custom pre-built lists for each possible extent is a terrible plan, so let's write something actually useful:
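One simple way to do that, assuming extents are always joined by the word "and" with a space on each side, is Python's built-in str.split. The two example statements are hypothetical:

```python
def split_extents(extent_text):
    # Split on " and " with its surrounding spaces, so that words which
    # merely contain "and" (such as "bands") are left alone.
    return extent_text.split(" and ")


def test_split_extents():
    result = split_extents("4 linear feet and 1 oversize volume")
    assert result == ["4 linear feet", "1 oversize volume"]


def test_split_other_extents():
    result = split_extents("2 folders and 3 photographs")
    assert result == ["2 folders", "3 photographs"]


test_split_extents()
test_split_other_extents()
```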

Run the test, and... Success! Again, here we would refactor, but this code is still simple enough it isn't necessary. Now that we have two tests, we have a new problem: how do we keep track of which is which, or know which is failing when the console returns an error?

Luckily for us, python has a built-in module for testing that can take care of the background test management and let us focus on just writing the code. The one thing to note is that using the module requires putting the tests in a python class, which works slightly differently than the python functions you may be used to. All that you really have to know is that you will need to prefix any variable you want to use throughout the class with "self.", and include "self" as the first parameter of any function you define inside the class. Here is what our tests look like using unittest as a framework:
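The sketch below shows the general shape. The class, method, and variable names and the example statements are assumptions, and split_extents is defined inline only to keep the example self-contained; with two files it would be imported instead. The last two lines load and run the tests explicitly. In a standalone test file, calling unittest.main() is the more common way to end.

```python
import unittest


def split_extents(extent_text):
    # Inlined so the sketch runs on its own.
    return extent_text.split(" and ")


class TestSplitExtents(unittest.TestCase):

    def setUp(self):
        # Runs before every test; anything stored on self is shared.
        self.simple_extent = "4 linear feet and 1 oversize volume"
        self.simple_target = ["4 linear feet", "1 oversize volume"]
        self.other_extent = "2 folders and 3 photographs"
        self.other_target = ["2 folders", "3 photographs"]

    def test_splits_simple_extent(self):
        self.assertEqual(split_extents(self.simple_extent), self.simple_target)

    def test_splits_other_extent(self):
        self.assertEqual(split_extents(self.other_extent), self.other_target)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSplitExtents)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```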

You can run the tests just like you would any other python script. Let's try it and see what happens:

Neat! Now we have a test suite and a function that splits any sentence that has " and " in it. But many extent statements have more than two elements. These tend to be separated by commas, so let's write a test to see if it handles a longer extent statement properly. Over in the test file's setUp function, we'll define two new variables:

Then we'll write the test:

Running the test now fails again, but now the error messages are much more verbose. Here is what we see now that we're using python's testing module:

As you can see, it tells us exactly which test fails, and clearly pinpoints the reason for the failure. Super useful! Now that we have a failing test, we have code to write.
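A first, unpolished attempt at handling commas might look like the following. This is a sketch of one possible approach rather than a reconstruction of the original code: split on commas first, then deal with any leftover "and" in each piece.

```python
def split_extents(extent_text):
    extents = []
    # First break the statement apart at each comma.
    comma_pieces = extent_text.split(",")
    for piece in comma_pieces:
        piece = piece.strip()
        # After ", and " the piece starts with "and "; drop that word.
        if piece.startswith("and "):
            piece = piece[4:]
        # A piece may still hold two extents joined by " and ".
        if " and " in piece:
            for sub_piece in piece.split(" and "):
                sub_piece = sub_piece.strip()
                if sub_piece:
                    extents.append(sub_piece)
        elif piece:
            extents.append(piece)
    return extents
```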

Now the tests pass, but this code is super ugly - time to refactor! Let's go back through and see if we can clean things up a bit.

It turns out, we can reproduce the above functionality in just a few lines, using what are known as list comprehensions. They can be really powerful, but as they get increasingly complicated they have the drawback of looking, well, incomprehensible:
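As an illustration, here is one way a nested list comprehension could do the same job. It is a sketch under the same assumptions as before (extents separated by commas or by " and "), not necessarily the original refactoring:

```python
def split_extents(extent_text):
    # Outer loop: pieces between commas. Inner loop: pieces between " and ".
    # The final condition drops the empty strings left behind by ", and ".
    return [
        extent.strip()
        for comma_piece in extent_text.split(",")
        for extent in comma_piece.split(" and ")
        if extent.strip()
    ]
```

Read from top to bottom, the two for clauses behave like nested loops, which is exactly the kind of thing that gets harder to follow as more conditions pile up.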

We may return to this later and see if there is a more readable way to do this clearly and concisely.

Now, as always, we run the tests and see if they still pass, and they do! Now that we have some basic functionality we need to sit down and seriously think about the variety and scope of extent statements found in our EADs, and what additional functionality we'll need to ensure our primary edge cases are covered. I have found it helpful at this point to just pull the text of all the tags we'll be manipulating and scan through them, looking for patterns and outliers.

Once we have done this, we need to write out a plan for each case that the code will need to account for. TDD developers will often write each planned functionality as individual comments in their test code, giving them a pre-built checklist they can iterate through one comment at a time. In our case, it might look something like this:
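For illustration only, such a checklist could be as simple as a block of comments inside the test class. The cases listed here are hypothetical examples of what a scan of the data might turn up, not the actual list for our EADs:

```python
import unittest


class TestSplitExtents(unittest.TestCase):
    # Planned cases (hypothetical), to be turned into tests one at a time:
    #
    # - splits two extents joined by " and "
    # - splits three or more extents separated by commas
    # - handles ", and " before the final extent
    # - leaves a statement with a single extent untouched
    # - does not split on "and" or commas inside parentheses,
    #   e.g. "2 linear feet (photographs and negatives)"
    pass
```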

If we build this functionality out one test at a time, we get something like the following:

The completed test suite:

And here is a more complete version of the parser, refactored along the way to use regular expressions instead of solely list comprehensions:
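The sketch below shows what a regex-based version could look like, together with a handful of tests. Everything in it is an assumption made for illustration: the SEPARATOR pattern, the test names, and the example statements, including the choice to leave anything inside parentheses unsplit.

```python
import re
import unittest

# A separator is a comma (optionally followed by "and") or a bare " and ".
# The negative lookahead rejects separators that fall inside parentheses;
# it assumes parentheses are balanced and never nested.
SEPARATOR = re.compile(r"(?:,\s*(?:and\s+)?|\s+and\s+)(?![^()]*\))")


def split_extents(extent_text):
    pieces = SEPARATOR.split(extent_text)
    # Strip whitespace and drop any empty pieces.
    return [piece.strip() for piece in pieces if piece.strip()]


class TestSplitExtents(unittest.TestCase):

    def test_splits_on_and(self):
        self.assertEqual(
            split_extents("4 linear feet and 1 oversize volume"),
            ["4 linear feet", "1 oversize volume"])

    def test_splits_on_commas(self):
        self.assertEqual(
            split_extents("4 linear feet, 2 oversize volumes, and 3 audiotapes"),
            ["4 linear feet", "2 oversize volumes", "3 audiotapes"])

    def test_leaves_single_extent_alone(self):
        self.assertEqual(split_extents("4 linear feet"), ["4 linear feet"])

    def test_ignores_separators_inside_parentheses(self):
        self.assertEqual(
            split_extents("3 boxes (letters, diaries, and maps) and 1 volume"),
            ["3 boxes (letters, diaries, and maps)", "1 volume"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSplitExtents)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```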

That's it! We now have a useful script, confidence that it does only what it is supposed to, and a built-in method to ensure that its functionality remains static over time. I hope you've found this interesting, and I'd love to hear your thoughts on the pros and cons of implementing TDD methods in your own archival work - feel free to leave a message in the comments below!
