Monday, May 11, 2015

The Mythical Man-Month: What ArchivesSpace Isn't (and Why That's OK)

Newsflash: ArchivesSpace isn't perfect.

Neither is Archivematica (that will be a later post).

Neither is DSpace (still later).

We know that none of these systems will solve all of our problems, and we are well aware that even after the dust settles and we have an end-to-end digital archiving workflow integrating Archivematica, ArchivesSpace and DSpace, all of our problems won't be solved. Today's post is the first in a series that will discuss what each of these pieces of software brings to the table, what it doesn't (or at least not yet), and why that's fine with us.

The Mythical Man-Month

If you haven't heard of it, The Mythical Man-Month is the "Bible of Software Engineering," so-called by its author, Fred Brooks, because "everybody quotes it, some people read it, and few people go by it." Well, I try to be an honest archivist, so I must admit that I am guilty as charged.
Don't worry, this doesn't link to Amazon.

I have, however, read "No Silver Bullet: Essence and Accidents of Software Engineering," an essay written in 1986 that was included in the anniversary edition of The Mythical Man-Month. It begins like this:

Of all the monsters that fill the nightmares of our folklore, none terrify more than werewolves, because they transform unexpectedly from the familiar into horrors. For these, one seeks bullets of silver that can magically lay them to rest. 

Exciting, right? The "werewolves" he goes on to describe are the "essential" difficulties of software engineering (i.e., complexity, conformity, changeability and invisibility), and the "silver bullets" are the order-of-magnitude breakthroughs that would solve them--perhaps something like the invention of the assembly line, as opposed to minor improvements that make that assembly line more efficient. While some of the examples Brooks gives of "hopes for the silver" are a bit dated these days (or are they?), and while I have a hard time relating to its emphasis on productivity--he was a manager, after all--much of his essay, especially the section on conceptual attacks on the "essential" difficulties, is as applicable today as it ever was.

A Tip of the Hat to Our Friends at ArchivesSpace

Software engineering isn't easy, and the ArchivesSpace folks deserve a tip of the hat for doing it as well and as transparently as they do. ArchivesSpace, like all pieces of software, suffers from all of the same "essential" difficulties that make software engineering the "werewolf" that it is:


ArchivesSpace is complex. Sure, they try to abstract some of this complexity away by talking about the "main" (i.e., Archival Objects, Accessions, Resources and Digital Objects) and "supporting" modules and by creating fancy data models, but the fact of the matter is that ArchivesSpace attempts to describe, manage and provide access to archives, manuscripts and digital objects. It takes 29 people and an army of graduate students to do that same thing here!


Not only is ArchivesSpace itself complex, it is also forced to conform to the "arbitrary" complexity imposed by its users, both archivists and researchers. This complexity is arbitrary, to use language from "No Silver Bullet," because it is produced "without rhyme or reason by the many institutions and systems to which [ArchivesSpace's] interfaces must conform."

Sure, as professionals, we have archival standards and best practices. We use well-defined metadata schemes and encoding standards like EAD, MARC and DACS. A good number of us (but not all of us...) are even coming to ArchivesSpace having used one of the two industry standards, Archivists' Toolkit or Archon. However, the truth of the matter is that all of us are very different. Our use cases for ArchivesSpace are different. Our institutions are different. Our technical infrastructure is different. The systems we have that need to interact with ArchivesSpace are myriad.

ArchivesSpace can't be all things to all people, or even all things to one person, and it's not fair for us to expect that.


ArchivesSpace now has over 200 members, 55 of whom joined even before the application was finalized. If nothing else, this indicates that ArchivesSpace is successful, and successful software gets changed. First, people find it useful, and want to extend its functionality beyond its original domain (e.g., find-and-replace functionality). Second, it has outlived the frameworks with which it was developed, the metadata schemes which it must understand, the systems with which it must interact, and, well, that's all I can think of right now. In fact, ArchivesSpace is due for a pretty big change, and members are voting on relevant user stories now.

In short, successful software changes, and as we all know, change can be difficult.


The last property that makes ArchivesSpace difficult is invisibility. Yes, it's open source (the code is out there just waiting for you--yes, you!--to look at it and even adapt it for your own purposes), but Brooks isn't talking about this type of [in]visibility. For him, invisibility means that a piece of software cannot be visualized with geometric abstractions in the way that "land has maps, silicon chips have diagrams...[and] computers have connectivity schematics." He argues that you may be able to do this for various aspects of software, but not the software as a whole, "depriving the mind of some of its most powerful conceptual tools."

The larger point to be made (and I hope I'm not reading too much into this), especially as it relates to us lay-folk, is that it's not always clear exactly how a piece of software works. We know what it does to make our lives easier (and that's why user stories are based on job functions), but we often describe how it does it in language that conveys our ignorance: as "magic," or as existing "behind a curtain," "behind a veil," or "under a hood."

That can make it hard to communicate with the developers of any piece of software, including ArchivesSpace. I found this to be experientially true just the other day: I naively assumed that a change that I had in mind for ArchivesSpace would be (or could be) a simple fix, only to learn that it would require re-working the data model for subjects.

That's enough about werewolves. [1]

What ArchivesSpace Isn't (or, More Appropriately, Isn't Yet)

That was a rather long-winded way of introducing the real topic at hand: what ArchivesSpace isn't, or isn't yet, at least for us and in no particular order.

A Mechanism for Container Management

Until recently, ArchivesSpace did not model containers, which made it hard to declare facts about a container (like a barcode or its location), especially for many containers at once. This recently changed, however: Hudson Molonglo has just released a new ArchivesSpace plugin, developed for Yale, that adds a new container type to ArchivesSpace. This hasn't been added to the core ArchivesSpace code base, at least not yet. Depending on the outcome of the upcoming vote, this may also change soon. For more information, please see Maureen Callahan's "Managing Content, Managing Containers, Managing Access" post on Yale's ArchivesSpace@Yale blog.

Integrated with Aeon

As noted in a previous post on implementing ArchivesSpace, ArchivesSpace is currently incompatible with Aeon. This is extremely important to us, as Aeon has recently revolutionized the way that we process researcher requests.

We know, however, that both Lyrasis and Atlas Systems are interested in this, and many institutions (including ours) are considering potential integration scenarios. In fact, two representatives from the Bentley Historical Library will also be attending the upcoming Northeast Aeon Meeting at Yale University in June. It sounds like there will be talk of Aeon-ArchivesSpace compatibility there as well.

A Total Replacement for BEAL

The Bentley Electronic Accessioning and Locating System, or BEAL

BEAL, described in the same post, is the homegrown "lifeblood" of the Bentley Historical Library. BEAL keeps track of a wide variety of information on accessions, collections, purchases, digital objects, donors, friends, contacts, libraries, administrative requests, mailing lists, reports and even prospects. ArchivesSpace only does some of those things. One of the many challenges we foresee with implementing ArchivesSpace is figuring out a way to replicate all of this functionality in ArchivesSpace.

Integrated With Other Relevant University of Michigan Databases 

Closely related to the point above is the fact that ArchivesSpace is incompatible with other university systems that keep track of donor information. This is so unique to our situation, however, that we don't intend to write user stories or make formal feature requests for this functionality--it wouldn't make sense for us to request, or for ArchivesSpace to develop, functionality that only works for one user. We are still going to have to figure out what to do, so keep an eye out for a future post with details about how we plan to handle this situation.

Robust Enough for Digital Object Management

ArchivesSpace does have a module for Digital Objects, which we will use to record some basic technical and rights information from Archivematica and the Handle from DSpace to point to digital object(s) online. However, we will not be managing digital objects in ArchivesSpace (and to be fair, ArchivesSpace was not designed for digital object management, as evidenced by the very minimal METS file that it outputs). For example, we don't plan to store them or provide online access to them, make changes to them, or initiate preservation actions on them in or from ArchivesSpace. How we are actually going to end up doing any of this is a bit up in the air at this point, but we know it won't be in ArchivesSpace.
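
To make the "point to it, don't manage it" idea a little more concrete, here is a minimal sketch, in Python, of what such a Digital Object record might look like: a title plus a pointer to the DSpace Handle, and nothing else. The field names (jsonmodel_type, digital_object_id, title, file_versions, file_uri) are assumed from the ArchivesSpace digital object JSON model and should be checked against your own instance. The helper name and the example Handle are made up for illustration.

```python
import json
from typing import Any, Dict


def build_digital_object(title: str, handle_url: str) -> Dict[str, Any]:
    """Return a minimal digital object record that points at a DSpace Handle.

    Hypothetical helper: the field names are assumptions based on the
    ArchivesSpace digital object JSON model, not a confirmed mapping.
    """
    if not handle_url.startswith(("http://", "https://")):
        raise ValueError("handle_url should be a resolvable URL")
    return {
        "jsonmodel_type": "digital_object",
        "title": title,
        # Assumption: the Handle doubles as the record's unique identifier.
        "digital_object_id": handle_url,
        # The record only points at the object online; nothing is stored here.
        "file_versions": [{"file_uri": handle_url}],
    }


if __name__ == "__main__":
    # Made-up title and Handle, for illustration only.
    record = build_digital_object(
        "Example digitized photograph",
        "http://hdl.handle.net/0000/example",
    )
    print(json.dumps(record, indent=2))
```

Storage, delivery and preservation actions are deliberately absent from the record; in this sketch those stay with DSpace and Archivematica.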

Connected with Other Search Applications

Resources (or any component thereof) are not indexed by search engines like Google, and ArchivesSpace is not currently integrated with library catalogs or compatible with library discovery layers, such as Summon. While the former is actually more important to us than the latter (we don't use a discovery layer), both will be important for the profession at large.

In a previous position, I learned that right around 60% of traffic to our digital collections came through search engines, which makes a strong case for integration with search engines (and for a way to contextualize individual digital objects or components of finding aids, since this is where end users land). This lack of integration also has implications for more localized discovery through Mirlyn, the library's catalog. For now, we'll keep doing what we're doing, creating MARC records and importing them into the catalog, with the advantage that ArchivesSpace will export MARC and we won't have to create the records manually.

A Public Access Portal

Actually, ArchivesSpace does have a public access portal, which more and more institutions are using. However, in the medium term, we plan to continue to use DLXS for public access to finding aids. Longer-term, we are contributing to (and very excited about) the ArcLight project out of Stanford. Preliminary objectives for ArcLight include:

  • Discovery of physical and digital objects (e.g., finding aids described using EAD, full text search for digital archival materials, presentation and delivery of digital materials)
  • Compatibility with Hydra and ArchivesSpace
  • Developed, enhanced, and maintained by the Hydra/Blacklight community

And, Finally, Why All of This is OK

Brooks pessimistically asserts that "no technological breakthrough promises to give the sort of magical results," and that all we can expect is "the promise of steady, if unspectacular progress." However, he is more optimistic about what he calls "promising attacks on the conceptual essence." ArchivesSpace delivers in a number of ways predicted by Brooks in his essay:

Buy Versus Build

For the archivist, the most radical solution for constructing software is not to construct it at all. Whether you're "buying" ArchivesSpace by becoming a member, or "buying" it with staff time (i.e., open source software is more like a "free dog" than a "free beer"), in either case you aren't doing it yourself (building another silo from scratch) or going it alone.

Incremental Development (Grow, Not Build, Software)

ArchivesSpace doesn't make the mistake of assuming you can fully specify a product before coding. The relevant portion of "No Silver Bullet" is worth quoting in full:

The hardest single part of building a software system is deciding precisely what to build. No other part of the conceptual work is as difficult as establishing the detailed technical requirements, including all of the interfaces to people, to machines, and to other software systems. No other part of the work so cripples the resulting system if done wrong. No other part is more difficult to rectify later.

Instead, developers there follow the agile development methodology in order to better satisfy customers, welcome changing requirements and deliver working software frequently.

Great Design and Designers

ArchivesSpace stands on the shoulders of the aforementioned giants. Archivists' Toolkit and Archon served the community for many years. ArchivesSpace is certainly the best of both of those worlds, and the future only looks brighter, both for ArchivesSpace as an archival management software package and for the community at large.


OK, one more. [2]

We love the fact that ArchivesSpace is open source and community-driven, and we try to participate as fully as we can in that community. We do that financially, obviously, but also by participating on the listservs and Google Groups, contributing user stories and feature requests, and developing code and making it available to the public. You should too!

Perhaps the best thing that ArchivesSpace isn't, or isn't yet, is a finished product.  Perhaps that is the silver bullet.

[1] "GermanWoodcut1722". Licensed under Public Domain via Wikimedia Commons.
[2] "The Were-Wolf by Clemence Housman" by Unknown - Scan of original. Licensed under Public Domain via Wikimedia Commons.
