Tuesday, August 25, 2015

Thorny Issue Number 2: Importing Legacy Accession Records Into ArchivesSpace

First off, all of us here at the Bentley are still reeling and recovering from an exciting week at ARCHIVES 2015 in lovely Cleveland, OH. In case you missed it, a number of us involved with the Mellon grant and/or the A-Team participated:
  • Courtney Mumma (Archivematica), Brad Westbrook (ArchivesSpace), Mike, Dallas and I demonstrated a prototype of the Appraisal and Arrangement tab and showed off the newest enhancements to the Archivematica and ArchivesSpace integration workflows to gain public feedback. "Whee!"
  • Courtney and Mike (as well as a number of other folks) discussed lessons learned through the planning, development, testing and production of digital preservation applications.
  • Maureen Callahan (Yale), Regine Heberlein (Princeton) and Dallas presented case studies of how archivists (none of whom are IT professionals) learned and applied powerful metadata clean-up tools and strategies.
  • Templeton "Faceman" Peck, err, Olga, discussed legal and ethical challenges related to records of the Dr. Kevorkian-assisted suicides.
Good times! But enough reminiscing! Time to get back to work!

Planning for Importing Legacy Accession Records Into ArchivesSpace


In a previous post that outlined our strategy for implementing ArchivesSpace here at the Bentley Historical Library, I mentioned that Dallas and I have two major responsibilities as members of the A-Team: 1) importing legacy EADs; and 2) importing legacy accession records.

Note: There's actually a third thorny issue, importing MARC records for archival collections that don't have finding aids. However, for our own sanity, we've decided to kick that particular can down the proverbial road, at least until after April 1, 2016 when our grant is over.

As a reminder, this is the A-Team at the Bentley Historical Library. That's right, I'm B. A. Baracus.

I'm happy to report that we're starting to finish up our work on the first thorny issue. In fact, I'm proud to say that all 2,847 of our legacy EADs can be imported into ArchivesSpace, and that all of the cleanup work we've been doing lately has just been "icing on the cake." (For ample evidence that we really don't know when to say no, see this recent pull request, and this one as well, and the still ongoing saga.) Today's post explores the second thorny issue: importing legacy accession records into ArchivesSpace.

Context


As a reminder, we currently keep track of accession data in a homegrown FileMaker Pro database called BEAL, a [b]ac[k]ronym for the Bentley Electronic Accessioning and Locating System.

ACCESSIONS in BEAL

I've said before that BEAL has been described as the "lifeblood" of the Bentley. After some reflection, I think I'd actually be a little more precise and say that the information inside of it, especially the tables with data on donors and accessions, is the real lifeblood of the Bentley. BEAL is more like the circulatory system--without blood, there's nothing to circulate! The central importance of this information to nearly everything else that the Bentley does (not to mention the central importance of this information to key archival concepts like provenance) certainly weighs heavily on our minds as we begin our transfusion, er, migration to ArchivesSpace.

The Good News


First, the good news. We've done some initial exploring of the problem (shout-out to Jessica Venlet, a former intern at the Bentley now doing a fellowship at MIT!) and the data itself appears to be less complicated than the [meta]data contained in our EADs. While there's more of it (19,312 records), it's flatter (no c0x levels to worry about) and somewhat more consistent and predictable (due to the fact that fewer hands in general and, to be honest, fewer inexperienced hands, have historically created and edited accession records). It's also relatively easy to get our hands on a copy of the authoritative version (OK, given the fact that we have some very old accession records that only exist on paper, maybe more like an-indefinite-article-but-almost-a-definite-article authoritative version) of accession records (a simple CSV export from our FileMaker Pro database). This is something that, given our convoluted way of creating finding aids, you simply can't say about our EADs.

In short, getting accession information from BEAL to ArchivesSpace won't be quite as simple as just mapping and crosswalking it, but almost.

The other bit of good news is that we also now have a blueprint for how to get information into ArchivesSpace because of all the work we've been doing with our EADs. Just like we've done with EADs, we plan to import agents first via the ArchivesSpace API (since these are the building blocks of accessions in ArchivesSpace and of the events applied to them), and then the accessions themselves. That, coupled with the fact that we now have some good experience cleaning up and manipulating data programmatically, means we don't have to start from scratch!
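To make the agents-first order concrete, here is a minimal sketch in Python. It is not the Bentley's actual import script. The row keys (donor_last, donor_first, accession_number, date_received, title) and the repository id of 2 are made up for illustration, and `post` is a hypothetical stand-in for whatever authenticated HTTP call you use against the ArchivesSpace backend. The JSON shapes follow the agent_person and accession record types as I understand them, so check them against the schemas for your ArchivesSpace version.

```python
def build_agent_person(primary_name, rest_of_name):
    """Build a minimal agent_person record (sent to /agents/people)."""
    return {
        "jsonmodel_type": "agent_person",
        "names": [{
            "jsonmodel_type": "name_person",
            "primary_name": primary_name,
            "rest_of_name": rest_of_name,
            "name_order": "inverted",
            "source": "local",
            "sort_name_auto_generate": True,
        }],
    }


def build_accession(row, donor_uri):
    """Build a minimal accession record that links back to the donor agent."""
    return {
        "jsonmodel_type": "accession",
        "id_0": row["accession_number"],        # hypothetical column name
        "accession_date": row["date_received"],  # expected as YYYY-MM-DD
        "title": row["title"],
        "linked_agents": [{"ref": donor_uri, "role": "source"}],
    }


def import_row(row, post, repo_id=2):
    """Create the agent first, then the accession that points at it.

    `post(path, payload)` is an assumed helper that sends JSON to the
    backend and returns the parsed response, which includes a "uri" key.
    """
    agent = build_agent_person(row["donor_last"], row["donor_first"])
    agent_uri = post("/agents/people", agent)["uri"]
    accession = build_accession(row, agent_uri)
    return post("/repositories/%d/accessions" % repo_id, accession)["uri"]
```

Passing `post` in as an argument keeps the ordering logic separate from the HTTP details, so the same function can be exercised with a fake `post` before pointing it at a real instance. A real run would also want to check whether the donor already exists as an agent instead of creating a new one for every row.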

The Bad News


It's not all rainbows and unicorns, however. We will still have issues with importing legacy accession records into ArchivesSpace, and unfortunately many of these are more complicated than dealing with technical challenges or messy data.

First, while there aren't many people that create accession records, there are many people who use them. What's more, many people use them in many different ways. Just take a look at this chart of current BEAL functionality (these are only those tables that relate to accessions in some way) that Mike recently put together:


Table / Function

Accessions
  • Create accession records for newly acquired physical/digital collections
  • Document purchase of books
  • Documentation of restrictions/rights issues based on gift agreement and donor communications
  • Information on gift agreement (status, unique features, etc.)
  • Documentation of separations
  • Track processing status
  • Record locations of unprocessed materials
  • Collect information for DART donor reports

Contacts
  • Contact details (with address and phone)
  • Records of additional names, affiliations
  • Mailing information
  • Status (for mailing list, as donor or 'friend', deceased or defunct)
  • Generate mail labels

Donors
  • List of accessions from a given donor
  • List of collections from a given donor
  • Donor information (may be pulled from contacts?)

Location Guide
  • Locations of content

Collection Record

Collection: Digital Deposits
  • List of digital deposits for a given collection

Collection: Digital Deposits - Items
  • Size of completed deposit (and separations)
  • Dates of deposit in Deep Blue and dark archives
  • Location of fully processed content in Deep Blue and dark archives
  • Information on access/use restrictions (including open dates)
  • Links to manifest/log files for deposit
  • Processing status
  • Description of entire deposit


That's a lot!

Before we do anything in ArchivesSpace, we'll need to decide which of these functions (not just which data) we, as an institution, will continue to do in ArchivesSpace (and from there, which can be done currently and which will need some work before we can do them), and which we won't (and where and how, and sometimes even if, we'll continue to do them). This will involve talking to other people on staff here about their day-to-day work, and potentially making political, not just technical, decisions. As you might imagine, the political decisions are harder.

Second, while we don't anticipate many mapping issues, we do anticipate some. There simply isn't a 1:1 correspondence between fields in BEAL and fields in ArchivesSpace, and even when there is, data entry conventions (and even data models!) in each system can be different.
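A small crosswalk sketch shows both kinds of mismatch: a field whose convention has to be converted, and a field with no direct target. The BEAL column names and the MM/DD/YYYY date convention below are assumptions for illustration, not BEAL's real export format. The target names (id_0, accession_date, content_description) are accession fields in ArchivesSpace.

```python
import csv
import io
from datetime import datetime

# Hypothetical BEAL column -> ArchivesSpace accession field.
CROSSWALK = {
    "AccessionNumber": "id_0",
    "DateReceived": "accession_date",
    "Description": "content_description",
}


def to_iso_date(value):
    """Convert MM/DD/YYYY (an assumed BEAL convention) to YYYY-MM-DD."""
    return datetime.strptime(value.strip(), "%m/%d/%Y").strftime("%Y-%m-%d")


# Fields whose data entry convention differs between the two systems.
CONVERTERS = {"accession_date": to_iso_date}


def crosswalk_row(row):
    """Return (mapped, leftover): mapped fields plus columns with no target."""
    mapped, leftover = {}, {}
    for column, value in row.items():
        target = CROSSWALK.get(column)
        if target is None:
            if value:
                leftover[column] = value  # needs a human decision
            continue
        convert = CONVERTERS.get(target, str.strip)
        mapped[target] = convert(value)
    return mapped, leftover


# A made-up, one-row stand-in for the CSV export.
sample = io.StringIO(
    "AccessionNumber,DateReceived,Description,Location\n"
    "9999,08/25/2015,Sample papers,Box 12\n"
)
for row in csv.DictReader(sample):
    mapped, leftover = crosswalk_row(row)
    print(mapped)
    print(leftover)
```

Collecting the leftovers, instead of silently dropping them, gives a running list of the columns that still need a mapping decision.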

Typically things are more granular and complex in ArchivesSpace. Locations and events in ArchivesSpace are good examples of this. In ArchivesSpace, we'll be able to track and manage locations as separate entities attached to containers attached to instances attached to intellectual entities in resources. Right now we just have plain text box numbers and locations in accession and resource records. With regard to events (and, by the way, let me be the first to say that wrapping your head around events in ArchivesSpace is almost as hard as wrapping your head around events in philosophy), many "objects in time or instantiations of properties in objects" (like acknowledging the receipt of an accession, processing a collection, etc.) that exist in BEAL as simple check boxes become, in ArchivesSpace, full-fledged events with timestamps and descriptions and even separate, associated agents.
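To illustrate the check-box-to-event gap, here is a rough sketch of turning a ticked box into an ArchivesSpace event record. The check box column names (Acknowledged, Processed) and the list of values treated as "checked" are hypothetical. The event_type values, the "implementer" agent role and the "event" date label are taken from the default controlled value lists as I understand them, so verify them against your own instance. Note that a check box carries no date, so the caller has to supply one, which is exactly the sort of decision the mapping forces.

```python
# Hypothetical BEAL check box column -> ArchivesSpace event_type value.
CHECKBOX_EVENTS = {
    "Acknowledged": "acknowledgement_sent",
    "Processed": "processed",
}

# Assumed spellings of a ticked box in the export.
CHECKED_VALUES = {"yes", "x", "1", "true"}


def build_event(event_type, date, accession_uri, agent_uri):
    """Build a minimal event record tied to one accession and one agent."""
    return {
        "jsonmodel_type": "event",
        "event_type": event_type,
        "date": {
            "jsonmodel_type": "date",
            "date_type": "single",
            "label": "event",
            "begin": date,  # YYYY-MM-DD, chosen by the caller
        },
        "linked_records": [{"ref": accession_uri, "role": "source"}],
        "linked_agents": [{"ref": agent_uri, "role": "implementer"}],
    }


def events_for_row(row, date, accession_uri, agent_uri):
    """Return one event per ticked check box in a BEAL-style row."""
    events = []
    for column, event_type in CHECKBOX_EVENTS.items():
        if row.get(column, "").strip().lower() in CHECKED_VALUES:
            events.append(build_event(event_type, date, accession_uri, agent_uri))
    return events
```

Each resulting record would then be posted to the repository's events endpoint after both the accession and the agent exist, since the event refers to their URIs.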

I'll also say that all this is a good thing! We'll be able to do much more with our data than we ever have. I'm particularly eager to start playing around with live reports. It will just take a bit of thinking to get from here to there and to do this right. Exacerbating this challenge is the fact that there isn't an archival standard for recording accession information like there is for recording descriptive data (good ole DACS!), so we don't have something to point to for settling disputes when issues like these come up.

Finally, there's a privacy issue. Some of the data that we keep about our donors and accessions is sensitive. One of the best things we've done in preparing legacy EADs for import to ArchivesSpace is to introduce Git and GitHub into our archival workflows for distributed version control, so that lots of us can work on the same set of data without stepping on each other's toes. Because of the privacy issue, however, we won't be able to use GitHub in the same way. We've discussed using a private repository for working with legacy accession data, but we're somewhat uncomfortable with this idea. All that is to say that we're open to suggestions about version control systems or methods that will enable many hands to work on the same set of sensitive data in an efficient way!


The Ugly News


Well, that's the good and bad news. It turns out there's some ugly news as well. We still have dirty data--locations done the "old way" and the "new way," names that appear to refer to the same person in two different tables, one with one middle initial, and one with another, etc. Maybe dirty data is just a fact of life in the library and archives domain!
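One way to surface the middle-initial problem is to group names on last name and first name only, and flag any group that contains more than one spelling. This is just a sketch. It assumes names are stored in "Last, First M." order, the sample names are invented, and the matches it returns are candidates for a human to review, not confirmed duplicates.

```python
import re
from collections import defaultdict


def name_key(name):
    """Reduce 'Last, First M.' to (last, first), ignoring middle parts."""
    last, _, rest = name.partition(",")
    given = rest.split()
    first = given[0] if given else ""

    def norm(text):
        # Lowercase and drop anything that is not a plain a-z letter.
        return re.sub(r"[^a-z]", "", text.lower())

    return (norm(last), norm(first))


def candidate_duplicates(names):
    """Group name strings that share a key but differ in spelling."""
    groups = defaultdict(set)
    for name in names:
        # Collapse runs of whitespace so trivial variants are not counted twice.
        groups[name_key(name)].add(" ".join(name.split()))
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Invented sample names, as if pulled from two different tables.
names = ["Smith, John A.", "Smith, John B.", "Doe, Jane"]
for group in candidate_duplicates(names):
    print(group)
```

The key is deliberately loose, so two different people who share a first and last name will land in the same group. It also strips accented letters, so it would need adjusting for names outside plain ASCII.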

Conclusion

 
New Accession record in ArchivesSpace


So now you know how we'll be spending our fall semester! Be sure to stay tuned for more on our adventures in accessions.

Have you imported your legacy accession data into ArchivesSpace? How is the way you manage accessions in ArchivesSpace different from the way you managed them before? Do you also think events are confusing (even if necessary) in ArchivesSpace and/or philosophy? Let us know!
