Tuesday, August 25, 2015

Thorny Issue Number 2: Importing Legacy Accession Records Into ArchivesSpace

First off, all of us here at the Bentley are still reeling and recovering from an exciting week at ARCHIVES 2015 in lovely Cleveland, OH. In case you missed it, a number of us involved with the Mellon grant and/or the A-Team participated:
  • Courtney Mumma (Archivematica), Brad Westbrook (ArchivesSpace), Mike, Dallas and I demonstrated a prototype of the Appraisal and Arrangement tab and showed off the newest enhancements to the Archivematica and ArchivesSpace integration workflows to gain public feedback. "Whee!"
  • Courtney and Mike (as well as a number of other folks) discussed lessons learned through the planning, development, testing and production of digital preservation applications.
  • Maureen Callahan (Yale), Regine Heberlein (Princeton) and Dallas presented case studies of how archivists (none of whom are IT professionals) learned and applied powerful metadata clean-up tools and strategies.
  • Templeton "Faceman" Peck, err, Olga, discussed legal and ethical challenges related to records of the Dr. Kevorkian-assisted suicides.
Good times! But enough reminiscing! Time to get back to work!

Planning for Importing Legacy Accession Records Into ArchivesSpace

In a previous post that outlined our strategy for implementing ArchivesSpace here at the Bentley Historical Library, I mentioned that Dallas and I have two major responsibilities as members of the A-Team: 1) importing legacy EADs; and 2) importing legacy accession records.

Note: There's actually a third thorny issue, importing MARC records for archival collections that don't have finding aids. However, for our own sanity, we've decided to kick that particular can down the proverbial road, at least until after April 1, 2016 when our grant is over.

As a reminder, this is the A-Team at the Bentley Historical Library. That's right, I'm B. A. Baracus.

I'm happy to report that we're starting to finish up our work on the first thorny issue. In fact, I'm proud to say that all 2,847 of our legacy EADs can be imported into ArchivesSpace, and that all of the cleanup work we've been doing lately has just been "icing on the cake." (For ample evidence that we really don't know when to say no, see this recent pull request, and this one as well, as well as the still ongoing saga.) Today's post explores the second thorny issue: importing legacy accession records into ArchivesSpace.


As a reminder, we currently keep track of accession data in a homegrown FileMaker Pro database called BEAL, a [b]ac[k]ronym for the Bentley Electronic Accessioning and Locating System.


I've said before that BEAL has been described as the "lifeblood" of the Bentley. After some reflection, I think I'd actually be a little more precise and say that the information inside of it, especially the tables with data on donors and accessions, is the real lifeblood of the Bentley. BEAL is more like the circulatory system--without blood, there's nothing to circulate! The central importance of this information to nearly everything else that the Bentley does (not to mention the central importance of this information to key archival concepts like provenance) certainly weighs heavily on our minds as we begin our transfusion, err, migration to ArchivesSpace.

The Good News

First, the good news. We've done some initial exploring of the problem (shout-out to Jessica Venlet, a former intern at the Bentley now doing a fellowship at MIT!) and the data itself appears to be less complicated than the [meta]data contained in our EADs. While there's more of it (19,312 records), it's flatter (no c0x levels to worry about) and somewhat more consistent and predictable (due to the fact that fewer hands in general and, to be honest, fewer inexperienced hands, have historically created and edited accession records). It's also relatively easy to get our hands on a copy of the authoritative version (OK, given the fact that we have some very old accession records that only exist in paper, maybe more like an-indefinite-article-but-almost-a-definite-article authoritative version) of accession records (a simple CSV export from our FileMaker Pro database). This is something that, given our convoluted way of creating finding aids, you simply can't say about our EADs.

In short, getting accession information from BEAL to ArchivesSpace won't be quite as simple as just mapping and crosswalking it, but almost.

The other bit of good news is that we also now have a blueprint for how to get information into ArchivesSpace because of all the work we've been doing with our EADs. Just like we've done with EADs, we plan to import agents first via the ArchivesSpace API (since these are the building blocks of accessions in ArchivesSpace and of the events applied to them), and then the accessions themselves. That, coupled with the fact that we now have some good experience cleaning up and manipulating data programmatically, means we don't have to start from scratch!
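To make the "agents first, then accessions" order concrete, here is a minimal sketch in Python. It is not our actual import script: the CSV column names are invented stand-ins for a BEAL export, the field names in the payloads reflect our reading of the ArchivesSpace agent and accession schemas (check them against your own instance), and the function that "posts" an agent is a fake that just hands out URIs, so nothing here touches a real ArchivesSpace.

```python
import csv
import io

# Hypothetical BEAL export; these column names are invented for illustration.
SAMPLE_EXPORT = """accession_id,accession_date,donor_last,donor_first,description
1234,2015-08-25,Doe,Jane,Correspondence and photographs
1235,2015-08-26,Doe,Jane,Scrapbooks
"""


def build_agent(row):
    """Build a minimal person-agent payload from one export row.

    Field names are assumptions based on the ArchivesSpace agent schema.
    """
    return {
        "jsonmodel_type": "agent_person",
        "names": [{
            "jsonmodel_type": "name_person",
            "primary_name": row["donor_last"],
            "rest_of_name": row["donor_first"],
            "name_order": "inverted",
            "source": "local",
            "sort_name_auto_generate": True,
        }],
    }


def build_accession(row, agent_uri):
    """Build a minimal accession payload linked to an already-created agent."""
    return {
        "jsonmodel_type": "accession",
        "id_0": row["accession_id"],
        "accession_date": row["accession_date"],
        "content_description": row["description"],
        # The donor is attached as the "source" of the accession.
        "linked_agents": [{"role": "source", "ref": agent_uri}],
    }


def plan_import(csv_text, post_agent):
    """Create each distinct donor once, then link every accession to it.

    post_agent is any callable that sends an agent payload somewhere and
    returns the URI of the newly created record.
    """
    agent_uris = {}
    accessions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["donor_last"], row["donor_first"])
        if key not in agent_uris:
            agent_uris[key] = post_agent(build_agent(row))
        accessions.append(build_accession(row, agent_uris[key]))
    return agent_uris, accessions


def make_fake_poster():
    """Stand-in for a real HTTP POST; hands out sequential agent URIs."""
    posted = []

    def post_agent(payload):
        posted.append(payload)
        return "/agents/people/{}".format(len(posted))

    return post_agent, posted


if __name__ == "__main__":
    post_agent, posted = make_fake_poster()
    agents, accessions = plan_import(SAMPLE_EXPORT, post_agent)
    print(len(posted), "agent(s),", len(accessions), "accession(s)")
```

The agent has to exist before the accession can point at it, which is why the donor lookup happens first and why a donor with two accessions is only created once. In a real run, the fake poster would be swapped for an authenticated request to the API.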

The Bad News

It's not all rainbows and unicorns, however. We will still have issues with importing legacy accession records into ArchivesSpace, and unfortunately many of these are more complicated than dealing with technical challenges or messy data.

First, while there aren't many people that create accession records, there are many people who use them. What's more, many people use them in many different ways. Just take a look at this chart of current BEAL functionality (these are only those tables that relate to accessions in some way) that Mike recently put together:


Create accession records for newly acquired physical/digital collections

Document purchase of books

Documentation of restrictions/rights issues based on gift agreement and donor communications

Information on gift agreement (status, unique features, etc.)

Documentation of separations

Track processing status

Record locations of unprocessed materials

Collect information for DART donor reports

Contact details (with address and phone)

Records of additional names, affiliations

Mailing information

Status (for mailing list, as donor or 'friend', deceased or defunct)

Generate mail labels

List of accessions from a given donor

List of collections from a given donor

Donor information (may be pulled from contacts?)
Location Guide

Locations of content
Collection Record

Collection: Digital Deposits

List of digital deposits for a given collection
Collection: Digital Deposits - Items

Size of completed deposit (and separations)

Dates of deposit in Deep Blue and dark archives

Location of fully processed content in Deep Blue and Dark Archives

Information on access/use restrictions (including open dates)

Links to manifest/log files for deposit

Processing status

Description of entire deposit

That's a lot!

Before we do anything in ArchivesSpace, we'll need to decide which of these functions (not just which data) we, as an institution, will continue to do in ArchivesSpace (and from there, which currently can be done and which will need some work before we can do them), and which we won't (and where and how, and even sometimes if we'll continue to do them). This will involve talking to other people on staff here about their day-to-day work, and potentially making political, not just technical, decisions. As you might imagine, the political decisions are harder.

Second, while we don't anticipate many mapping issues, we do anticipate some. There simply isn't a 1:1 correspondence between fields in BEAL and fields in ArchivesSpace, and even when there is, data entry conventions (and even data models!) in each system can be different.

Typically things are more granular and complex in ArchivesSpace. Locations and events in ArchivesSpace are good examples of this. In ArchivesSpace, we'll be able to track and manage locations as separate entities attached to containers attached to instances attached to intellectual entities in resources. Right now we just have plain text box numbers and locations in accession and resource records. With regard to events (and, by the way, let me be the first to say that wrapping your head around events in ArchivesSpace is almost as hard as wrapping your head around events in philosophy), many "objects in time or instantiations of properties in objects" (like acknowledging the receipt of an accession, processing a collection, etc.) that exist in BEAL as simple check boxes become, in ArchivesSpace, full-fledged events with timestamps and descriptions and even separate, associated agents.
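Here is a rough sketch of what the check-box-to-event jump might look like in Python. Everything about the BEAL side is hypothetical (the column names and the values that count as "checked"), and the event types, roles, and date fields are our best understanding of the ArchivesSpace event schema rather than something to copy without checking. Since a check box carries no date of its own, the sketch assumes we fall back on some other date, such as the accession date.

```python
# Hypothetical mapping from BEAL check-box columns to ArchivesSpace event
# types; both sides are assumptions to verify against real data and schemas.
CHECKBOX_TO_EVENT_TYPE = {
    "receipt_acknowledged": "acknowledgement_sent",
    "processed": "processed",
}

# Values we assume mean "checked" in the export.
CHECKED_VALUES = ("1", "yes", "true", "x")


def checkboxes_to_events(row, accession_uri, agent_uri, fallback_date):
    """Turn each checked box in a row into a minimal event payload."""
    events = []
    for column, event_type in CHECKBOX_TO_EVENT_TYPE.items():
        if row.get(column, "").strip().lower() not in CHECKED_VALUES:
            continue
        events.append({
            "jsonmodel_type": "event",
            "event_type": event_type,
            # A check box has no date, so we borrow one from elsewhere.
            "date": {
                "jsonmodel_type": "date",
                "date_type": "single",
                "label": "event",
                "begin": fallback_date,
            },
            # Events need an agent and a record to attach to.
            "linked_agents": [{"role": "implementer", "ref": agent_uri}],
            "linked_records": [{"role": "source", "ref": accession_uri}],
        })
    return events


if __name__ == "__main__":
    row = {"receipt_acknowledged": "Yes", "processed": ""}
    for event in checkboxes_to_events(
        row, "/repositories/2/accessions/1", "/agents/people/1", "2015-08-25"
    ):
        print(event["event_type"], event["date"]["begin"])
```

The sketch also shows where the thinking comes in: a single tick mark in BEAL forces decisions about which date to record and which agent did the work, and neither answer is actually in the legacy data.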

I'll also say that all this is a good thing! We'll be able to do much more with our data than we ever have. I'm particularly eager to start playing around with live reports. It will just take a bit of thinking to get from here to there and to do this right. Exacerbating this challenge is the fact that there isn't an archival standard for recording accession information like there is for recording descriptive data (good ole DACS!), so we don't have something to point to for settling disputes when issues like these come up.

Finally, there's a privacy issue. Some of the data that we keep about our donors and accessions is sensitive. One of the best things we've done for our work here to prepare legacy EADs for import to ArchivesSpace is to introduce Git and GitHub into our archival workflows for distributed version control so that lots of us can work on the same set of data without stepping on each other's toes. Because of the privacy issue, however, we won't be able to use GitHub in the same way. We've discussed using a private repository for working with legacy accession data, but we're somewhat uncomfortable with this idea. All that is to say that we're open to suggestions about version control systems or methods that will enable many hands to work on the same set of sensitive data in an efficient way!

The Ugly News

Well, that's the good and bad news. It turns out there's some ugly news as well. We still have dirty data--locations done the "old way" and the "new way," names that appear to refer to the same person in two different tables, one with one middle initial, and one with another, etc. Maybe dirty data is just a fact of life in the library and archives domain!
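As a taste of the kind of cleanup this implies, here is a small Python sketch that flags names differing only in their middle initial. It assumes names are recorded as "Last, First M." (an assumption about our data, not a guarantee), and it only suggests candidates for a human to review, since two people really can share a first and last name.

```python
from collections import defaultdict


def name_key(name):
    """Reduce 'Last, First M.' to (last, first), ignoring middle names."""
    last, _, rest = name.partition(",")
    given = rest.split()
    first = given[0] if given else ""
    return (last.strip().lower(), first.strip(".").lower())


def possible_duplicates(names):
    """Group names that share a last and first name but differ otherwise."""
    groups = defaultdict(set)
    for name in names:
        # Collapse runs of whitespace so spacing alone is not a "variant".
        groups[name_key(name)].add(" ".join(name.split()))
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    donors = ["Smith, John A.", "Smith, John B.", "Smith,  John A.", "Jones, Mary"]
    for key, variants in possible_duplicates(donors).items():
        print(key, variants)
```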


New Accession record in ArchivesSpace

So now you know how we'll be spending our fall semester! Be sure to stay tuned for more on our adventures in accessions.

Have you imported your legacy accession data into ArchivesSpace? How is the way you manage accessions in ArchivesSpace different from the way you managed them before? Do you also think events are confusing (even if necessary) in ArchivesSpace and/or philosophy? Let us know!

Tuesday, August 18, 2015

Appraisal and Arrangement Tab Live Demo at SAA2015

Psst.... In case you haven't heard, the Bentley Historical Library is teaming up with Artefactual Systems and LYRASIS for a brown bag lunch session at SAA2015 on Thursday, August 20, 2015 from 12:15pm - 1:30pm in room 25C of the Cleveland Convention Center.

In addition to hearing about the broader goals of ArchivesSpace-Archivematica integration, Courtney Mumma will discuss development work that builds off of current Archivists' Toolkit-Archivematica integration for the Rockefeller Archive Center and Max, Dallas, and I will demo the new Appraisal and Arrangement tab.

Find more information on the demo (and avenues for you to provide feedback) on the Archivematica wiki:  https://wiki.archivematica.org/SAA_2015_Demonstration_and_Feedback.

If you're feeling really adventurous, you can download and install the Appraisal Tab prototype using Artefactual Labs' GitHub (see above link for installation instructions).

We really want this development work to be flexible enough to meet our needs as well as yours, so please come on out to the brown bag to learn more about the project and share your thoughts.

See you in the CLE!

Thursday, August 13, 2015

Advocacy and Born-Digital Archives

Earlier this week, I helped our development officer draft a proposal to seek additional funding for our digital curation program.  Given the potential audience of library/university administrators and external financial donors, I quickly realized that I had to adjust how I typically represent our work with digital archives.

SIPs and DIPs, checksums and disk images—in short, all the fun things we talk about in our listservs, blogs, and conferences were probably going to be meaningless to these folks.  Instead, I needed to capture what we do on a daily basis and explain why it matters to a fairly diverse group who wouldn't know the OAIS reference model from a hole in the ground.

Have you seen my functional entities?
Then again, why should they have to know about the gory details of OAIS?  I'm of the mind that key stakeholders (our administrators, funders, donors, researchers, etc.) don't necessarily need to know about the minutiae of digital preservation or the alphabet soup of acronyms regularly featured in conference presentations and procedural manuals (unless they're interested, of course...).

What they really need to know is what we do (at a high level) and why it's important.  If we're unable to accomplish this objective—and I would argue that every successful digital archives/preservation program does so—then we face an uphill battle for resources and, ultimately, relevance.  And that may only be a wee bit hyperbolic.  Advocacy is an integral component of the archival enterprise and we—archives of every size and stripe—are together in this quest to demonstrate our value and seek support.

What is it that you do, anyway?

As archivists, we already face an uphill battle when it comes to explaining our jobs.  Throw in the complexities of digital archives and there's little wonder that your spouse/parent/friend's eyes may start to glaze over when you describe an average day on the job.  (Or maybe that's only happened to me?)  One of our major challenges, then, is to be able to represent our work in a way that others can relate to and understand.  

At the same time, advocacy isn't just about education; simply raising awareness about archives and our value is a function of outreach.  Advocacy, on the other hand, goes the extra step in seeking to influence the actions and decisions of our interlocutors. 

Making the Case

As with any communication, it's important to tailor the message for the audience and the specific issue at hand.  The message you deliver to donors of materials may therefore be very different from what you prepare for administrators.

So let's say you've identified the stakeholder(s) you want to reach; while it's important to convey a sense of what you do, save the workflow matrices, code snippets, and UML diagrams for your colleagues.  Successful advocacy should involve and inspire the audience so that they understand how and why they might benefit from our work.  

This in turn requires us to communicate the "added value" that archivists bring to the preservation and curation of digital archives.  This value includes such things as (and I'm preaching to the choir, here):

  • Organizing and describing materials so that researchers understand the nature of our collections
  • Taking steps to ensure that content can still be accessed and used far into the foreseeable future
  • Protecting sensitive personal information and deploying appropriate restrictions based on rights, institutional policies, legal requirements, and donor agreements.
  • Improving the means by which people search for and retrieve content
  • Developing or providing resources to help researchers use (or reuse) materials in important and meaningful ways.  
If we want people to care about our work—and to demonstrate that interest by committing resources or collections to our institution—we need to be unequivocal about the benefits we bring to the table (and the more precise or quantitative, the better!).

To help convince the stakeholders of your case, it's also important to highlight any innovations, achievements, or recognition related to the topic at hand.  Doing so will establish your/your institution's legitimacy and credentials and point the way to continued or future success.  I don't have any hard data to back this up, but it's my strong conviction that current and/or potential stakeholders are more willing to donate their support and/or resources when presented with a proven track record and the opportunity to continue/advance that work.  Modesty is no longer a virtue!
And now I'll take the plunge and share some of that document I mentioned above:
Over the past two decades, our collective historical record has undergone a sea change: the web has revolutionized publishing and any number of businesses, email and tweets have replaced personal letters, word processing files and spreadsheets now comprise organizational records, and the convenience and ubiquity of smart phones enable us all to amass large collections of digital photographs and video.  
As our professional and personal lives increasingly move online and into other digital spaces, the Bentley Historical Library has emerged as a proven leader in the quest to preserve and make accessible essential born-digital materials.  
Today, researchers can access the electronic records of former Governor Jennifer Granholm, review the source code for the influential Michigan Terminal System time-sharing operating system from 1968, and study the creative digital output of noted artists such as Peter Sparling, Vince Castagnacci, and Arnold Weinstein. 
Our work in “digital curation” encompasses traditional archival functions—the process of selecting materials of high research or intrinsic value and making them accessible to researchers—but also involves additional steps to ensure the integrity and authenticity of content.  The Bentley furthermore seeks to add value to materials through the production of detailed description, access portals, and tools that help patrons find answers to their most pressing research questions.  
OK—having said all that, I want to acknowledge that I am far from being an expert on this topic!  I would be delighted to hear about important points that I've missed or to see examples of successful advocacy for digital archives.  If you've got anything to share, leave a comment!