Wednesday, September 20, 2017

Archivematica Implementation: A Retrospective

First, some exciting news: It's official! We've fully implemented Archivematica here at the University of Michigan Bentley Historical Library and, as of August 31, 2017, we've used it as part of the end-to-end digital archives workflow we developed during the Mellon-funded ArchivesSpace-Archivematica-DSpace Workflow Integration project to deposit our first "production" AIPs into DeepBlue, our repository for digital preservation and access.

Go ahead, check them out! (And these too!) And there was much rejoicing (yaaaaaaaay)...

In this post, we'll reflect a bit on what's happened since our last status report and look forward to a brave new digital archives world (at least here at the Bentley).

Major Milestones


Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term access to trustworthy, authentic and reliable digital content.

The Mellon grant officially concluded nearly a year ago on October 31st, 2016. At the time, we announced that we had achieved each of the three major development objectives for the project:
  • the creation of a new Appraisal and Arrangement tab in Archivematica that will permit archivists to characterize, review, arrange, and describe digital archives;
  • the integration of Archivematica and ArchivesSpace; and
  • the integration of Archivematica and DSpace.

Archivematica 1.6 was officially released on March 16, 2017. Dubbed "the Nancy Deromedi release" in memory of Nancy Deromedi, former Associate Director for Curation here at the Bentley, whose vision helped shape defining features of the release, this release contained the features listed above whose development we sponsored as part of this work, as well as some work by MLibrary's own Aaron Elkiss to "drastically cut down" the number of files that need to be indexed by removing empty BulkExtractor logs. (Up until this point, indexing in Archivematica had been a huge problem for us, particularly for transfers with lots of files).

Even with this release, however, we still weren't quite ready to fully adopt Archivematica and go "live" with the ArchivesSpace-Archivematica-DSpace workflow (even though we had been making extensive use of Archivematica's Backlog feature and the `automation-tools`).

Fix One Bug, Two More Shall Take Its Place

Even before Archivematica 1.6 was officially released, however, we had identified a number of additional bug fixes (and new features) that were blocking our full adoption and implementation of Archivematica.

Issues Addressed by Artefactual

We opened another contract with Artefactual (the lead developers of Archivematica) to address a number of these issues, some of which are listed below:
  • Handles were not being written back to the File Version field of ArchivesSpace's Digital Object module. Ultimately, this meant that links out to digital content were not making it back to our public finding aids.
  • We were unable to drag-and-drop all files from the Backlog pane. This was essential to being able to associate digital content with its description.
  • It was difficult to identify the location of files in the Backlog pane when they had been singled out in the Examine Contents and File List panes. Archivists thus had a hard time locating files (e.g., after they had tagged them, say, as having sensitive data) in their original order.
  • Files whose formats were not able to be identified were being included in facets for other file formats in the Analysis pane, making file format characterization a bit unwieldy.
  • Required (at least for us) metadata fields were not being written to the DSpace Item (although they were being written to the METS file inside the AIP). This had implications for searching and browsing in DeepBlue. This is particularly problematic for ensuring that online researchers we get from search engines that take people directly to digital content in DeepBlue (rather than through our finding aids) have the context they need to understand that digital content.
  • Scrolling down the File List pane made all the File List buttons disappear, which led to poor usability of the functionality enabled by the buttons (e.g., creating a new component of description, finalizing an arrangement, etc.).
  • We wanted to the option to package AIPs in the .ZIP archive format (in addition to .7Z). We prefer the .ZIP format because it's more familiar than .7Z to the majority of our researchers.
  • The date facets in the File List pane were not functional and, in any case, last modified dates weren't showing up.

All of these issues (except the last one, but more on that later) were incorporated in the 1.6.1 release of Archivematica, which came out on August 1, 2017. This release also included some work by our own Dallas Pillen to fix a bug that occurred when trying to run a SIP through Archivematica's Ingest microservices when that SIP (coming from the ArchivesSpace pane in the Appraisal tab) had a date, but no title. (This is a fairly common practice in our description, permitted in ArchivesSpace as well as content standards like DACS.)

Issues Addressed Locally

Due to the local, idiosyncratic nature of some of some additional issues we identified, we also made a number of fixes to our forks of Archivematica and the Archivematica Storage Service:
  • Archivematica
    • We got rid of a nested "digital_object_component_" in the AIP directory structure, a relic of a time before we decided to simplify the way we model digital objects in ArchivesSpace. Now all digital content is packaged inside a single "objects" folder and hopefully this makes things a bit more straightforward for researchers.
    • We added a "http://hdl.handle.net/" prefix to the Handle written back to the File Version field of ArchivesSpace's Digital Object module so that links to digital content in the finding aids actually work. We toyed with hard-coding this in Archivematica, but Dallas ended up creating an ArchivesSpace plug-in that verifies all URLs with Handles coming to ArchivesSpace (whether or not they're coming from Archivematica).
    • We increased one of the timeouts in storage_service.py to an hour (it was set to two minutes) so that the Archivematica Storage Service could move around larger packages (e.g., at initial transfer, at final deposit, etc.) without timing out.
    • We disabled BulkExtractor scanners except the ones we need to identify the most common forms of sensitive data we encounter, since this application is extremely time and resource intensive. At the time, this application was not configurable in the Format Policy Registry.
    • We updated the default Copyright statement going from Archivematica to DSpace to point researchers to access and use restrictions recorded at the collection-level.
  • Archivematica Storage Service
    • We added a feature to deposit a License Bundle with every AIP going to DSpace. This is one of our internal requirements for all deposits to DSpace.
Of course, if you have questions about any of these, please don't hesitate to get in contact with us!

On Deck for Archivematica 1.7

Looking ahead to Archivematica 1.7, you can expect a couple of additional features related to the ArchivesSpace-Archivematica-DSpace Workflow Integration project, most notably the inclusion of an additional feature that will permit archivists to characterize and review content based on its last modified date.

The new "Last modified" column in the File List pane of the Appraisal tab.

While last-modified dates and times are notoriously unreliable (especially as they change hands or operating systems, e.g., on their way from donor to archive), they can help to give an archivist additional context for a set of files or prepare them for additional preservation steps that might be required for older content, e.g., exploring additional file format migration pathways if the content is of sufficient value.

This release will also contain some work I did to fix a bug that was introduced when the .ZIP functionality was added. The bug occurred when Archivematica tried to update permissions on the "metadata" bitstream when the AIP was packaged using the .ZIP archive file format.

Mission Accomplished (for These Archivists who are at This Institution on Their Mission)


So here we are--we've reached another milestone. As I mentioned at the beginning of the post, as of August 31, 2017, we are officially live with Archivematica and the new features and workflow we developed during the Mellon-funded ArchivesSpace-Archivematica-DSpace Workflow Integration project. In fact, our latest cohort of Project Archivists just started at the beginning of September and they were all trained to use these new tools and workflows--it's all so exciting!

While it's important to say that we've accomplished something--and that we're proud of what we've accomplished!--it's also important to qualify that a bit. What we've got works for us (we think!), at least for now, at least for most of what we're working with. We hope you can take at least some of what we've done (and we tried hard to make sure you could) and make it work for you, too. It's been exciting, for example, to hear about other people's experiences with the Appraisal tab (like this post on "Appraising Appraisal and picking the right tool for the job" by Chris Grygiel).

This has been an amazing journey, and along the way we've learned a lot, not just about Archivematica, but also about software development, project management, working with open source tools and communities, etc. We've said before that the end is just a new beginning--and that remains true today. With that in mind, we know our mission is never "accomplished" as such--we fully expect (and are equally excited for!) all the new challenges and adventures we'll face in Archivematica Land as we move forward.

Until next time!

2 comments:

  1. Thanks for the update! Great to hear that your new workflow is now in production. Congratulations and thanks for sharing!

    ReplyDelete