Tuesday, August 9, 2016

Archivematica Users Group @ SAA

Greetings, all! The Bentley's Mellon grant team had a busy and exciting time last week in Atlanta during the annual meeting of the Society of American Archivists (SAA). One of the highlights was the demonstration Dallas and I gave of current functionality in our ArchivesSpace-Archivematica-DSpace Workflow Integration project at the Archivematica Users Group meeting (hosted by the ever-gracious Dan Gillean) on Wednesday, August 3.
We've given a lot of demos over the past year, at conferences as well as to individual institutions and groups (including the Digital Preservation Coalition), but this presentation really stood out for us. While we always get lots of great questions, this time several attendees suggested new and exciting functionality that could be added to the Appraisal Tab in future releases.

Thinking Bigger (and Better!)

First, Seth Shaw, Assistant Professor of Archival Studies at Clayton State University (and developer of the Data Accessioner), pointed out that a tree map visualization would be a helpful addition to the 'Analysis Pane' in the Appraisal Tab.

As it now exists, the Analysis Pane includes a tabular report on file format distributions in a given transfer as well as pie charts that depict this range by (a) number of files per format and (b) total volume of the respective formats:

Analyze this!
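At its core, a report like this one groups the files in a transfer by format and totals both the number of files and their combined size. The Python sketch below is a minimal, hypothetical illustration of that aggregation, not the Appraisal Tab's actual code: the function name `format_distribution` is invented here, and it uses the file extension as a rough stand-in for format, whereas a real characterization step would rely on a format identification tool (such as Siegfried) rather than extensions.

```python
import os
from collections import defaultdict


def format_distribution(root):
    """Tally file count and total bytes per extension under `root`.

    Hypothetical helper: the extension is only a rough proxy for format.
    Returns (extension, file_count, total_bytes) tuples.
    """
    counts = defaultdict(int)
    volumes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Normalize case so ".TXT" and ".txt" are counted together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            volumes[ext] += os.path.getsize(os.path.join(dirpath, name))
    # Sort by file count, largest first, then alphabetically by extension.
    return sorted(
        ((ext, counts[ext], volumes[ext]) for ext in counts),
        key=lambda row: (-row[1], row[0]),
    )
```

Calling `format_distribution("/path/to/transfer")` yields one row per extension; the count column corresponds to the "number of files per format" chart and the bytes column to the "total volume" chart.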

A tree map would offer archivists an alternative way to visualize the relative sizes of directories and files and would give insight into where particular file types are located. This information could be very helpful in terms of comprehending directory structure, understanding the nature of content in a transfer, and identifying content that might require additional resources during ingest (such as large video files).
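Under the hood, a tree map needs two ingredients: the cumulative size of every directory (each rectangle's area is proportional to it) and a layout rule that divides a parent rectangle among its children. The Python sketch below shows both under simplifying assumptions: it takes a hypothetical list of `(path, size)` pairs instead of reading a real transfer, the function names are invented for illustration, and it uses the basic "slice-and-dice" layout, whereas polished tools typically use more refined layouts (squarified tree maps, for example) that avoid long, thin rectangles.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def directory_totals(files):
    """Add each file's size to every ancestor directory.

    `files` is an iterable of (posix_path, size_in_bytes) pairs;
    the transfer root is reported as ".".
    """
    totals = defaultdict(int)
    for path, size in files:
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += size
    return dict(totals)


def slice_and_dice(items, x, y, width, height, depth=0):
    """Split one rectangle among (label, size) items in proportion to size.

    At even depths the width is divided (children side by side); at odd
    depths the height is divided (children stacked). A full tree map would
    call this again for each child rectangle with depth + 1.
    Returns (label, x, y, width, height) tuples.
    """
    total = sum(size for _, size in items)
    rects = []
    offset = 0.0  # fraction of the parent already assigned
    for label, size in items:
        share = size / total if total else 0.0
        if depth % 2 == 0:
            rects.append((label, x + offset * width, y, share * width, height))
        else:
            rects.append((label, x, y + offset * height, width, share * height))
        offset += share
    return rects
```

For example, with a 400-byte `docs` folder and a 600-byte `video` folder, a 100-unit-wide canvas gives `docs` a 40-unit column and `video` a 60-unit column, which is how a large video file immediately stands out.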

It's also important to note that not all tree maps are created equal, as different instantiations have different affordances.  For instance, TreeSize Professional yields a visualization that includes labels of directories and file formats and uses color coding to show the relative depth of materials in the folder structure of a transfer, but doesn't represent individual files:

Whose size? TreeSize!

WinDirStat, on the other hand, color codes individual file formats, represents individual files in the tree map, and highlights directories or file format types based upon the user's selection from its directory tree or file format list:

The Colors, Children!

Next, Susan Malsbury, Digital Archivist at NYPL, asked about the potential of including Brunnhilde in the Analysis Pane. For those of you who are not in the know (which, until very recently, included me!), "Brunnhilde" is, to quote its developer, Digital Archivist Tim Walsh:
a Python-based reporting tool for born-digital files that builds on Richard Lehane's Siegfried. Brunnhilde runs Siegfried against a specified directory, loads the results into a sqlite3 database, and queries the database to generate CSV reports to aid in triage, arrangement, and description of digital archives. Reports include:
  • Sorted file format list with count
  • Sorted file format and version list with count
  • Sorted mimetype list with count
  • All files with Siegfried errors
  • All files with Siegfried warnings
  • All unidentified files
  • All duplicates (based on a Siegfried-generated md5 hash)
Walsh's tool could provide much more granular information about the contents of a transfer, and, combined with visualizations, it would offer additional and highly interesting ways to review and appraise digital archives.
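The general pattern Walsh describes (identification results loaded into SQLite, then queried to produce CSV reports) is easy to sketch. The Python example below is a hypothetical illustration of that pattern, not Brunnhilde's actual code or schema: the table name `siegfried`, its columns, the function names, and the sample rows are all invented here, and in practice the rows would come from Siegfried's output rather than being hard-coded.

```python
import csv
import io
import sqlite3

# Invented sample rows standing in for Siegfried results:
# (filename, format name, format version, MIME type, MD5 checksum).
SAMPLE_ROWS = [
    ("a/report.pdf", "Acrobat PDF 1.4", "1.4", "application/pdf", "aaa111"),
    ("a/report_copy.pdf", "Acrobat PDF 1.4", "1.4", "application/pdf", "aaa111"),
    ("b/photo.jpg", "JPEG File Interchange Format", "1.01", "image/jpeg", "bbb222"),
    ("b/notes.txt", "Plain Text File", "", "text/plain", "ccc333"),
]


def load(rows):
    """Load identification rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE siegfried "
        "(filename TEXT, format TEXT, version TEXT, mime TEXT, md5 TEXT)"
    )
    conn.executemany("INSERT INTO siegfried VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def format_counts(conn):
    """Sorted file format list with count (most common first)."""
    return conn.execute(
        "SELECT format, COUNT(*) AS n FROM siegfried "
        "GROUP BY format ORDER BY n DESC, format"
    ).fetchall()


def duplicates(conn):
    """All files whose MD5 checksum appears more than once."""
    return conn.execute(
        "SELECT md5, filename FROM siegfried WHERE md5 IN "
        "(SELECT md5 FROM siegfried GROUP BY md5 HAVING COUNT(*) > 1) "
        "ORDER BY md5, filename"
    ).fetchall()


def to_csv(header, rows):
    """Render query results as CSV text, as a report file would contain."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


conn = load(SAMPLE_ROWS)
print(to_csv(["format", "count"], format_counts(conn)))
print(to_csv(["md5", "filename"], duplicates(conn)))
```

Keeping the results in a database means each additional report (MIME types, unidentified files, and so on) is just another query, which is part of what makes this approach attractive for an Analysis Pane.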

Malsbury also raised the question of how the Appraisal Tab's new functionality could accommodate disk images. While Archivematica does have some support for transfers composed of disk images, our use cases for the grant project did not specifically address this content type. As Gillean noted, this question calls for additional cross-platform workflow integration. Since BitCurator is designed to handle disk images, it makes sense for members of the open source digital archives community to explore how it can work in conjunction with Archivematica rather than replicate its functionality in the latter platform. (A sidebar conversation Max Eckard and I had with Sam Meister from Educopia and the BitCurator Consortium confirmed that this is an important area of inquiry...)

Next Steps...

We're in the final stretch of our grant project and, sad as it is to say, have come to realize that not all the awesome ideas we've had for the Appraisal Tab are going to make it into the final product. We will, however, have achieved all the major goals and deliverables that we established at the outset:
  • Introduce functionality into Archivematica that will permit users to review, appraise, deaccession, and arrange content in a new "Appraisal and Arrangement" tab in the system dashboard.
  • Load (and create) ASpace archival object records in the Archivematica "Appraisal and Arrangement" tab and then drag and drop content onto the appropriate archival objects to define Submission Information Packages (SIPs) that will in turn be described as 'digital objects' in ASpace and deposited as discrete 'items' in DSpace.   
  • Create new archival object and digital object records in ASpace and associate the latter with DSpace handles to provide URIs/'href' values for <dao> elements in exported EADs.
All the same, we're thrilled by the realization that the Appraisal Tab as it will exist in the upcoming version 1.6 of Archivematica is really just a beginning.  By developing the Appraisal Tab and introducing basic appraisal functionality (file format characterization, sensitive data review, file preview, etc.), we've dramatically lowered the bar for other institutions that want to integrate new tools or introduce new features.  (And yes, I did borrow liberally from Dan Gillean for that last thought!)  

We're really excited to see where other institutions and developers take the Appraisal Tab. I, for one, would love to see textual analysis and named entity recognition tools like those in ePADD (or the other projects identified by Josh Schneider and Peter Chan in this great post from the SAA Electronic Records Section blog).

What features or functionality would you like to see in the Appraisal Tab?  What questions do you have about our current processes? Please reach out to us via the comments section or email.

Thanks for reading; live long and innovate!