Friday, September 16, 2016

On Square Pegs, Round Holes, PREMIS Rights Statements and Apollo 13

As mentioned before in a previous post on PREMIS and PREMIS Rights Statements, we've been exploring ways that we can create rights statements as we're processing SIPs in Archivematica and then use those rights statements to set access profiles for the AIPs in our DSpace repository.

At that point, our thinking was mostly theoretical. Since then, we've had some time to think about it, to confer with our MLibrary colleagues as well as those at the Rockefeller Archive Center, and even reflect on Ed Pinsent's comments on the last post (thanks, everyone!). In this post, I'd like to give an update on how we plan (yes, still just a plan--things could change!) to actually do it. Before I dive in, though, I should remind our readers that what we're proposing here is a bit like trying, as the expression goes, to fit a "square peg in a round hole." Here's a quote from the PREMIS Data Dictionary for Preservation Metadata:
PREMIS primarily defines characteristics of Rights and permissions concerned with preservation activities, not those associated with access and/or distribution.

Yikes. "Not those associated with access and/or distribution." hrm...

The Access Profiles

Let's start at the end. In DeepBlue, our DSpace repository, we have some amount of control over both a digital object--or, to use DSpace-speak, bitstream(s)--and its associated metadata--or item. We can associate each of them (independently of one another) with one of four of what are called groups (or, really, as many groups as we care to create). In practice, we apply a handful of common combinations of item and bitstream(s) groups when we deposit AIPs in our DSpace repository:
  • Open: Both items and bitstreams are open to be viewed/downloaded by anyone in the whole world.
  • Bentley Reading Room users: While items can be viewed by anyone, downloading of bitstreams must be done from within the Bentley's IP range. This can be from a wired or wireless connection.
  • University of Michigan users: Items can be viewed by anyone, but only University of Michigan affiliates may download bitstreams.
  • Totally restricted/embargoed items: Nobody (except Bentley archivists, who may also fulfill reference requests) can view or download anything, item or bitstream(s). Typically, these types of things are embargoed until a particular date (based on local policies), at which point in time both item and bitstream(s) will become open.[1]
  • Audio/visual items with copyright or other types of concerns: Items can be viewed by anyone, but only Bentley archivists can download bitstreams. To date, this profile has consisted mostly of audio/visual material that is preserved in DSpace but made available for streaming (not downloading) in the Bentley Digital Media Library.

A Quick Refresher on Act[ion]s in PREMIS Rights Statements

As a reminder, PREMIS Rights Statements are made up of one basis (the raison d'ĂȘtre of the rights statement, something like copyright or policy) and one or more actions associated with that basis (very specific actions the repository is or isn't allowed to perform). Since the basis won't have an impact on its associated action, I won't go into much detail about bases here.

Actions come from a controlled vocabulary, made up of things like:
  • replicate: make an exact copy
  • migration: make a copy identical in content in a different file format
  • modify: make a version different in content
  • use: read without copying or modifying
  • disseminate: create a copy or version for use outside of the preservation repository
  • delete: remove from the repository

As you can see, these have a very "digital preservation" feel (in the most narrow sense of the word[2]). Hence the data dictionary's warning above.

Actions may be allowed in all cases or, of course, they may have restrictions. These express situations where, for instance, dissemination is permitted, but only to a specific type of person (say, one affiliated with your institution), or, taken to the extreme, where dissemination is not permitted, period. At least in Archivematica's case, you've got three choices to express such restrictions: allow, disallow or conditional. This may sound like it covers a lot, but as you'll see, we had to get a little creative here, as we end up using "conditional" to describe a number of different conditions.

Other than that, there are some begin and end dates associated with that action, and a note containing a textual description of the right granted if additional description is needed.

Mapping PREMIS Rights Statements in Archivematica to DSpace Groups

SIP rights template--second page



Now on to mapping the PREMIS Rights Statements implementation in Archivematica to the groups in DSpace. There are a couple of different ways we might have approached this.

One way might have been to try to use an Act to tell the repository exactly what it was allowed to do with both the item and the bitstream for a particular AIP. While this approach would have given us the granularity we'd need for machine-actionable PREMIS Rights Statements, we worried that it would be overly cumbersome for our human processors, who would, for the most part, be manually adding these statements to SIPs and keying in the data.

Another way might have been to use a local controlled vocabulary for the Act field, something like "disseminate-bentley", "disseminate-umich", etc. However, associating a particular target with an action seemed, in the words of one of our DSpace gurus, "somewhat contrary to the spirit of the allowed actions" (see this sample controlled vocabulary for the 'act' element, some of which I listed above). You'll also notice if you click that link that "disseminate-bentley", "disseminate-umich", etc. are not on that list, and for good reason! We even thought briefly about using the Restriction field to specify the target audience before realizing that it too has a controlled vocabulary (one that's actually enforced by Archivematica and then used later on in some logic).

In the end, we settled on using the Note field to specify audience. Now, we know this isn't the most elegant solution--in general, the intention of the notes is specifically to not be machine-actionable, but we felt that since this PREMIS Rights Statement would ultimately be preserved in the AIP (in the METS!), and since there's a chance someone might run across it outside of our repository environment, that this was the way to go.

So here's our plan, at least an overview:

| DeepBlue group: Item | DeepBlue group: Bitstream | Act | Restriction | Restriction note |
|---|---|---|---|---|
| Anonymous | Anonymous | None | None | None |
| Anonymous | Reading Room only | disseminate | Conditional | Reading Room |
| Anonymous | University of Michigan only | disseminate | Conditional | University of Michigan |
| Anonymous | Archivists only | disseminate | Conditional | BDML |
| Archivists only | Archivists only | disseminate | Disallow | Executive records (ER); Personnel records (PR); Student records (SR); Patient/client records (CR) |

(The first two columns are the DeepBlue groups; the last three are the Archivematica PREMIS Rights Statement fields.)

A couple of notes here:
  • We will not use PREMIS Rights Statements (at least those that apply to access/distribution) for AIPs that don't have restrictions.
  • When we do use PREMIS Rights Statements, they will be as minimal as we can make them with the intention that they will only be used by machines, not humans. Human readable rights statements will be recorded elsewhere, like ArchivesSpace Conditions Governing Access and Use notes. 
  • Most of the time, the End Date field will be OPEN, except when a Bentley policy is involved (ER, PR, SR and CR above). In those cases, an end date will let the repository know when a particular restriction expires.

Once an AIP with some sort of restriction is ready to go to DeepBlue, we'll park it somewhere temporarily[3], parse the METS file in the AIP, determine (based on the rights statements) the item and bitstream permissions, convert it to the DSpace Simple Archive Format and upload in batch to DeepBlue from there. It's sounding like the identifier for the Digital Object in ArchivesSpace will be in the AIP, so we're pretty confident we'll also be able to add the Handle back to ArchivesSpace fairly easily as well.
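To make that parsing step a little more concrete, here's a rough sketch of how the rights-to-groups lookup could work. It assumes PREMIS v2 element names as Archivematica serializes them into the METS; the mapping table mirrors the overview above, and none of this is our production code.

```python
# Sketch: map PREMIS rightsGranted elements in an AIP METS file to
# DeepBlue (DSpace) item/bitstream groups. Element names follow
# PREMIS v2 as serialized by Archivematica; the paths and the group
# mapping here are illustrative only.
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"

# Restriction note -> (item group, bitstream group); anything else
# falls through to the fully open default.
NOTE_TO_GROUPS = {
    "Reading Room": ("Anonymous", "Reading Room only"),
    "University of Michigan": ("Anonymous", "University of Michigan only"),
    "BDML": ("Anonymous", "Archivists only"),
}

def groups_for_aip(mets_path):
    tree = ET.parse(mets_path)
    for grant in tree.iter("{%s}rightsGranted" % PREMIS):
        act = grant.findtext("{%s}act" % PREMIS, default="")
        restriction = grant.findtext("{%s}restriction" % PREMIS, default="")
        note = grant.findtext("{%s}rightsGrantedNote" % PREMIS, default="")
        if act != "disseminate":
            continue
        if restriction == "Disallow":
            # ER/PR/SR/CR: totally restricted until the end date passes
            return ("Archivists only", "Archivists only")
        if restriction == "Conditional" and note in NOTE_TO_GROUPS:
            return NOTE_TO_GROUPS[note]
    # No access/distribution rights statement: fully open
    return ("Anonymous", "Anonymous")
```

The returned pair would then drive the permissions applied during the DSpace Simple Archive Format conversion.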

We also think (hope!) that this approach, as long as we're consistent, would allow us to change our minds relatively easily in the future, say, if we decided after all that a more granular approach was the way to go.

But Wait! "disseminate" is Hard to Spell!

It occurred to us that in order for this approach to work, our processors could never make a typo. We've all been there... that's a pretty unrealistic expectation.

For the time being, we're planning to use Greasemonkey (in Firefox) and Tampermonkey (in Chrome) to help us out with this particular problem. These are browser extensions that customize the way a web page displays or behaves using small bits of JavaScript.

We've written a fairly basic script (you can see our draft here), that looks for URL patterns that match the Add Act pages in Archivematica (as you can see in that script, http://sandbox.archivematica.org/transfer/*/rights/grants/*/ and http://sandbox.archivematica.org/ingest/*/rights/grants/*/). When it finds one, it adds an additional dropdown, like so...

It even has a nice logo!


When an option is chosen (Reading Room was chosen above), it automatically fills out the rest of the form, just like we need it. When a Bentley policy is involved (that requires an end date), it asks a processor for a creation or accession date (still working on a nice datepicker option for this), does some math, and calculates the appropriate end date. It's not the most elegant solution but we think it works for now!
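The date math itself is simple enough to sketch in a few lines of Python (the userscript does the equivalent in JavaScript). To be clear, the retention spans below are placeholders, not the Bentley's actual policy terms.

```python
# Sketch of the end-date arithmetic: given a record's creation or
# accession date and a policy type, compute when the restriction
# expires. The year spans are hypothetical placeholders.
from datetime import date

POLICY_YEARS = {"ER": 20, "PR": 30, "SR": 75, "CR": 100}  # hypothetical terms

def restriction_end_date(start, policy):
    years = POLICY_YEARS[policy]
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Feb 29 start date landing on a non-leap target year
        return start.replace(year=start.year + years, day=28)
```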

Conclusion


In the end, it's perhaps a little clearer as to why PREMIS wasn't really meant for this kind of thing. Still, maybe square pegs sometimes do fit into round holes...


Seriously, though, let us know what you think!


[1] Although these types of things are not viewable, downloadable or even searchable in DSpace, typically we still provide a link to them in the collections finding aid. 
[2] Philosophically, I'd argue that access and distribution is a fundamental part of digital preservation... maybe the most fundamental part.
[3] At the end of the grant, AIPs without restrictions will be automatically uploaded to DSpace and recorded in ArchivesSpace without any more human intervention!

Tuesday, September 6, 2016

This One Time, At ArchivematiCamp...

While it's been a bit over a week since the inaugural ArchivematiCamp (or, as my colleagues Max and Dallas prefer, "Archivematica Camp") was held here in Ann Arbor, we're still basking in the afterglow... 36 campers and 5 counselors braved the rain and mosquitoes to gather at the University of Michigan's School of Information for two and a half days of discussions on microservices, metadata, and the mechanics of our favorite digital preservation system.  The camp's full agenda will give you some idea of the variety of topics covered in the 'Curator' and 'Technologist' streams—or maybe you were following on Twitter:



While I would be hard-pressed to summarize all the events and discussions, I did want to talk a little bit about Dallas and Max's demonstration of the new Appraisal Tab functionality we've developed as part of our grant project (and which is slated for release in version 1.6 of Archivematica).  In the Q and A period following the demo, counselors Ben Fino-Radin and Kari Smith helped kickstart a conversation about how the functionality in the Appraisal Tab could be complemented and supplemented by additional external tools/platforms.

As one example, Ben noted that his work with audiovisual materials requires advanced technical metadata extraction and codec characterization that has not always been available in Archivematica.  (As I understand from my notes, the MediaTrace report produced through a collaboration between MoMA and MediaArea is now available in Archivematica.)

Kari brought up the possibility of integrating an email processing tool like ePADD into a workflow that also involves Archivematica.  Given the unique functionality (and awesome interface) of this platform, it doesn't really make sense to replicate it in Archivematica or to cram another full-featured external tool into the Appraisal Tab.

Instead, as we discussed in our previous post on the Archivematica Users' Group meeting at SAA, we should look at ways of establishing/facilitating 'handshakes' between platforms so that the data and any associated metadata (especially preservation or technical) can be passed along and incorporated into the Archivematica METS or maybe even acted upon by Archivematica.  For instance, if you ran bulk_extractor on a disk image in the BitCurator environment, it would be nice to reuse those scanner reports in Archivematica instead of having to run them again.

We're really excited that other members of the archives and digital preservation communities are thinking about how the work we've done with the Appraisal Tab can be adapted or extended to satisfy local needs and workflows! In the same spirit, Kari also asked if DIPs could be produced and likewise tied back to ArchivesSpace (yes, by extending the code!) and Ben (was this Fino-Radin or Goldman?  I'm leaning towards the latter...) asked about the possibility of creating ArchivesSpace event records based upon actions in Archivematica (totally feasible--just need some coding!).

We're hoping to blog a bit more about camp in some upcoming posts, so I'll wrap things up here by noting that my only regret from camp was the absence of the long-promised 'goodbye song':

Thank heavens the Internet can fix anything!



Tuesday, August 9, 2016

Archivematica Users Group @ SAA

Greetings, all! The Bentley's Mellon grant team had a busy and exciting time last week in Atlanta during the annual meeting of the Society of American Archivists (SAA).  One of the highlights was Dallas's and my demonstration of current functionality in our ArchivesSpace-Archivematica-DSpace Workflow Integration project during the Archivematica Users Group meeting (hosted by the ever-gracious Dan Gillean) on Wednesday, August 3.
We've given a lot of demos over the past year, at conferences as well as to individual institutions and groups (including the Digital Preservation Coalition), but this presentation really stood out for us.  While we always get a lot of great questions from folks, several individuals suggested new and exciting functionality that could be added to the Appraisal Tab in future releases.

Thinking Bigger (and Better!)

First, Seth Shaw, Assistant Professor of Archival Studies at Clayton State University (and developer of the Data Accessioner), pointed out that a tree map visualization would be a helpful addition to the 'Analysis Pane' in the Appraisal Tab.

As it now exists, the Analysis Pane includes a tabular report on file format distributions in a given transfer as well as pie charts that depict this range by (a) number of files per format and (b) total volume of the respective formats:

Analyze this!

A tree map would give archivists an alternative means of visualizing the relative size of directories and files and give insight to where given file types are located. This information could be very helpful in terms of comprehending directory structure, understanding the nature of content in a transfer, and identifying content that might require additional resources during ingest (such as large video files). 

It's also important to note that not all tree maps are created equal, as different instantiations have different affordances.  For instance, TreeSize Professional yields a visualization that includes labels of directories and file formats and uses color coding to show the relative depth of materials in the folder structure of a transfer, but doesn't represent individual files:

Whose size? TreeSize!

WinDirStat, on the other hand, color codes individual file formats, represents individual files in the tree map, and highlights directories or file format types based upon the user's selection from its directory tree or file format list:

The Colors, Children!

Next, Susan Malsbury, Digital Archivist at NYPL, asked about the potential of including Brunnehilde in the Analysis Pane.  For those of you who are not in the know (which very recently included me!), "Brunnehilde" is (to quote its developer, Digital Archivist Tim Walsh)
a Python-based reporting tool for born-digital files that builds on Richard Lehane's Siegfried. Brunnhilde runs Siegfried against a specified directory, loads the results into a sqlite3 database, and queries the database to generate CSV reports to aid in triage, arrangement, and description of digital archives. Reports include:
  • Sorted file format list with count
  • Sorted file format and version list with count
  • Sorted mimetype list with count
  • All files with Siegfried errors
  • All files with Siegfried warnings
  • All unidentified files
  • All duplicates (based on a Siegfried-generated md5 hash)
Walsh's tool could provide much more granular information about the contents of a transfer and when combined with visualizations it would offer additional and highly interesting ways to review and appraise digital archives.

Malsbury also introduced a question of how the Appraisal Tab's new functionality could accommodate disk images.  While Archivematica does have some support for transfers composed of disk images, our use cases for the grant project did not specifically address this content type.  As Gillean noted, this question calls for additional cross-platform workflow integration.  Since BitCurator is designed to handle disk images, it makes sense for members of the open source digital archives community to explore how it can work in conjunction with Archivematica rather than replicate its functionality in the latter platform.  (A sidebar conversation Max Eckard and I had with Sam Meister from Educopia and the BitCurator Consortium confirmed that this is an important area of inquiry...)

Next Steps...

We're in the final stretch of our grant project and—sad as it is to say—have come to realize that all the awesome ideas we've had for the Appraisal Tab aren't going to make it into the final product.  We will, however, have achieved all the major goals and deliverables that we established at the outset:
  • Introduce functionality into Archivematica that will permit users to review, appraise, deaccession, and arrange content in a new "Appraisal and Arrangement" tab in the system dashboard.
  • Load (and create) ASpace archival object records in the Archivematica "Appraisal and Arrangement" tab and then drag and drop content onto the appropriate archival objects to define Submission Information Packages (SIPs) that will in turn be described as 'digital objects' in ASpace and deposited as discrete 'items' in DSpace.   
  • Create new archival object and digital object records in ASpace and associate the latter with DSpace handles to provide URIs/'href' values for <dao> elements in exported EADs.
All the same, we're thrilled by the realization that the Appraisal Tab as it will exist in the upcoming version 1.6 of Archivematica is really just a beginning.  By developing the Appraisal Tab and introducing basic appraisal functionality (file format characterization, sensitive data review, file preview, etc.), we've dramatically lowered the bar for other institutions that want to integrate new tools or introduce new features.  (And yes, I did borrow liberally from Dan Gillean for that last thought!)  

We're really excited to see where other institutions and developers take the Appraisal Tab. I, for one, would love to see textual analysis and named entity recognition tools like those in ePADD (or the other projects identified by Josh Schneider and Peter Chen in this great post from the SAA Electronic Records Section blog).  

What features or functionality would you like to see in the Appraisal Tab?  What questions do you have about our current processes? Please reach out to us via the comments section or email.

Thanks for reading; live long and innovate!

Friday, July 29, 2016

The Archival Integration Team at SAA2016

The Bentley's ArchivesSpace-Archivematica-DSpace Workflow Integration project team will be out in full force at the Society of American Archivist's Annual Meeting in Atlanta next week, where we'll be talking about some of the topics we've written about in detail on this blog, including our ArchivesSpace/Archivematica integration work, implementing new systems, preparing legacy description for migration to ArchivesSpace, and appraising digital content.

Be sure to add some of the following to your sched and stop by to get up-to-date information about our project or just to say hello!

Tuesday
ArchivesSpace Member Forum - I will be presenting on ArchivesSpace and Archivematica integration, particularly focusing on the ArchivesSpace pane in the new Archivematica Appraisal and Arrangement tab, some of the decisions we've made about creating and editing ArchivesSpace archival objects and structuring ArchivesSpace digital objects in Archivematica, and planned future enhancements (such as modifications to the ArchivesSpace Rights Statements module to facilitate mapping Archivematica PREMIS Rights to ArchivesSpace) during the "ArchivesSpace Integrations - A Status Report and Look Ahead" session.
Wednesday
Archivematica Users Group - Mike and I will be giving a brief presentation about and demonstration of the current Archivematica Appraisal and Arrangement tab.
Thursday
Acquisitions and Appraisal Section - Mike will be discussing the appraisal of born-digital content as part of a "discussion with several panelists who respond to an appraisal- and acquisitions-related scenario."
Friday
Graduate Student Poster Presentations - Devon will be presenting a poster on his contributions to our (recently completed!) project cleaning, reconciling, and ultimately "Preparing Legacy Finding Aids for Ingest into ArchivesSpace."
Session 506: You Are Not Alone! Navigating the Implementation of New Archival Systems - Max will be talking about our ArchivesSpace implementation (there will also be presentations about Archivematica implementation during this session).
Reference, Access, and Outreach Section - Max will be showcasing (along with our colleagues Cinda and Melissa) some of the ways in which the Bentley provides access to digital content in the RAO's Marketplace of Ideas.

Wednesday, June 8, 2016

Born-Digital Data: What Does It *Really* Look Like (Research Data Redux)

This is a follow up to Jenny Mitcham's recent Research data - what does it *really* look like post, and in particular, these questions she posed:
I'd be interested to know whether for other collections of born digital data (not research data) a higher success rate would be expected? Is identification of 37% of files a particularly bad result or is it similar to what others have experienced?

Background

Extracting technical metadata with the file profiling tool DROID has been part of our digital processing procedures for born-digital accessions since the beginning, so to speak, about 2012. Right before deposit into DeepBlue and our dark archive, a CSV export of DROID's output gets included in a metadata folder in our Archival Information Packages (AIPs). Kudos to Nancy Deromedi and Mike Shallcross for their foresight and for their insistence on standardizing our AIPs. It made my job today easy!

At first I was thinking that I'd write a Python script that would recursively "walk" the directories in our dark archive looking for files that began with "DROID_" (our standard prefix for these files) and ended with ".csv". That would have worked, but I'm a bit paranoid about pointing Python at anything super important (read: pointing my code at anything super important), and making a 1.97 TB working copy wasn't feasible. So, I took the easy way out...
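For the curious, the read-only version of that walk would have amounted to something like this; the UNC path in the comment is made up, and nothing here writes or moves files:

```python
# Read-only search for DROID CSV exports: find every "DROID_*.csv"
# under a root directory without touching anything else.
from pathlib import Path

def find_droid_reports(root):
    return sorted(Path(root).rglob("DROID_*.csv"))

# e.g. find_droid_reports(r"\\dark-archive\aips")  # hypothetical path
```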

First, I did a simple search (DROID_*.csv) in Windows Explorer...


...made my working copy of individual DROID outputs (using TeraCopy!)...



...and wrote a short script to make one big (~215 MB) DROID output file.



These are not the DROIDs you're looking for.


Note that I had the script skip over Folders (because we're only interested in files here), packaged files, like ZIPs (because DROID looks in [most of] these anyway) and any normalized versions of files, which I could identify because they get a "_bhl-[CRC-8]" suffix. Kudos again to Nancy and Mike for making this easy.
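Here's a sketch of that merge script, with the filters just described. Column names follow DROID's default CSV export, so adjust them if your export profile differs; this is a reconstruction, not the script we actually ran.

```python
# Sketch: concatenate individual DROID CSV exports into one file,
# skipping folder rows, packaged/container rows, and any normalized
# derivatives (identified here by a "_bhl-" suffix in the filename).
import csv

def merge_droid_reports(csv_paths, out_path):
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = None
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    if row.get("TYPE") in ("Folder", "Container"):
                        continue  # only interested in plain files
                    if "_bhl-" in row.get("NAME", ""):
                        continue  # skip normalized copies
                    if writer is None:
                        writer = csv.DictWriter(out, fieldnames=row.keys())
                        writer.writeheader()
                    writer.writerow(row)
```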

All of the data (about 3/4 million individual files!) in this sample represents just about anything and everything born-digital that we've processed since 2012... basically anything related to our two collecting areas of the University of Michigan and the state. I'd guess that much of it is office documents and websites (and recently, some Twitter Archives). The vast majority of the data was last modified in the past 15 years, and our peaks are in 2006 and 2008. The distribution of dates is illustrated below...


Here are some of the findings of this exercise:

Summary Statistics

  • DROID reported that 731,949 individual files were present
  • 658,520 (89.9%) were given a file format identification by DROID
  • 657,808 (99.9%) of those files that were identified were given just one possible identification. The rest received multiple identifications: 610 files were given two different identifications, 1 file three, 3 files five, 13 files six, 45 files seven, 28 files eight, and a further 12 nine. In these multiple-identification cases, 331 files were identified by signature and 380 by extension.

Files that Were Identified

  • Of the 658,520 files that were identified:
    • 580,310 (88.1%) were identified by signature (which, as Jenny suggests, is a fairly accurate identification)
    • 13,478 (2%) were identified by extension alone (which implies a less accurate identification)
    • 64,732 (9.8%) were identified by container. Like Jenny said, these were mostly Microsoft Office files, which are types of container files (and still suggests a high level of accuracy)
    • Lots of these were HTML and XML files, although there were some Microsoft Office files as well
  • 180 different file formats were identified within the collection of born-digital data
  • Of the identified files, 152,626 (23.2%) were HTML files. This was by far the most common file format identified within the born-digital dataset. The top 10 identified formats are as follows:
    • Hypertext Markup Language - 152,626
    • JPEG File Interchange Format - 142,161
    • Extensible Hypertext Markup Language - 62,039
    • JP2 (JPEG 2000 part 1) - 56,986
    • Graphics Interchange Format - 48,317
    • Microsoft Word Document - 38,459
    • Exchangeable Image File Format (Compressed) - 18,826
    • Microsoft Word for Windows Document - 18,140
    • Acrobat PDF 1.4 - Portable Document Format - 17,840
    • Acrobat PDF 1.3 - Portable Document Format - 10,875

Files that Weren't Identified

  • Of the 73,421 files that weren't identified by DROID, 851 different file extensions were represented
  • 1,888 (2.6%) of the unidentified files had no file extension at all
  • The most common file extensions for the files that were not identified are as follows:
    • emlx - 21,987
    • h - 8,545
    • cpp - 8,501
    • htm - 8,032
    • pdf - 5,216
    • png - 4,250
    • gif - 2,085
    • dat - 1,419
    • xml - 1,379
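If you'd like to reproduce these kinds of numbers from your own merged DROID output, a Counter-based tally like the following gets you most of the way. It again assumes DROID's default column names (PUID, METHOD, FORMAT_NAME, EXT); adjust as needed.

```python
# Sketch: tally identification rates, identification methods, top
# formats, and top unidentified extensions from a merged DROID CSV.
import csv
from collections import Counter

def summarize(merged_csv):
    methods, formats, unid_exts = Counter(), Counter(), Counter()
    total = identified = 0
    with open(merged_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            if row.get("PUID"):  # a PUID means DROID identified the file
                identified += 1
                methods[row.get("METHOD", "")] += 1
                formats[row.get("FORMAT_NAME", "")] += 1
            else:
                unid_exts[row.get("EXT") or "(no extension)"] += 1
    return (total, identified, methods,
            formats.most_common(10), unid_exts.most_common(10))
```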

Some Thoughts

  • Like Jenny, we do have a long tail of file formats, but perhaps not quite as long as the long tail of research data. I actually expected it to be longer (10.1% unidentified seems pretty good... I think?), since at times it feels like, as a repository for born-digital archives, we get everything and the kitchen sink from our donors (we don't, for example, require them to deposit certain types of formats), and because we are often working with older (relatively speaking) material.
  • We too had some pretty common extensions (many, in fact) that did not get identified (including the .dat files that Jenny reported on). Enough that I feel like I'm missing something here...
  • In thinking about how the community could continue to explore the problem, perhaps a good start would be defining what information is useful to report out on (I simply copied the format in Jenny's blog), and hear from other institutions. It seems like it should be easy enough to anonymize and share this information.
  • What other questions should we be asking? I think Jenny's questions seem focused on their goal of feeding information back to PRONOM. That's a great goal, but I also think there are ways we can use this information to identify risks and issues in our collections, to ensure that our (or our patrons') technical environments support them, and to advocate in our own institutions for more resources.
And, if you haven't yet, be sure to check out the original post and subscribe to that Digital Archiving at the University of York blog! Also be sure to check out the University of York's and University of Hull's exciting, Jisc-funded work to enhance Archivematica to better handle research data management.

[1] I think Jenny's only interested in original files, but an interesting follow-up question might ask questions along the lines of what percentage of files we were able to normalize...

Monday, May 16, 2016

Introduction to Free and/or Open Source Tools for Digital Preservation


Over the weekend, Mike, Dallas and I gave a workshop entitled "Introduction to Free and/or Open Source Tools for Digital Preservation" as part of the Personal Digital Archiving 2016 conference. This hands-on workshop introduced participants to a mix of open source and/or free software (and some relatively ubiquitous proprietary software) that can be used via the command line or graphical user interfaces to characterize and review personal digital archives and also perform important preservation actions on content to ensure its long-term authenticity, integrity, accessibility, and security. It was awesome!

After introductions, we discussed:
  • Digital Preservation 101
    • Definitions
    • Challenges
    • Models
  • Tools and Strategies (the hands-on part!)
    • Characterizing and reviewing content
      • WinDirStat
      • DROID
      • bulk_extractor
    • File format transformations (I discussed this a bit in a recent blog post on the theory behind file format migrations)
      • Still Images
        • IrfanView
        • ImageMagick
      • Text(ual) Content 
        • Adobe Acrobat Pro
        • Ghostscript
      • Audio and Video
        • ffmpeg
        • HandBrake
    • Metadata for Digital Preservation
      • Descriptive Metadata
        • Microsoft Word
        • Adobe Acrobat
      • Technical Metadata
        • ExifTool
      • Preservation Metadata
        • MD5summer
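For readers without MD5summer handy, the core of what it does (walking a directory and recording an MD5 checksum for every file, so fixity can be verified later) fits in a few lines of Python:

```python
# Minimal stand-in for MD5summer: build a manifest of MD5 checksums
# for every file under a directory, keyed by relative path.
import hashlib
from pathlib import Path

def md5_manifest(root):
    manifest = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            h = hashlib.md5()
            with open(path, "rb") as f:
                # Read in chunks so large files don't exhaust memory
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            manifest[str(path.relative_to(root))] = h.hexdigest()
    return manifest
```

Re-running the function later and comparing manifests flags any file whose checksum has changed.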

In case you're interested, we thought we'd make the slides...



...and exercises....



available to a wider audience! Enjoy!

Monday, May 9, 2016

Grant Update: Extension through Oct. 2016

Greetings, all; while things have been a little quiet on our blog as of late, we've been as busy as ever on our Mellon-funded ArchivesSpace-Archivematica-DSpace Workflow Integration project.

Amidst the general hustle and bustle here in Ann Arbor, we neglected to mention that the Mellon Foundation approved an extension of our project through October 31, 2016.  While things were on course to be completed by the original deadline of April 30, we decided that an extension was necessary so that our consultants at Artefactual Systems could further refine the interface of Archivematica's new Appraisal Tab and thoroughly identify and fix bugs without rushing.  The extended period of time will also give archivists at the Bentley an opportunity to gain expertise with the new functionality and thereby document workflows that may be shared with the archives and digital preservation communities.

Current and upcoming work on the project includes:
  • Refactoring the Archivematica workflow to support the new packaging functionality (in both the Archivematica ‘Ingest’ pipeline as well as the platform’s ‘Storage Service,’ which is used to track and recompile completed AIPs).
  • Verifying that packaging steps are recorded accurately in Storage Service pointer files.
  • Evaluating PREMIS 2 vs. PREMIS 3 to decide how to best implement the preservation metadata for packaging (and implementing PREMIS 3 support, as needed).
  • Implementing user interface changes to support the new workflow (and also allow users to adhere to existing Archivematica workflows and AIP packaging procedures).
  • Establishing (and then verifying) a workflow and protocols to automate the transfer of data and metadata from Archivematica to DSpace.
  • User interface changes in the Storage Service.
We'll look to provide highlights of these processes in the coming months...so stay tuned!