Friday, September 18, 2015

What We Talk About When We Talk About Access

With a tip o' the hat to Raymond Carver, I want to use this post to try to illuminate (for myself, if nothing else) some of the angles and issues surrounding 'access' to digital archives.

On the surface, the topic appears simple: I have some stuff that I want people to see, so I put it online or provide a dedicated terminal in my reading room and—voila!—access!

But even in this rosy scenario, there are a lot of questions: what platform will you use to host things online? Will access copies (i.e., DIPs) differ from preservation copies (AIPs)? If using a dedicated terminal, how will content be organized, and how will researchers find desired materials? If copying files to a terminal or removable media, will staff be able to respond to researcher requests in a timely fashion? And what about rights?

Now, I certainly don't want to be like a certain you-know-who...


...but there are a lot of considerations here. Complex ones, too.  At the same time, simply waiting around for the stars to align and the *perfect* solution to emerge won't cut it, either.  Therefore, inspired by the various presentations on access that I saw last month at SAA, I'd like to give a brief overview of our current approach to access and then lay out some of the questions and challenges we're starting to explore here at the Bentley.

Just Dropped In (To See What Condition My Condition Is In)

The Bentley Historical Library has taken a fairly aggressive (progressive?) approach to providing access to our 'open' or unrestricted digital archives.  All such content is freely available for download and use via our archival community in Deep Blue, the University of Michigan's DSpace repository:


Deep Blue is managed by staff in the University of Michigan Library Information Technology division, and we considered ourselves very fortunate when we started using it as both a preservation repository and access portal in 2008. Prior to that (and in the absence of any in-house IT), digital materials were either placed on optical disk and brought out to patrons in our reading room or hung off of our website and linked to from finding aids.

Moving to Deep Blue/DSpace was clearly a step forward, but the change brought about some additional challenges due to the basic structure (dare I say data model?) of the repository:

  • A 'community' contains 'collections' (which may be grouped together in sub-communities; we've formed one of these for university faculty papers)
  • A collection in turn contains 'items' (which may be associated with one or more files or 'bitstreams').
  • The default metadata schema is Dublin Core (which makes crosswalking from EAD ...interesting...)
While this relatively flat structure works great for traditional institutional repository fare (white papers, articles, and discrete digital objects), it really isn't suited for the complex intellectual hierarchies of archival collections.  So we've had to make do...
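
To give a sense of what that crosswalking entails, here is a minimal sketch of mapping a single EAD component to flat Dublin Core fields. The element paths and field mappings below are illustrative placeholders rather than our production mapping, and the sketch ignores EAD namespaces entirely:

    import xml.etree.ElementTree as ET

    # Hypothetical mapping from EAD elements to (qualified) Dublin Core fields
    EAD_TO_DC = {
        ".//unittitle": "dc.title",
        ".//unitdate": "dc.date",
        ".//physdesc": "dc.format.extent",
        ".//abstract": "dc.description.abstract",
    }

    def crosswalk_component(component: ET.Element) -> dict:
        """Map one EAD component (e.g., a <c01>) to a flat set of Dublin Core fields."""
        record = {}
        for xpath, dc_field in EAD_TO_DC.items():
            element = component.find(xpath)
            if element is not None and element.text:
                record[dc_field] = element.text.strip()
        return record

The hard part, of course, isn't the mapping itself but the flattening: everything above the component level (series, subseries) has to be squeezed into those same flat fields, which is where the title conventions described below come in.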

As with our physical and/or analog record groups and manuscript collections, our materials in Deep Blue are organized by the principle of provenance (and are often extensions of existing collections):


Within a collection we have our items—and here's where we've had to get creative:


Given the flat structure of DSpace, we are using the title metadata to help group related content together and preserve the hierarchical intellectual arrangement of materials.  As a result, the following description from our Jennifer Granholm finding aid...
...becomes the following item title in Deep Blue:
We also package materials in .zip files so that we only have to manage one file and our users don't have to download hundreds or even thousands of files.  Because content must actually be downloaded to a local machine to be used or rendered (unless a particular file format renders with a browser plugin), we have taken to chunking content across multiple .zip files when it exceeds 2 GB:


The above digital object represents speeches, addresses, and other audio recordings from former Michigan Governor Jennifer Granholm for the year 2010.  Altogether, there's about 20 GB of content; by dividing this body of content into smaller chunks representing each month of the year, we've made it a bit easier for folks to download content.  And while I certainly don't think this solution is ideal, it's still a lot better than bringing a stack of CDs out to folks in our reading room.
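
For the curious, the packaging logic boils down to something like the following rough sketch, which chunks a flat directory of files into sequentially numbered .zip packages kept under a size ceiling. The paths, naming scheme, and byte-count threshold are simplifications; in practice we group the Granholm recordings by month rather than by raw size:

    import os
    import zipfile

    MAX_BYTES = 2 * 1024 ** 3  # keep each package under roughly 2 GB of content

    def chunk_into_zips(source_dir, dest_dir, base_name):
        """Package files from source_dir into numbered .zip files under the ceiling."""
        os.makedirs(dest_dir, exist_ok=True)
        part, current_size, archive = 1, 0, None
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            size = os.path.getsize(path)
            # Start a new package when the next file would push us over the ceiling
            if archive is None or current_size + size > MAX_BYTES:
                if archive:
                    archive.close()
                archive = zipfile.ZipFile(
                    os.path.join(dest_dir, "%s_part%02d.zip" % (base_name, part)), "w")
                part, current_size = part + 1, 0
            archive.write(path, arcname=name)
            current_size += size
        if archive:
            archive.close()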

Earlier in this post, I alluded to open or unrestricted content; we actually have three access profiles based upon rights and restrictions:
  • Open materials may be accessed by anyone anywhere at any time.
  • Restricted materials are only available to system administrators and digital curation staff; the items are not visible to other users nor is the metadata searchable.  Content is restricted for a number of reasons, including specifications in a gift agreement; the presence of sensitive personal data (related to HIPAA or FERPA as well as credit card numbers and Social Security numbers); and internal policy (for example, all executive records of the University of Michigan, while FOIA-able, are restricted for 20 years from the date of accession).
  • Reading-room-only materials may only be accessed by computers within the IP address range of the library itself (and are not accessible by patrons using university wifi), as sketched below. This class is primarily composed of content for which we do not hold copyright or for which donors have requested more restricted access. Our reading room rules, which all researchers must agree to follow, stipulate that these items "may not be copied, emailed or transferred in any way."  (While placing the burden on the researcher is by no means foolproof, it's much easier to implement and maintain than the locked-down computer terminals with which we earlier experimented.)
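
To make the three profiles a bit more concrete, here is a minimal sketch of how they might be enforced at request time. The profile labels, IP range, and staff check are placeholders rather than our actual Deep Blue configuration (DSpace handles authorization internally):

    import ipaddress

    READING_ROOM_NET = ipaddress.ip_network("10.0.0.0/24")  # placeholder range, not our real one

    def may_view(profile, client_ip, is_curation_staff):
        """Return True if a request should be allowed to see an item and its metadata."""
        if profile == "open":
            return True               # anyone, anywhere, any time
        if profile == "restricted":
            return is_curation_staff  # system administrators and digital curation staff only
        if profile == "reading-room":
            # only terminals within the library's own IP range
            return ipaddress.ip_address(client_ip) in READING_ROOM_NET
        return False
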
In addition to having the metadata and text-based file contents (when not packaged in .zip files) indexed by Google and other search engines, all materials are linked from online finding aids and/or catalog records.   People certainly seem to be finding our content, too: from 2008 through last month, we've registered 620,375 downloads (a figure that excludes downloads from robots or web crawlers).
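
(As an aside, excluding robot traffic from a download count is conceptually simple, even if DSpace's own statistics engine does it more carefully than this toy sketch, which just screens a web server log on user-agent strings; the log format, path test, and bot markers are all assumptions:)

    ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")

    def count_human_downloads(log_lines):
        """Count bitstream downloads whose user-agent does not look like a robot."""
        total = 0
        for line in log_lines:
            if "/bitstream/" not in line:
                continue
            user_agent = line.rsplit('"', 2)[-2].lower()  # last quoted field in combined log format
            if not any(marker in user_agent for marker in ROBOT_MARKERS):
                total += 1
        return total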

Access: the Final Frontier

As we enter the final six months of our Mellon grant (and prepare to kick off a Hydra development project with colleagues at the University of Michigan Library), we have returned, time and again, to the challenge of providing access to digital archives.

There are a lot of great access portals to collections out there!  Some of the ones we've been particularly impressed with include those of:

Those are just a few of the many examples out there (and we meant to include your digital collections, but ran out of time...), but we've noticed that while these (and other innovative solutions) are vast improvements over an off-the-shelf option like CONTENTdm, they tend to be closely tied to their local institutional context and IT environment.


The work we've been doing with Archivematica and ArchivesSpace has made us firm advocates of community-based approaches where folks at different institutions can share and contribute to common solutions, without having to reinvent the wheel (and continue to support and maintain that reinvented wheel. Indefinitely. All by themselves.).

This community interest recently led us to contribute user stories to the ArcLight project, "an effort to build a Hydra/Blacklight-based environment to support discovery (and digital delivery) of information in archives, initiated by Stanford University Libraries."  Likewise, we were excited to hear about the DPLA Archival Description Working Group and its implications for describing—and searching for and retrieving—digital archives.

Beyond the above, we've also been trying to articulate the different aspects or considerations related to access that could be common to cultural heritage institutions of all sizes and shapes.  These are some very, very rough ideas, but we're interested in how we can:

  • Explore and better understand the challenges and opportunities surrounding the OAIS functional entity of ‘access’
    • Present (and make understandable) the context/content of archival materials (including the relationships between digital, physical, and analog materials)
    • Enable search and retrieval of information while balancing item-level, aggregate, and collection-level description
    • Provide tools and functionality to view/render various formats (born-digital and digitized), including images, text, audio and moving image, web archives, and disk images 
    • Facilitate the analysis and reuse of data (including visual representations of metadata/data and tools or functionality that would facilitate distant reading of materials and other digital scholarship techniques)
    • Increase engagement with users (crowdsourcing or feedback)
  • Manage rights and enforce restrictions/permissions
  • Establish use metrics and collect quantitative data regarding the impact of our collections and the outcomes of curation activities
  • Permit users a more seamless experience in searching for and using materials that are in disparate/siloed locations: online catalogs, HathiTrust, digital repositories, web archives, etc.
  • Leverage linked data: facilitate research across collections and institutions
At this stage in the game, we aren't even thinking about specific implementation strategies, as it seems there could/should/might/shall/will be a core set of features or functional requirements that could exist independent of any particular repository platform.  Having said that, it seems to us that an access portal should:
  • Emphasize interoperability:
    • Create connections between tools and services
    • Permit us/other institutions to broaden current work and ‘plug in’ to a larger framework
    • Avoid siloed/local solutions 
  • Employ open source software:
    • There is a “need for engagement beyond simply making source code available, including supporting the development of user communities, creating adequate documentation, and cultivating relationships between developers working in libraries around the country.” (IMLS National Digital Platform)
  • Focus on end users
    • Meet the needs of LAM communities for common solutions and interoperability, as well as the needs of end users related to the access and use of digital archives.
    • End users are creating, accessing, and organizing content in ways that were never before possible and, in many cases, without the support of a knowledge professional.  The user should figure prominently in our strategy. How do we bring in their views, and identify the missing voices? (IMLS National Digital Platform)
So that's what we've been pondering as of late... What are we missing?  What seems unnecessary?  What do you talk about when you talk about access?
