Digital Preservation is for People: An Archivist's Take on the Digital Humanities

This past week, Dallas and I and a number of folks from the Bentley Historical Library attended the Humanities, Arts, Science, and Technology Alliance and Collaboratory (HASTAC) 2015 Conference in Lansing, MI. HASTAC is one of the premier digital humanities conferences around.


While not directly related to our Archivematica-ArchivesSpace-DSpace Workflow Integration project, HASTAC served as a good reminder of why it is we're doing what we're doing.

**Spoiler Alert!** 
It's not so that it's easier to manage or preserve digital archives (although we're very excited about that, thank you very much).

Digital Preservation is for People

The most important reason we're doing what we're doing is for present and future people [1]. People just like the ones we heard from at HASTAC. People who use our digital archives. People who get frustrated with them (and there were plenty of those). And yes, even those people who create their own "digital archives" (as much as we like to look down on their efforts with disdain--and throw what they do in quotes!--and think to ourselves: "That's not really a digital archive," or "That's not how I use the word curate.").

If you were to ask me, the most important finding of the 2015 National Agenda for Digital Stewardship is that we have to do a better job of connecting to researchers (they also found that we have to do a better job of connecting to the creator community, which I'll touch on a bit in this post).

That is, we have to start thinking outside of a well known box...

OAIS Reference Box Model [2]

...and start thinking about those people on the margins for whom we do what we do: Consumers (and Producers). Attending HASTAC was one of the ways we're trying to do just that.

The People

Whether you call them Consumers (with a capital "C") or consumers, researchers, users, end users, or just plain people, we certainly heard from them at HASTAC. A lot of them were from one particular Designated Community (to borrow another term from OAIS), that is, digital humanists (certainly not the consumer I think most archivists have in mind when they create digital archives), and some were librarians and archivists (the consumer I think most archivists have in mind when they create digital archives). Here's what I found out...

People Use Our Stuff in Interesting Ways

One of the most exciting things I learned at HASTAC was that people actually use the digital archives we create. And not just to look at pretty pictures. They use them in new and exciting ways that push the boundaries of knowledge:

A traveler puts his head under the edge of the firmament [a metaphorical illustration of either the scientific or the mystical quests for knowledge] in the original (1888) printing of the Flammarion engraving. [3] [4]

On this point I'd like to quote from the National Agenda for Digital Stewardship:

Researchers increasingly seek not only access but enhanced use options and tools for engaging with digital content... Models for access continue to evolve as methods for analyzing and studying contemporary born-digital and historic digitized materials are available. 

One of the most exciting examples of this is the Australian Federal Election Speeches site, which allows you to explore and visualize speeches by Australian politicians in exciting ways. Built by a "Roman historian turned digital humanist," Fiona Tweedie, the site draws the textual data underlying these visualizations from the Museum of Australian Democracy. In fact, Dallas and I attended a Software Carpentry workshop she co-taught at HASTAC, where we used the Natural Language Toolkit to do some basic frequency analysis on, for example, a collection of inaugural addresses from American presidents (turns out the words "fear" and "terrorism" are much more common than they used to be). That kind of analysis is the building block of a site like that one.
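
For the curious, here is a minimal sketch of that kind of frequency analysis. It is not the workshop's code: we used the Natural Language Toolkit there, while this version swaps in only Python's standard library so it runs anywhere. The function names and the two sample texts are hypothetical stand-ins, not real speeches.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def relative_frequency(text, term):
    """Share of all words in `text` that equal `term` (0.0 for empty text)."""
    counts = word_frequencies(text)
    total = sum(counts.values())
    return counts[term.lower()] / total if total else 0.0


# Hypothetical stand-in texts (assumption: NOT real inaugural addresses).
speeches = {
    "early": "We have nothing to hope for but peace and union.",
    "recent": "We will not give in to fear, and fear will not define us.",
}

# Compare how prominent one word is in each text.
for label, text in speeches.items():
    print(label, relative_frequency(text, "fear"))
```

Note that this only works if the plain text is available in the first place, which is the point of the next paragraph.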

I know what you're thinking: Cool! So what? Well, these new uses of digital archives have big implications for how we process them and make them accessible to people. "PDF is where DHers go to die!" was one of my favorite quotes from the conference. While it's true that PDF/A meets archival standards, and that page-turning applications are cool, sometimes things like this actually inhibit the work that scholars want to do. To do this type of analysis, digital humanists would prefer to have some way to download all or a portion of the plain text of a digital archive. It's also nice when metadata is clean (more on that later) and structured and available for download in a similar way.

People Get Frustrated When They Use Our Stuff

Another very common theme at HASTAC was that when people use our stuff, often they get frustrated. Here are a couple things I heard, in no particular order (all archives have been anonymized to protect their identity):

  • Some archives are "not functional." Stated by Owen Fenton in a presentation on using a newspaper archive to trace the development of Northern Irish identity. 
  • Inconsistent metadata "shattered my dreams." Exclaimed (truly exclaimed!) by Federico Pagello, describing the moment he realized that he would not be able to analyze all European crime fiction using the fairly comprehensive but very dirty bibliographic records he had been collecting. 
  • Digital archives are "stressful." This one I heard through the grapevine, but I think it had something to do with analyzing Enron emails.

I think these observations can break down into two categories: usability of our access mechanisms and metadata. On usability, I'd like to quote again from the 2015 National Agenda for Digital Stewardship: "Usability is increasingly a fundamental driver of support for preservation, particularly for ongoing monetary support." Read: People give us money for digital archives when they like what they see on the other end (online). We can get all nerdy about file formats and storage configurations, but I promise you that we're the only ones that get excited about these things. We will also never be able to hang our hats on the fact that we saved some bits if nobody ever uses them, or starts to but quits because they aren't usable.

So what do we do? First, we have to acknowledge that access, use and re-use are as important as preservation in digital curation (so yes, the website is part of your job). Then, work to make things better, little-by-little. What do you make better? Here are a couple of ideas:

On metadata, I think we all already understand the issue. But how do we fix it? First, we have to get over the fact that we have dirty metadata. We can blame it on our predecessors all we want (and I'm as guilty of this as anyone else), but that doesn't actually help. Describing collections is the most time-consuming and labor-intensive part of the process because it has to be done by humans, and humans make mistakes. Deal with it.

After we get over it, we then have to commit to finding ways to make this better. I'd suggest that an important second step to addressing the metadata issues is to have a system of record, wherever that is. Having three places where you record descriptive information actually makes cleanup harder. Finally, start cleaning. Whether that's having a system in place to correct mistakes as you find them, little-by-little, or whether you're migrating to ArchivesSpace and as part of your legacy EAD import you decide that you have an unprecedented opportunity to systematically clean your metadata, get it done!
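
If you're wondering where to start, one small first pass is simply finding values that differ only in case, punctuation, or spacing. Here's a rough sketch of that idea, assuming your descriptive metadata can be exported as a plain list of strings. The function names and the sample creator names are hypothetical, and this is one possible approach, not a description of any particular tool's cleanup process.

```python
import re
from collections import defaultdict


def normalize(value):
    """Collapse case, punctuation, and whitespace so near-identical values match."""
    value = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(value.split())


def find_variants(values):
    """Group values by normalized form; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize(value)].add(value)
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Hypothetical creator names (assumption: made up for illustration).
creators = ["Smith, John", "smith, john ", "Smith John", "Doe, Jane"]

# Each entry lists the spellings a human should review and reconcile.
for key, spellings in find_variants(creators).items():
    print(key, spellings)
```

A report like this doesn't fix anything by itself, but it gives you a list to work through, little-by-little, in your system of record.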

I should also note here that digital humanists get frustrated with all of the errors in OCR'd text. For better or worse, however, this didn't make me feel very compelled to change the way we process collections (except for maybe, I'll concede, very important--and very small!--collections, or by investing in OCR research and development). Transcription is time-consuming and expensive! MPLP all the way!

People Create Their Own Digital Archives

Finally, I met a lot of non-archivist people who create their own digital archives, and mean many different things when they say digital archive. There were many, many examples of this. Here are just a few:

  • Fortepan Iowa, an initiative by a historian and a communications professor (and a programmer) to make digitized images of Iowans through the years available online;
  • Nashville's New Faces, a not-quite-ready-for-production effort by a digital humanist to let immigrants from all over the world tell their own stories;
  • a map of installations along the Way of Santiago de Compostela;
  • Hoccleve Archive, an archive by a historian of resources for scholars, teachers, and students interested in Thomas Hoccleve, his works, and their textual history; and
  • citizen archivists! There was a lot of talk about these people. I had to throw them in somewhere.

In addition to creating their own digital archives, people like to contribute to existing digital archives, and see this as a way to overcome the mistrust that exists between people who have been marginalized and institutions like archives that often represent "official" history. Another of my favorite quotes from HASTAC was that "Crowdsourcing is counter-hegemonic." (As it turns out, I'm all about crowdsourcing, but I won't get into it here because it's slightly out of scope.)

So what does all this mean? How can we support these folks? I have to admit, I'm a bit at a loss here. I know there is something to be said about the role of a digital archivist, and how sometimes it's educational/consultative and not practical/hands-on. I also know there's something to be said about personal digital archiving initiatives. I'd be curious to know in the comments, though, if anyone has any specific ideas about how to support researchers who want to build digital archives for their own research, and the digital archives that they create.

Summing Up

By way of conclusion, HASTAC served as a good reminder that the reason (or at least one of the reasons) we do digital preservation is for the people who use the digital archives that we preserve and provide access to. They're important, and we can learn a lot from listening to them!

One last thing I'll mention. The "myth of immateriality" seems to be widespread in the digital humanities world (maybe for those who are humanists before digital humanists). While a couple of speakers acknowledged that the binary of analog and digital was a false binary (yes!), there were a number of speakers who seemed to imply that "the analog" is more real than "the digital." Time for some evangelism!

[1] Websites are for People, too. Thanks, Matt! I hope you don't mind that I borrowed your title.
[2] "OAIS-" by Poppen - Own work. Licensed under Public Domain via Wikimedia Commons.
[3] "Flammarion" by Anonymous - Camille Flammarion, L'Atmosphere: Météorologie Populaire (Paris, 1888), p. 163. Licensed under Public Domain via Wikimedia Commons.
[4] This image appeared in Scott B. Weingart's opening keynote, Connecting the Dots. I highly recommend it.

P.S. I'm very proud of myself for not making even one joke about a needle in a HASTAC.

