Foundations and Principles
By and large, appraisal tends to be an iterative process as we seek to understand the intellectual content and scope of materials to determine if they should be retained as part of our permanent collections. If we're really lucky, curation staff and/or field archivists might be able to review content (or a sample thereof) prior to its acquisition and accession, a process that helps us pinpoint the materials we are interested in and avoid the transfer of content that we have identified as out of scope or superfluous.
This pre-accession appraisal may not be possible for various reasons (technical issues, geographic distance, scheduling conflicts, etc.), but in the vast majority of cases we have some understanding of the nature of the digital content and its relationship to our collecting policy by the time it's received, whether from a high-level overview or from an item-level description in a spreadsheet.
Whatever the case, appraisal is a crucial part of our ingest workflow, as it helps us to:
- Establish basic intellectual control of the content, directory structure, and/or original storage environment to facilitate the arrangement and description of content.
- Identify content that should be included in our permanent collections as well as superfluous or out-of-scope materials that will be separated (deaccessioned).
- Determine potential preservation issues posed by unique file formats, content dependencies, or other hardware/software issues.
- Address copyright or other intellectual property issues by applying appropriate access/use restrictions.
- Discover and verify the presence of sensitive personally identifiable information such as Social Security and credit card numbers.
Relative Size of Directories
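A visualization makes it easy to see which folders dominate an accession, but the underlying arithmetic is simple: sum the bytes in each top-level folder and express each total as a share of the whole. Here's a minimal Python sketch of that idea. It is not a description of how TreeSize or any other tool works; the function names `directory_sizes` and `relative_sizes` are hypothetical, and the sketch assumes we only care about the top level of the accession's directory tree.

```python
import os
from pathlib import Path


def directory_sizes(root):
    """Return {top-level subdirectory name: total bytes} for root.

    Files sitting directly in root are grouped under ".". Symbolic links
    are skipped so linked content is not counted twice.
    """
    root = Path(root)
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root)
        # The first component of the relative path names the top-level folder.
        key = rel.parts[0] if rel.parts else "."
        sizes.setdefault(key, 0)
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue
            sizes[key] += path.stat().st_size
    return sizes


def relative_sizes(root):
    """Return (name, bytes, percent of total) tuples, largest first."""
    sizes = directory_sizes(root)
    total = sum(sizes.values()) or 1  # avoid division by zero on empty trees
    ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [(name, size, 100.0 * size / total) for name, size in ranked]
```

Because the results come back sorted largest first, the first few rows show where the bulk of an accession lives, which is usually where our review should start.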
Age of Files
Determining the 'age' of files requires analysis of filesystem MAC times (Modification, Access, and Creation times), which can be a little dicey, especially if content has been migrated from one type of file system to another (the specifics of which I won't try to get into...). TreeSize permits archivists to define custom intervals for file age and will create visualizations based on any of the MAC times (we generally use last modified, as it often coincides with creation dates and indicates when the content was last actively used). Clicking on any of the segments in the graph will produce a list of all files associated with that interval.
While this information may not be useful if the donor has accidentally altered the timestamps during the transfer process, knowing that there are especially old files in an accession can help guide our review of content and prepare us for any additional preservation steps that might be required. For instance, knowing that a collection includes word processing files in a proprietary file format from the 1990s might lead us to explore additional file format migration pathways if the content is of sufficient value.
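To make the idea of age intervals concrete, here's a minimal Python sketch that sorts files into buckets by last-modified time (the `st_mtime` value reported by `os.stat`). The interval boundaries in `DEFAULT_INTERVALS` and the function name `bucket_by_age` are hypothetical, and the sketch inherits the same caveat as TreeSize: if timestamps were altered during transfer, the buckets will be misleading.

```python
import os
import time
from pathlib import Path

# Hypothetical age intervals as (label, upper bound in years). These cut-offs
# are illustrative only; adjust them to suit the accession at hand.
DEFAULT_INTERVALS = [
    ("< 1 year", 1),
    ("1-5 years", 5),
    ("5-10 years", 10),
    ("10-20 years", 20),
    ("> 20 years", float("inf")),
]

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def bucket_by_age(root, intervals=DEFAULT_INTERVALS, now=None):
    """Group files under root by last-modified age.

    Returns {label: [Path, ...]}; every label in intervals is present,
    even if its list is empty. Files with future timestamps land in the
    first bucket.
    """
    now = time.time() if now is None else now
    buckets = {label: [] for label, _ in intervals}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # st_mtime is the last-modified time (the "M" in MAC times).
            age_years = (now - path.stat().st_mtime) / SECONDS_PER_YEAR
            for label, max_years in intervals:
                if age_years < max_years:
                    buckets[label].append(path)
                    break
    return buckets
```

Swapping `st_mtime` for another timestamp would change which MAC time the buckets reflect, but how reliable access and creation times are varies by operating system and file system.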
File Format Information
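As a rough first pass, a simple tally of file extensions can hint at which formats are present in an accession. The Python sketch below (with a hypothetical `extension_profile` function) does just that. Keep in mind that extensions can be missing or wrong, so this is no substitute for signature-based identification tools such as DROID or Siegfried.

```python
import os
from collections import Counter
from pathlib import Path


def extension_profile(root):
    """Count files under root by lower-cased file extension.

    Files without an extension are counted under "(none)". Extensions are
    only a proxy for format; they say nothing about a file's actual bytes.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = Path(name).suffix.lower()
            counts[ext or "(none)"] += 1
    return counts
```

Calling `extension_profile(...).most_common()` lists extensions from most to least frequent, which can flag unfamiliar or legacy formats worth a closer look.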
Identification of Personally Identifiable Information
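We run bulk_extractor from the command line, using -x to disable scanners we don't need for this review, -R to recurse through the target directory, and -o to specify where its reports are written:
bulk_extractor.exe -o [output\directory] -x aes -x base64 -x elf -x email -x exif -x gps -x gzip -x hiberfile -x httplogs -x json -x kml -x net -x rar -x sqlite -x vcard -x windirs -x winlnk -x winpe -x winprefetch -R [target\directory]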
We then launch Bulk Extractor Viewer (BEViewer), which allows us to review each potentially sensitive item in context and verify whether it actually represents an issue.
Based upon this review, we may delete nonessential content or use BEViewer's 'bookmark' feature to track content that will need to be embargoed with an appropriate access restriction.
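For a sense of what a PII scan is looking for, here's a deliberately simplified Python sketch that flags Social Security number and credit card number candidates in a string of text. It is not bulk_extractor's implementation: the regular expressions, the `find_candidates` and `luhn_valid` names, and the filtering rules are assumptions made for illustration. Card-number candidates are checked with the Luhn checksum, and SSN candidates are dropped if they fall in ranges the Social Security Administration never issues (area 000, 666, or 900-999; group 00; serial 0000).

```python
import re

# Simplified, hypothetical patterns for illustration only; real scanners such
# as bulk_extractor use far more elaborate rules and context checks.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CCN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_candidates(text):
    """Return (kind, matched text) pairs for possible SSNs and card numbers."""
    hits = []
    for m in SSN_RE.finditer(text):
        area, group, serial = m.groups()
        # Skip number ranges the SSA never issues.
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        hits.append(("ssn", m.group()))
    for m in CCN_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append(("ccn", m.group()))
    return hits
```

Hits from a sketch like this still need human review, which is exactly the step BEViewer supports: plenty of nine-digit strings and Luhn-valid numbers turn out to be innocuous.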
Quick View Plus
The QVP interface is divided into three main parts in addition to the navigation menu and ribbon at the top of the application window. The right portion of the interface holds the Viewing Environment while the left-hand side is divided between the Folder Pane (which can also be used to review the directory structure) on the top and the File Pane on the bottom.
After QVP opens, the right and left arrows may be used to expand/collapse subfolders and navigate to the appropriate location in the Folder Pane. Once a folder has been selected, a list of its contents (both subfolders and files) will be displayed in the File Pane; after a file is selected, it will appear in the Viewing Environment. While we've noticed some issues with the display of PDF files, QVP meets the vast majority of our content review needs. As we move to the browser-based (and open source) environment of Archivematica, it will be interesting to see how well we are able to view/render content using standard browser plugins. We'll keep you posted...