Having some common ground/shared understanding is very important, as our workflow establishes the following equivalences:
I'd like to take this opportunity to review the reasons behind this structure, but first I think it would be useful to take a look at how others in the ASpace user community are approaching digital object records.
Perspectives on the ASpace Digital Object Record
The Digital Object record is optimized for recording metadata for digitized facsimiles or born-digital resources. The Digital Object record can either be single- or multilevel, that is, it can have sub-components just like a Resource record. Moreover, the record can represent the structural relationship between the metadata and associated digital files--whether as simple relationships (e.g., a metadata record associated with a scanned image, and its derivatives) or complex relationships (e.g., a metadata record for a multi-paged item; and additionally, a metadata record for each scanned page, and its derivatives). One or more file versions can be referenced from the Digital Object metadata record. The Digital Object record can be created from within a Resource record, or created independently and then either linked or not to a Resource record.
- Define systems of record for data/metadata and determine how ASpace fits into this ecosystem.
- Identify how information in the digital object records can be used now and in the future (i.e., the records can bring together digital content stored in various systems/locations, serialize information to EAD files, respond to queries via the API, etc.)
- The general position outlined by Max is still accurate ("We're thinking of the DO module more as a place to record location than as a place to "manage" digital objects or the events that happen to them"): we are primarily interested in using the ASpace digital object module to create <dao> tags and links to content in EAD finding aids.
- We would therefore not be looking to include technical/preservation metadata about AIPs in the digital object record or do extensive arrangement with the digital object components.
- With the above in mind, the ‘digital object’ records become somewhat analogous to physical ‘instances’--these are manifestations of the archival description expressed in the associated archival object record.
- In addition, within AS a digital object may be ‘simple’ or ‘complex’ (in the latter case, comprised of one or more digital object components). We're now contemplating slightly more 'complex' digital object records...
- We've also been working with Artefactual Systems and some other peer institutions to think more about how and where to record machine-understandable/actionable PREMIS rights information associated with digital objects.
- Within the new Appraisal and Arrangement tab, a dedicated ASpace pane will display the ‘archival objects’ (i.e., the subordinate components) of a given resource record in a hierarchical structure. Within the ASpace pane, users will be able to create new archival objects and add basic metadata.
- Within the appraisal tab, archivists will drag/drop content (individual files and/or entire directories) to a given ‘archival object’ in the ASpace pane.
- All content associated with an archival object will be a single SIP/AIP in Archivematica.
- Furthermore, each SIP/AIP will correspond to a single ASpace ‘digital object.’
- 1 ASpace digital object = 1 Archivematica SIP = 1 Archivematica AIP = 1 DSpace item
- We are not spinning off separate DIPs; we may configure Archivematica's Format Policy Registry (FPR) to spin off lightweight copies for some file formats, but otherwise the Archival Information Packages (AIPs) will serve for both preservation and access.
- The Bentley's past/current use of DSpace is another factor here, as a single 'item' may contain one or more 'bitstreams' (i.e., files). We therefore would like to be able to do some minimal arrangement of bitstreams within an ASpace digital object to control how materials will be deposited to DSpace.
- Whenever possible, we strive to describe materials at an aggregate level, which means that a fairly large number of files (in number or space on disk) may be associated with a given 'item.' We also package content in .zip files to reduce the number of files we have to manage and that our users have to download.
- To avoid presenting our users with extremely large .zip files that could be difficult to download and access, we often will chunk content across multiple .zips--i.e., instead of one 10 GB .zip, we will provide users with five 2 GB zips, as evidenced in this example from our Governor Jennifer Granholm collection:
- In other cases, we might want to differentiate between access and preservation copies of materials in a collection. As an example, the following DSpace item includes an .mp4 access copy of a video recording while the .zip file contains an .iso image file of the original DVD:
- We see the DSpace item as being the equivalent of the ASpace digital object record, with the individual bitstreams corresponding to the digital object components.
- We won't be using DSpace forever (Michigan recently became a Hydra partner) and so we don't want to predicate our ASpace-Archivematica workflows on legacy systems.
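To make the chunking idea above a bit more concrete, here is a minimal sketch of how files might be grouped into multiple .zip bitstreams within a single DSpace item. Everything here is hypothetical: the `plan_bitstreams` function, the `Bitstream` class, and the `content_N.zip` naming are illustrations rather than part of any existing tool, and uncompressed file sizes are used as a rough stand-in for the final .zip size.

```python
from dataclasses import dataclass, field


@dataclass
class Bitstream:
    """One planned .zip bitstream (hypothetical; maps to a digital object component)."""
    name: str
    files: list = field(default_factory=list)
    size: int = 0  # total uncompressed bytes, used as a proxy for the .zip size


def plan_bitstreams(files, max_bytes, base_name="content"):
    """Greedily group (filename, size_in_bytes) pairs into chunks of at most max_bytes.

    Files are kept in their original order. A single file larger than
    max_bytes still gets a chunk of its own rather than being split.
    """
    chunks = []
    current = None
    for name, size in files:
        # Start a new chunk if there is none yet, or if adding this file
        # would push a non-empty chunk over the limit.
        if current is None or (current.files and current.size + size > max_bytes):
            current = Bitstream(name=f"{base_name}_{len(chunks) + 1}.zip")
            chunks.append(current)
        current.files.append(name)
        current.size += size
    return chunks


GB = 10 ** 9
example = [(f"folder_{i}", 2 * GB) for i in range(1, 6)]  # 10 GB in total
plan = plan_bitstreams(example, max_bytes=2 * GB)
print([chunk.name for chunk in plan])
```

With five 2 GB folders and a 2 GB limit, this plan yields five separate .zip bitstreams instead of one 10 GB file, all of which would still belong to the same item (and the same ASpace digital object).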
Potential New Features
So...where does this leave us? I wanted to talk through a possible arrangement workflow (based upon the new Appraisal tab) and how this might be translated into ASpace digital object records. Let's see how this goes...
- A user would select a particular archival object in the ASpace pane and click the “Add digital object component” button.
- Clicking the button will trigger the creation of a ‘digital object component’ that will appear as a child of the archival object.
- Adding at least one digital object component essentially creates the main digital object record (which may include multiple components).
- All the ‘digital object components’ nested under an archival object will comprise a single AS ‘digital object.’
- In arranging the digital object components, users would only be able to work with 1 level of hierarchy--this will be very simple and minimal ‘arrangement.’
- A digital object component will essentially be a bucket or a virtual container where one or more files and/or folders may be dragged/dropped.
- To visually distinguish the ‘digital object component’ from archival objects, it should have a different icon (perhaps use the following from the digital object record in ASpace) and/or the text might have a different colored background.
- The digital object component would display a default title, comprised of the associated archival object’s title and/or date and a consecutive integer. (In other words, for the archival object ‘Archivematica Series’, the first digital object component would be ‘Archivematica Series 1’, the next would be ‘Archivematica Series 2’ and so forth.)
- The user would drag one or more files/folders on top of a digital object component. The file(s) and/or folder(s) would be nested under the digital object component. The following example has two digital object components:
- The user can select a digital object component and click the ‘Edit Metadata’ button. This would permit the user to edit the only pieces of metadata required for digital object components, ‘title’ and/or ‘label’, as seen below in AS:
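The default title rule described above could be sketched roughly as follows. This is only an illustration of the naming idea; `default_component_title` is a hypothetical helper (not an Archivematica or ASpace function), and it uses only the archival object's title, leaving the optional date aside.

```python
def default_component_title(archival_object_title, existing_titles):
    """Return '<archival object title> <n>' using the lowest unused integer n.

    existing_titles holds the titles of the digital object components
    already nested under the archival object.
    """
    n = 1
    while f"{archival_object_title} {n}" in existing_titles:
        n += 1
    return f"{archival_object_title} {n}"


titles = []
for _ in range(2):
    titles.append(default_component_title("Archivematica Series", titles))
print(titles)
```

Checking against the existing titles (rather than just counting components) is one possible design choice: it avoids handing out a duplicate default if a user has renamed or removed an earlier component.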
We've also thought about some simple rules for digital object components (and information packages). Once an archivist clicks the 'Finalize Arrangement' button, Archivematica will create a SIP for the materials associated with a given archival object and commence its Ingest procedures, which may result in the creation of preservation copies (or OCR text). Based upon this:
- If there is only one file, it will be deposited to DSpace as an individual bitstream.
- If there is more than one file and/or a folder (including derivatives produced by Archivematica), everything in the digital object component will be included in a single .zip file (perhaps using the digital object component title) that will be deposited to DSpace.
- Additional components of the AIP produced by Archivematica (the logs folder, metadata folder, and METS file) will be packaged in a .zip file and deposited as an additional digital object component (perhaps with some default file name). The Bentley would want this content to be inaccessible to the general public (and ‘not published’ within the ASpace digital object record).
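As a rough sketch of the first two rules above (a lone file is deposited as-is; anything else is bundled into one .zip named after the component), something like the following could work. The `package_component` function and its arguments are hypothetical and use only Python's standard library; this is not how Archivematica actually packages content.

```python
import zipfile
from pathlib import Path


def package_component(title, paths, out_dir):
    """Return the path of the bitstream to deposit for one digital object component.

    - exactly one plain file: returned unchanged, to be deposited as a bitstream
    - several files and/or any folder: everything is written to '<title>.zip'
    """
    paths = [Path(p) for p in paths]
    if len(paths) == 1 and paths[0].is_file():
        return paths[0]

    zip_path = Path(out_dir) / f"{title}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            if p.is_dir():
                # Keep the folder name as the top level inside the .zip.
                for child in sorted(p.rglob("*")):
                    if child.is_file():
                        zf.write(child, child.relative_to(p.parent).as_posix())
            else:
                zf.write(p, p.name)
    return zip_path
```

The same function could presumably be reused for the logs, metadata, and METS file, with the resulting .zip flagged as unpublished when the corresponding record is created.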
The digital object components (i.e., each specific grouping of content as well as the Archivematica logs and metadata) would then be added as children of the main digital object record:
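To give a sense of what those child records might look like, here is a minimal sketch that builds digital object component records as plain Python dictionaries. The field names (`jsonmodel_type`, `digital_object`, `file_versions`, `file_uri`, `publish`) reflect our understanding of the ArchivesSpace JSONModel and should be checked against the schema for your ASpace version; the `build_component` helper, the repository/record URIs, and the example.org file locations are all placeholders. Nothing here talks to a live ASpace instance.

```python
import json


def build_component(title, file_uri, digital_object_uri, publish=True, label=None):
    """Build a dict shaped like an ASpace digital object component (assumed schema)."""
    record = {
        "jsonmodel_type": "digital_object_component",
        "title": title,
        "publish": publish,
        # Link back to the main digital object record (the parent).
        "digital_object": {"ref": digital_object_uri},
        "file_versions": [
            {
                "jsonmodel_type": "file_version",
                "file_uri": file_uri,
                "publish": publish,
            }
        ],
    }
    if label:
        record["label"] = label
    return record


parent_uri = "/repositories/2/digital_objects/1"  # placeholder URI
components = [
    build_component("Archivematica Series 1", "https://example.org/series_1.zip", parent_uri),
    build_component("Archivematica Series 2", "https://example.org/series_2.zip", parent_uri),
    # Logs, metadata, and METS file: deposited, but not published.
    build_component("Archivematica metadata", "https://example.org/metadata.zip",
                    parent_uri, publish=False),
]
print(json.dumps(components[-1], indent=2))
```

Each dictionary could then be sent to the ASpace API as a new digital object component; the third record shows how the Archivematica logs and metadata might be kept ‘not published.’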
The digital object component records might also include extent information, more specific rights information, or...???