Using the ArchivesSpace API
A major benefit of the ArchivesSpace API is that it allows users to interact with the ArchivesSpace application without having to modify the core application code or use Ruby, the language ArchivesSpace is written in. This is great for us: while we have learned enough Ruby to write some basic ArchivesSpace plug-ins, most of the programmatic work we've been doing for this project has been written in Python. The API lets us keep using the language we're more familiar with, especially for accessing and modifying our legacy data, while still interacting directly with the ArchivesSpace application. Most of the scripts detailed in this post are written in Python, but an easier way to get started with the ArchivesSpace API is to use curl in a Mac or Linux terminal, or in a Windows Unix emulator such as Cygwin.
Once you've opened up a terminal or Cygwin with curl installed, you can send a simple request to the ArchivesSpace backend to test that the backend API is available. If you're running ArchivesSpace on a test instance on your local computer (which I highly recommend when experimenting with the API, and with the application in general), that request and the resulting response look something like this:
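Assuming a local instance with the backend on its default port of 8089, that check can be as simple as the following (a sketch; a running backend replies with a bit of JSON that includes its version):

```shell
# ask the backend root for a status report
curl http://localhost:8089
```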
If you're interacting with an ArchivesSpace instance that is not running on your local machine, substitute your ArchivesSpace instance's URL for localhost and the port on which the backend is running for 8089.
Most of the really powerful things that can be done with the API require users to verify that they have permission to do them. So once you've verified that you can communicate with the ArchivesSpace API, the next step is to authenticate and start a session. To authenticate using the default administrator username and password, the request and the first part of the response look like this:
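For example (curl's -F flag posts the password as a form field; swap in your own username and password in place of the defaults shown here):

```shell
# POST the password to the login endpoint for the user "admin";
# the response includes a "session" key holding the session token
curl -s -F password="admin" "http://localhost:8089/users/admin/login"
```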
This request returns a longer response than the first one we sent, including the session token that you will need to include in a header with every subsequent request. Since the session token is a really, really long string, it makes things a lot easier if you store the token as a variable, like so:
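Either of these works (the second, one-step version assumes you have jq installed to pull the token out of the login response; the pasted value below is just a placeholder):

```shell
# either paste the "session" value from the login response by hand...
export SESSION="paste-the-long-session-token-here"
# ...or, with jq installed, log in and store the token in one go
export SESSION=$(curl -s -F password="admin" "http://localhost:8089/users/admin/login" | jq -r '.session')
```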
Subsequent requests sent to the API should include the session token in the header, like this:
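The token goes in an X-ArchivesSpace-Session header. For example, to ask the API who you're currently logged in as:

```shell
# any authenticated request just needs the session header attached
curl -H "X-ArchivesSpace-Session: $SESSION" "http://localhost:8089/users/current-user"
```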
This tells ArchivesSpace that you are an authenticated user who is allowed to do all sorts of really powerful and potentially dangerous things. Again, always test your code on a test instance of ArchivesSpace!
The majority of the actions that can be completed via the API take the form of either HTTP GET or POST requests. As the names imply, GET requests return some data to the user and POST requests submit some data to the application. GET and POST requests can often be sent to the same backend endpoint, with a GET request including the ID of a desired record and a POST request including the data (in ArchivesSpace JSONModel format) of the record to be created. Here are some quick examples:
|A get request that returns the IDs of all resources in repository 2|
|A get request that returns the JSON representation of resource 3|
|A get request that returns the JSON representation of subject 1|
|A post request that creates a new subject. The API returns a bit of JSON including the ID and URI of the posted subject|
|The new subject posted via API as seen in the ArchivesSpace staff interface|
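In curl, the requests pictured above look something like the following (the subject JSON is a minimal made-up example; $SESSION is the token stored earlier):

```shell
# IDs of all resources in repository 2
curl -H "X-ArchivesSpace-Session: $SESSION" "http://localhost:8089/repositories/2/resources?all_ids=true"

# JSON representation of resource 3
curl -H "X-ArchivesSpace-Session: $SESSION" "http://localhost:8089/repositories/2/resources/3"

# JSON representation of subject 1
curl -H "X-ArchivesSpace-Session: $SESSION" "http://localhost:8089/subjects/1"

# create a new subject; the response includes the new subject's ID and URI
curl -H "X-ArchivesSpace-Session: $SESSION" \
     -d '{"source":"local","vocabulary":"/vocabularies/1","terms":[{"term":"Example topic","term_type":"topical","vocabulary":"/vocabularies/1"}]}' \
     "http://localhost:8089/subjects"
```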
Disclaimer: All of the following examples are based on our very particular use cases, legacy data, and programming expertise or lack thereof. As such, the exact workflows and Python scripts shared herein will likely not be applicable to most other institutions and their data. Rather, they are intended to serve as examples of what is possible via the ArchivesSpace API and to provide some guidance on how certain endpoints can be used.
Creating Digital Objects
The idea for this script came from a conversation with a colleague about the possibility of automating the creation of digital objects in ArchivesSpace for digitized archival objects, using a spreadsheet inventory of a collection that contains the ArchivesSpace Ref ID for each archival object, a barcode or other sort of identifier for the digitized content, and the URL for the digitized content. Such a spreadsheet could easily be created using an ArchivesSpace-exported EAD, which contains the ArchivesSpace Ref ID for each archival object as a component-level id attribute:
|An ArchivesSpace archival object. Note the Ref ID.|
|That same archival object in an ArchivesSpace exported EAD. Note the <c02> id attribute.|
The series of API requests to create a new digital object and link it to the existing archival object goes like this:
1. Using the search endpoint, search for the archival object whose Ref ID matches the Ref ID in the spreadsheet
|The URI of the archival object that matches the searched-for Ref ID|
2. Get the archival object's JSON representation from the URI returned by the search
3. Using the archival object's display string (a concatenation of its title and date) from the archival object JSON, along with the identifier and digitized content URL from the spreadsheet, form the JSON for a new ArchivesSpace digital object and post it using the digital objects endpoint
|The posted digital object|
4. Add the new digital object as an instance on the existing archival object and post the updated archival object back to its original URI
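Sketched in Python, the workflow might look like the following. This is a rough illustration rather than our production script: the requests library, the repository number, and the spreadsheet field names are all assumptions.

```python
BACKEND = "http://localhost:8089"   # substitute your backend URL
REPO = "/repositories/2"            # substitute your repository ID

def digital_object_json(display_string, identifier, link):
    """Build the minimal JSONModel for a new, published digital object."""
    return {
        "jsonmodel_type": "digital_object",
        "title": display_string,
        "digital_object_id": identifier,
        "publish": True,
        "file_versions": [
            {"jsonmodel_type": "file_version", "file_uri": link, "publish": True}
        ],
    }

# Hedged usage with the requests library, one spreadsheet row at a time:
# headers = {"X-ArchivesSpace-Session": session_token}
# hits = requests.get(BACKEND + REPO + "/search",
#                     params={"page": 1, "q": ref_id}, headers=headers).json()
# ao_uri = hits["results"][0]["uri"]
# ao = requests.get(BACKEND + ao_uri, headers=headers).json()
# new_do = digital_object_json(ao["display_string"], identifier, link)
# posted = requests.post(BACKEND + REPO + "/digital_objects",
#                        json=new_do, headers=headers).json()
# # finally, link it: append an instance referencing posted["uri"] to
# # ao["instances"] and re-post the archival object to BACKEND + ao_uri
```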
Migrating Subjects
Many of our legacy subjects are subdivided strings, such as a topical term followed by geographic or chronological subdivisions. Migrated as-is, a subject like that comes into ArchivesSpace as one long term of a single type, when really it should be split into its individual terms, each with its own term type.
We want our data to be migrated to ArchivesSpace as cleanly and correctly as possible. While subdivided subjects might seem like not-such-a-big-deal, we plan to use ArchivesSpace to export MARC XML records for our collections, we will ultimately want to take advantage of the functionality of EAD3, and new subjects created in ArchivesSpace going forward will likely be entered as properly typed individual terms. So now is really the best time to ensure that our legacy subjects will be migrated to ArchivesSpace properly. Enter the API.
Posting subjects via the API is actually really simple (see the curl example near the top of this post). What was REALLY complicated about using the API to post our subjects is that a term type is required for each individual term. Since EAD does not have the structure to support multiple terms, much less term types, the HIGHLY messy process that we used looks like this:
1. Extract all of the unique subjects from our legacy EAD files
2. Get a MARC XML export of all of our archival collections from our catalog
3. Use a combination of scripts to make a csv of all of our unique EAD subjects with subdivided subjects split up into individual terms and a csv of all of our MARC subjects with each individual term and term type identified
4. Run a script that identifies the term type for all individual terms and outputs a csv with all of our unique EAD subjects with individual terms and term types included
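With term types matched up, posting each cleaned subject is straightforward. Here is a hedged sketch: the requests library and the example subject are assumptions, and in practice the terms and term types come from the csv produced above.

```python
def subject_json(typed_terms, source="lcsh"):
    """Build the JSONModel for a subject from (term, term_type) pairs."""
    return {
        "jsonmodel_type": "subject",
        "source": source,
        "vocabulary": "/vocabularies/1",
        "terms": [
            {"jsonmodel_type": "term",
             "term": term,
             "term_type": term_type,
             "vocabulary": "/vocabularies/1"}
            for term, term_type in typed_terms
        ],
    }

# e.g. the subdivided legacy subject "Universities and colleges--Michigan"
# becomes two typed terms, which can then be posted:
# new_subject = subject_json([("Universities and colleges", "topical"),
#                             ("Michigan", "geographic")])
# requests.post("http://localhost:8089/subjects",
#               json=new_subject,
#               headers={"X-ArchivesSpace-Session": session_token})
```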