Bentley Historical Library Curation Team Blog: ArchivesSpace + Vagrant = Archivagrant

If you're anything like me, you find yourself regularly creating, destroying, and recreating ArchivesSpace instances to run test migrations with a slightly modified set of data, to test new or updated plugins, or to verify that everything that previously worked still works with a new version of ArchivesSpace. Manually downloading an ArchivesSpace release, setting up a MySQL database, editing or copying over a config file, setting up default values, and all of the other steps that go into getting an ArchivesSpace release ready for testing can be a time-consuming process. If you're even more like me, you went through this process countless times, all the while thinking "there must be a better way," without realizing that there is, indeed, a better way: Vagrant.

Vagrant is an application that allows users to create a single configuration file (a Vagrantfile) that can be used, shared, and reused to "create and configure lightweight, reproducible, and portable development environments" in the form of virtual machines. Developers have widely used Vagrant, and tools like it, to solve the problems that arise when several people work on the same application in different development environments, and to make configuring a development environment easier and more consistent across time, users, and operating systems. Vagrant lets users do some upfront work to configure an environment so that other people working on a project (and their future selves) never have to go through manual, time-consuming, and error-prone configuration steps again. While we aren't doing a lot of heavy development, repurposing an existing tool like Vagrant to cut down on the time we spend unzipping directories, editing config files, installing plugins, and so on allows us to focus on the work that we really need to do.

This blog post will walk through our ArchivesSpace Vagrant (or, Archivagrant) project to demonstrate how to set up a Vagrantfile and related provisioning scripts that, once written, will download the latest ArchivesSpace release, install dependencies and user plugins, set up ArchivesSpace to run against a MySQL database, and set up some of our local ArchivesSpace default values.

To begin using Vagrant, you will first need to do the following:

Install VirtualBox
Install Vagrant
Install a terminal application with ssh (Secure Shell). If you're on Mac or Linux, the default terminal should work. If you're on Windows and have Git installed, the Git shell should also work. Another option is a Unix-like environment such as Cygwin; be sure to select the openssh package during setup and installation.

If you've never used Vagrant before and are curious about how it works in general, follow along with the Vagrant Getting Started instructions for information about downloading and installing boxes, setting up a basic Vagrantfile, and provisioning, starting, and accessing a Vagrant virtual machine before continuing with the following instructions.

As detailed in the Vagrant setup instructions, a Vagrantfile is all that's needed to set up a Vagrant virtual machine that can be installed, started, stopped, destroyed, and recreated at any time and on any machine. Ours lives in the archivagrant GitHub repository.

The Vagrantfile for this project is pretty simple. First, it indicates that the box to be used is hashicorp/precise32, which is an Ubuntu 12.04 LTS 32-bit box. Next, ports 8080 and 8089 on the guest virtual machine are forwarded to the same ports on the host machine. This allows us to use the browser on our host machine (the computer running Vagrant) to reach the ArchivesSpace application running inside the virtual machine and to interact with it as if it were running directly on our machine, on its default staff interface and backend ports. That way, we don't need to find the IP address of the Vagrant machine or remember any non-default ports (it also means that we don't need to change anything in the many scripts we have that access the ArchivesSpace API at http://localhost:8089). Next, the Vagrantfile allocates 2 GB of RAM to the virtual machine to improve performance. Finally, the Vagrantfile provisions the virtual machine using three shell scripts: setup_python.sh, setup_mysql.sh, and setup_archivesspace.sh.
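Because of that port forwarding, a host-side script can talk to ArchivesSpace at localhost. ArchivesSpace can take a while to start listening after the machine boots, though, so a script may want to wait until the forwarded ports answer. Here is a minimal Python sketch of such a check; the function name, timeout, and polling interval are our own illustrative choices and not part of the Archivagrant project.

```python
import http.client
import time
import urllib.error
import urllib.request

# Host-side URLs made available by the Vagrantfile's port forwarding
FORWARDED_URLS = {
    "staff interface": "http://localhost:8080",
    "backend API": "http://localhost:8089",
}


def wait_until_up(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll url until it returns any HTTP response; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # An HTTP error status (404, 500, ...) still means the server is answering
            err.close()
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset or timed out: ArchivesSpace is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    for label, url in FORWARDED_URLS.items():
        print(label, url, "up" if wait_until_up(url) else "not responding")
```

Treating any HTTP status as "up" keeps the check independent of what the staff interface or backend returns at the root URL.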

The first shell script, setup_python.sh, is fairly short and simple. It first updates Ubuntu's package lists (so that this and any subsequent provisioning scripts install the most up-to-date packages), then installs the Python package manager pip using the Ubuntu package manager, upgrades pip to its latest version, and installs the Python Requests library, which we'll use later to find and download the latest version of ArchivesSpace and to configure our ArchivesSpace defaults.

The next shell script, setup_mysql.sh, installs the Ubuntu mysql-server package, sets the MySQL root password (since this is a temporary virtual machine used only for testing, it's okay if the credentials are weak and exposed), and finally creates and configures a database following the official ArchivesSpace documentation for running ArchivesSpace against MySQL.
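The database step amounts to creating a UTF-8 database, giving a dedicated user full rights on it, and pointing ArchivesSpace's AppConfig[:db_url] at it. The sketch below generates those statements and the matching JDBC URL in Python, as an illustration rather than a copy of setup_mysql.sh. The database name, the user `as`, and the password `as123` are placeholder assumptions. It uses separate CREATE USER and GRANT statements because MySQL 8 no longer accepts GRANT ... IDENTIFIED BY, while older versions accept both forms.

```python
import subprocess

# Placeholder credentials (assumptions): substitute your own
DB_NAME = "archivesspace"
DB_USER = "as"
DB_PASSWORD = "as123"


def mysql_setup_statements(db: str, user: str, password: str) -> list[str]:
    """SQL that creates a UTF-8 database and a local user with full rights on it."""
    return [
        f"CREATE DATABASE {db} DEFAULT CHARACTER SET utf8;",
        f"CREATE USER '{user}'@'localhost' IDENTIFIED BY '{password}';",
        f"GRANT ALL ON {db}.* TO '{user}'@'localhost';",
    ]


def jdbc_url(db: str, user: str, password: str,
             host: str = "localhost", port: int = 3306) -> str:
    """Value for AppConfig[:db_url] in ArchivesSpace's config/config.rb."""
    return (
        f"jdbc:mysql://{host}:{port}/{db}"
        f"?user={user}&password={password}&useUnicode=true&characterEncoding=UTF-8"
    )


def run_as_root(statements: list[str], root_password: str) -> None:
    """Run the statements through the mysql command-line client as root."""
    # -p<password> takes no space between the flag and the value
    subprocess.run(
        ["mysql", "-uroot", f"-p{root_password}", "-e", " ".join(statements)],
        check=True,
    )


if __name__ == "__main__":
    for statement in mysql_setup_statements(DB_NAME, DB_USER, DB_PASSWORD):
        print(statement)
    print(jdbc_url(DB_NAME, DB_USER, DB_PASSWORD))
```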

The final provisioning script, setup_archivesspace.sh, is the most detailed. It also makes use of two separate Python scripts that do the bulk of the work, so for the purposes of this post we'll take a look at setup_archivesspace.sh in two parts. It's worth reiterating that this provisioning script configures ArchivesSpace for our needs here at the Bentley Historical Library, but you should be able to modify it to suit your needs by changing some of the variables and removing some of the plugins (or adding your own).

The first part of the setup_archivesspace.sh shell script is pretty straightforward. The script first installs the Ubuntu packages that will be used in provisioning ArchivesSpace: Java (required by ArchivesSpace), unzip (used to extract the downloaded ArchivesSpace release), and git (used to install plugins from the Bentley's GitHub repository). Then, the shell script calls a separate Python script, download_latest_archivesspace.py, which is used to locate and download the latest release of ArchivesSpace.

This Python script uses the Python Requests library and the GitHub API to find the URL for the latest official ArchivesSpace release, download it, and extract it to the guest machine's home directory.
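A minimal sketch of the same idea follows; it is not our download_latest_archivesspace.py. It asks the GitHub API's "latest release" endpoint for the archivesspace/archivesspace repository, picks the first .zip asset, saves it to the home directory, and hands extraction to the unzip package installed by the shell script. Two deliberate swaps: it uses the standard library's urllib instead of Requests so it runs without extra packages, and it shells out to unzip because Python's zipfile module does not restore the executable bit on files such as archivesspace.sh.

```python
import json
import shutil
import subprocess
import urllib.request
from pathlib import Path

# GitHub API endpoint for the newest published (non-draft, non-prerelease) release
LATEST_RELEASE_URL = "https://api.github.com/repos/archivesspace/archivesspace/releases/latest"


def pick_zip_asset(release: dict) -> str:
    """Return the download URL of the first .zip asset in a GitHub release record."""
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(".zip"):
            return asset["browser_download_url"]
    raise ValueError(f"no .zip asset in release {release.get('tag_name')!r}")


def fetch_latest_release(url: str = LATEST_RELEASE_URL) -> dict:
    """Ask the GitHub API for the latest release and decode its JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download(url: str, dest: Path) -> Path:
    """Stream url to dest on disk and return dest."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    release = fetch_latest_release()
    zip_url = pick_zip_asset(release)
    zip_path = download(zip_url, Path.home() / zip_url.rsplit("/", 1)[-1])
    # unzip keeps the executable permissions stored in the archive
    subprocess.run(["unzip", "-q", str(zip_path), "-d", str(Path.home())], check=True)
    print("Extracted", release["tag_name"], "to", Path.home())
```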

After downloading and unzipping the latest version of ArchivesSpace, the setup_archivesspace.sh provisioning script sets variables for the database URL and plugins entries to be edited in the ArchivesSpace config file. Then, several plugins are downloaded to the ArchivesSpace plugins directory, including the latest version of the container management plugin, our own EAD importer and exporter plugins, and our slightly modified version of Mark Cooper's very handy aspace-jsonmodel-from-format plugin (used to convert our legacy EADs to ArchivesSpace JSONModel format before posting them via the API; we'll blog about that at some point, but it makes identifying errors much easier).

Next, the setup_archivesspace.sh script edits the ArchivesSpace config file, replacing the default database URL and plugins entries with the variables set earlier. The script then runs the ArchivesSpace setup-database.sh script, configures ArchivesSpace to run at system start (so we won't have to access the virtual machine just to start the application), and starts ArchivesSpace. Finally, the provisioning script calls another Python script, archivesspace_defaults.py, to set up some of our default configurations.
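The config-editing step boils down to finding the (usually commented-out) AppConfig[:db_url] and AppConfig[:plugins] lines in config/config.rb and replacing them with active settings. The sketch below does that with a regular expression in Python; the shell script itself may use a different tool. The config path, the placeholder database credentials, and the plugin name "hypothetical_plugin" are assumptions for illustration.

```python
import re
from pathlib import Path

# Assumed location: the release extracted into the vagrant user's home directory
CONFIG_PATH = Path.home() / "archivesspace" / "config" / "config.rb"
# Placeholder credentials (assumption), written as a Ruby string literal
DB_URL = ('"jdbc:mysql://localhost:3306/archivesspace'
          '?user=as&password=as123&useUnicode=true&characterEncoding=UTF-8"')


def ruby_string_list(items: list[str]) -> str:
    """Format Python strings as a Ruby array literal, e.g. ['local', 'lcnaf']."""
    return "[" + ", ".join(f"'{item}'" for item in items) + "]"


def set_config_value(config_text: str, key: str, ruby_value: str) -> str:
    """Replace the first (possibly commented-out) AppConfig[:key] line, or append one."""
    # [ \t]* rather than \s* so the match never spans a line break
    pattern = re.compile(
        rf"^#?[ \t]*AppConfig\[:{re.escape(key)}\][ \t]*=.*$", re.MULTILINE
    )
    new_line = f"AppConfig[:{key}] = {ruby_value}"
    # A function replacement keeps backslashes in ruby_value from acting as escapes
    updated, count = pattern.subn(lambda _match: new_line, config_text, count=1)
    if count == 0:
        if updated and not updated.endswith("\n"):
            updated += "\n"
        updated += new_line + "\n"
    return updated


if __name__ == "__main__":
    text = CONFIG_PATH.read_text()
    text = set_config_value(text, "db_url", DB_URL)
    text = set_config_value(
        text, "plugins", ruby_string_list(["local", "lcnaf", "hypothetical_plugin"])
    )
    CONFIG_PATH.write_text(text)
```

Appending when no matching line exists means the edit still works if a future release drops the commented-out defaults from config.rb.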

The archivesspace_defaults.py script uses the ArchivesSpace API to set up some of the default values that we've been using for testing: it creates a repository, container profiles, and classifications, sets repository preferences, and edits the subject source and name source enumerations. While all of these configurations can easily be set up using the ArchivesSpace staff interface, handling these basic configurations in a provisioning script makes starting and using an ArchivesSpace Vagrant instance that much faster.
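The general pattern for scripting these defaults is: log in to the backend to get a session token, send that token in the X-ArchivesSpace-Session header, and POST JSON records. The sketch below shows that pattern for creating a repository and appending a value to the subject_source enumeration; it is not our archivesspace_defaults.py. It assumes the default admin/admin credentials of a fresh install, uses urllib instead of Requests to avoid dependencies, and the repository code, name, and enumeration value are hypothetical. Check the ArchivesSpace API documentation for your version before relying on the exact endpoints.

```python
import json
import urllib.parse
import urllib.request

BACKEND = "http://localhost:8089"  # backend port forwarded by the Vagrantfile


def api_request(method, path, session=None, payload=None, params=None):
    """Send one request to the ArchivesSpace backend and return the decoded JSON."""
    url = BACKEND + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if session:
        # Authenticated calls carry the session token returned by login()
        headers["X-ArchivesSpace-Session"] = session
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def login(user="admin", password="admin"):
    """Log in and return a session token (admin/admin is the fresh-install default)."""
    return api_request("POST", f"/users/{user}/login", params={"password": password})["session"]


def with_added_values(enumeration, new_values):
    """Return a copy of an enumeration record with any missing values appended."""
    values = list(enumeration.get("values", []))
    for value in new_values:
        if value not in values:
            values.append(value)
    return {**enumeration, "values": values}


if __name__ == "__main__":
    session = login()
    # Hypothetical repository code and name
    repo = api_request("POST", "/repositories", session,
                       {"repo_code": "TEST", "name": "Test Repository"})
    print("Created repository:", repo.get("uri"))
    for enumeration in api_request("GET", "/config/enumerations", session):
        if enumeration["name"] == "subject_source":
            # "example_source" is hypothetical; POST the whole record back to its URI
            updated = with_added_values(enumeration, ["example_source"])
            api_request("POST", enumeration["uri"], session, updated)
```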

Now that we've written our Vagrantfile and associated provisioning scripts, the process of setting up a new ArchivesSpace instance for testing is as simple as doing the following:

Clone the archivagrant GitHub repository (if we haven't already)
Open a terminal application and change directories to the archivagrant directory
vagrant up

The first time we run vagrant up, Vagrant provisions the virtual machine using the scripts detailed above. Once provisioning is complete, we can point our host machine's browser to http://localhost:8080 (the ArchivesSpace staff interface) and any scripts we have to http://localhost:8089 (the ArchivesSpace backend). If we need command line access to the running virtual machine (to stop or restart ArchivesSpace, install additional packages, mess around in an Ubuntu server without worrying that we're going to break everything, etc.), we can vagrant ssh into it. The virtual machine can be suspended with vagrant suspend, shut down with vagrant halt, and destroyed with vagrant destroy. If it has been suspended or shut down, another vagrant up starts it back up in its previous state. If it has been destroyed, vagrant up recreates the virtual machine from scratch, going through the entire provisioning process again. For the pros and cons of each approach, check out the Vagrant teardown documentation. I use vagrant halt most of the time, but vagrant destroy is easiest when a new version of ArchivesSpace is released or when I have messed everything up beyond salvation.
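Each of those commands leaves the machine in a different state, which Vagrant reports through vagrant status. For scripting, vagrant status --machine-readable prints comma-separated lines of timestamp, target, type, and data. The sketch below parses the state line so that a host-side script could, for example, run vagrant up only when the machine isn't already running. It assumes a single machine named "default" (Vagrant's name when the Vagrantfile doesn't define one), and the helper names are hypothetical.

```python
import subprocess
from typing import Optional

# States Vagrant reports after the lifecycle commands described above
STATE_MEANINGS = {
    "running": "up (vagrant up)",
    "saved": "suspended (vagrant suspend)",
    "poweroff": "shut down (vagrant halt)",
    "not_created": "destroyed or never created (vagrant destroy)",
}


def parse_machine_state(machine_readable: str, target: str = "default") -> Optional[str]:
    """Extract the machine state from `vagrant status --machine-readable` output."""
    for line in machine_readable.splitlines():
        # Format: timestamp,target,type,data (data may itself contain commas)
        parts = line.split(",", 3)
        if len(parts) == 4 and parts[1] == target and parts[2] == "state":
            return parts[3]
    return None


def machine_state(project_dir: str, target: str = "default") -> Optional[str]:
    """Run vagrant status in the project directory and return the machine state."""
    result = subprocess.run(
        ["vagrant", "status", "--machine-readable"],
        cwd=project_dir, capture_output=True, text=True, check=True,
    )
    return parse_machine_state(result.stdout, target)


if __name__ == "__main__":
    state = machine_state(".")  # run from the archivagrant directory
    print("Archivagrant VM:", STATE_MEANINGS.get(state, state))
    if state != "running":
        subprocess.run(["vagrant", "up"], check=True)
```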

Finally, there may be times when we want to start over with a fresh ArchivesSpace database in an existing Vagrant virtual machine without recreating the entire machine through a vagrant destroy. To do this, we vagrant ssh into the guest machine, change to the /vagrant directory (a shared folder set up by Vagrant that syncs the contents of the Vagrant project's directory on the host machine to the guest machine), and run the reset_archivesspace.sh script from there.

The script sets up a clean MySQL database and our ArchivesSpace defaults without redownloading ArchivesSpace or reprovisioning the entire machine.
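The reset comes down to a short, ordered sequence: stop ArchivesSpace, drop and recreate the database, rerun ArchivesSpace's scripts/setup-database.sh, and start ArchivesSpace again before reapplying the defaults. The sketch below expresses that order in Python as an illustration, not a copy of reset_archivesspace.sh. The install path /home/vagrant/archivesspace and the root password are assumptions, and the commands are only printed unless dry_run is False. MySQL keeps database-level grants after DROP DATABASE, so the ArchivesSpace user does not need to be granted access again.

```python
import shlex
import subprocess
from pathlib import Path

# Assumed values: adjust to match your own provisioning scripts
ASPACE_DIR = Path("/home/vagrant/archivesspace")
MYSQL_ROOT_PASSWORD = "rootpwd"  # hypothetical; use whatever setup_mysql.sh set


def reset_commands(db: str = "archivesspace") -> list[list[str]]:
    """Commands, in order, that rebuild an empty ArchivesSpace database."""
    sql = f"DROP DATABASE IF EXISTS {db}; CREATE DATABASE {db} DEFAULT CHARACTER SET utf8;"
    return [
        [str(ASPACE_DIR / "archivesspace.sh"), "stop"],
        ["mysql", "-uroot", f"-p{MYSQL_ROOT_PASSWORD}", "-e", sql],
        # Recreates the ArchivesSpace tables in the now-empty database
        [str(ASPACE_DIR / "scripts" / "setup-database.sh")],
        [str(ASPACE_DIR / "archivesspace.sh"), "start"],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run, execute it and stop on failure."""
    for command in commands:
        print("$", shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(reset_commands(), dry_run=True)
```

Once the backend answers on port 8089 again, archivesspace_defaults.py can be rerun to restore the defaults.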

Several other ArchivesSpace users appear to have their own Vagrant ArchivesSpace configurations, which might be worth checking out if the setup described above doesn't quite work for you or if you want to see how others are doing it. If you're using some other way to ease the pain of frequently installing ArchivesSpace test instances, let us know!

Monday, January 18, 2016