Monday, January 18, 2016

ArchivesSpace + Vagrant = Archivagrant

If you're anything like me, you find yourself regularly creating, destroying, and recreating ArchivesSpace instances to run test migrations with a slightly modified set of data, to test new or updated plugins, or to verify that everything that previously worked still works with a new version of ArchivesSpace. Manually downloading an ArchivesSpace release, setting up a MySQL database, editing or copying over a config file, setting up default values, and all of the other steps that go into getting an ArchivesSpace release ready for testing can be a time-consuming process. If you're even more like me, you went through this process countless times, all the while thinking "there must be a better way" without realizing the entire time that there is, indeed, a better way: Vagrant.

Vagrant is an application that allows users to create a single configuration file (aka a Vagrantfile) that can be used, shared, and reused to "create and configure lightweight, reproducible, and portable development environments" in the form of virtual machines. Vagrant, and tools like it, have been widely used by developers to solve issues that arise when development on a single application is done in a variety of development environments and to make the process of configuring a development environment across time, users, and operating systems easier and more consistent. Vagrant allows users to do some upfront work to configure an environment so that other people working on a project (and their future selves) will not have to worry about going through manual, time-consuming, and error-prone configuration steps ever again. While we aren't doing a lot of heavy developing, repurposing an existing tool like Vagrant to cut down on the amount of time we spend unzipping directories, editing config files, installing plugins, and so on allows us to focus on the work that we really need to do.

This blog post will walk through our ArchivesSpace Vagrant (or, Archivagrant) project to demonstrate how to set up a Vagrantfile and related provisioning scripts that, once written, will download the latest ArchivesSpace release, install dependencies and user plugins, set up ArchivesSpace to run against a MySQL database, and set up some of our local ArchivesSpace default values.

To begin using Vagrant, you will first need to do the following:

  1. Install VirtualBox
  2. Install Vagrant
  3. Install a terminal application with ssh (Secure Shell). If you're on Mac or Linux, the default terminal should work. If you're on Windows and have git installed, the git shell should also work. Another option is to use a Unix-like terminal emulator such as Cygwin, being sure to install the ssh package during setup and installation.

If you've never used Vagrant before and are curious about how it works in general, follow along with the Vagrant Getting Started instructions for information about downloading and installing boxes, setting up a basic Vagrantfile, and provisioning, starting, and accessing a Vagrant virtual machine before continuing with the following instructions.

As detailed in the Vagrant setup instructions, a Vagrantfile is all that's needed to set up a Vagrant virtual machine that can be installed, started, stopped, destroyed, and recreated at any time and on any machine. Here's ours:

# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure(2) do |config|
  config.vm.box = "hashicorp/precise32"

  config.vm.network "forwarded_port", guest: 8080, host: 8080
  config.vm.network "forwarded_port", guest: 8089, host: 8089

  config.vm.provider "virtualbox" do |vb|
    vb.memory = "2048"
  end

  config.vm.provision "shell", path: "setup_python.sh"
  config.vm.provision "shell", path: "setup_mysql.sh"
  config.vm.provision "shell", path: "setup_archivesspace.sh"
end


The Vagrantfile for this project is pretty simple. First, it indicates that the box to be used is hashicorp/precise32, which is an Ubuntu 12.04 LTS 32-bit box. Next, ports 8080 and 8089 from the guest virtual machine are forwarded to the same ports on the host machine. This will allow us to use the browser on our host machine (the computer running Vagrant) to access the ArchivesSpace application running inside of the virtual machine and interact with the ArchivesSpace application as if it were running on our actual machine using its default backend and staff interface ports. That way, we don't need to worry about finding the IP address of the Vagrant machine or remembering any non-default ports (it also means that we don't need to change anything in the many scripts we have that access the ArchivesSpace API using http://localhost:8089). Next, the Vagrantfile allocates 2 GB of RAM to the virtual machine to improve performance. Finally, the Vagrantfile provisions the virtual machine using three shell scripts: setup_python.sh, setup_mysql.sh, and setup_archivesspace.sh.
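
Once the machine is up, a quick way to confirm that the two forwarded ports are reachable from the host is a small script like the one below. This is only a sketch and not part of the Archivagrant project: the function name port_is_open is made up for illustration, and it is written for Python 3 on the host rather than the Python 2 used by the provisioning scripts in this post.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # 8080 (staff interface) and 8089 (backend) are the ports forwarded in the Vagrantfile
    for port in (8080, 8089):
        state = "open" if port_is_open("localhost", port) else "closed"
        print("localhost:{0} is {1}".format(port, state))
```

An open port only shows that something is listening; ArchivesSpace itself can take a while longer to finish starting.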

#!/usr/bin/env bash
# Update packages
apt-get -y update
# Install pip
apt-get -y install python-pip
# Upgrade it
pip install --upgrade pip
# Set the system to use the upgraded version
rm /usr/bin/pip
ln -s /usr/local/bin/pip /usr/bin/pip
# Install the Python requests library
pip install requests


The first shell script, setup_python.sh, is fairly short and simple. It first updates Ubuntu's packages (to ensure that we'll be downloading the most up-to-date packages in this and any subsequent provisioning scripts), then installs the Python package manager pip using the Ubuntu package manager, upgrades pip to its latest version, and installs the Python Requests library, which we'll be using later to find and download the latest version of ArchivesSpace and configure our ArchivesSpace defaults.

#!/usr/bin/env bash
# Export MySQL root password before installing to prevent MySQL from prompting for a password during the installation process
# http://stackoverflow.com/questions/18812293/vagrant-ssh-provisioning-mysql-password
echo "mysql-server mysql-server/root_password password rootpwd" | debconf-set-selections
echo "mysql-server mysql-server/root_password_again password rootpwd" | debconf-set-selections
# Install mysql-server and create the archivesspace database
# http://archivesspace.github.io/archivesspace/user/running-archivesspace-against-mysql/
# https://gist.github.com/rrosiek/8190550
apt-get -y install mysql-server
mysql -uroot -prootpwd -e "create database archivesspace"
mysql -uroot -prootpwd -e "grant all on archivesspace.* to 'as'@'localhost' identified by 'as123'"


The next shell script, setup_mysql.sh, first preseeds a MySQL root password so that the installer won't prompt for one (since this is a temporary virtual machine used only for testing purposes, it's okay if the username and password are weak and exposed), then installs the Ubuntu mysql-server package, and finally creates the archivesspace database and grants access to it following the official ArchivesSpace documentation for running ArchivesSpace against MySQL.

The final provisioning script, setup_archivesspace.sh, is the most detailed. It also makes use of two separate Python scripts that do the bulk of the work, so for the purposes of this post we'll take a look at setup_archivesspace.sh in two parts. It's worth reiterating that this provisioning script configures ArchivesSpace for our needs here at the Bentley Historical Library, but you should be able to modify it to suit your needs by changing some of the variables and removing some of the plugins (or adding your own).

#!/usr/bin/env bash
echo "Installing dependencies"
apt-get -y install default-jre
apt-get -y install unzip
apt-get -y install git
cd /vagrant
echo "Downloading latest ArchivesSpace release"
# Use a Python script to download the latest ArchivesSpace release, because this is the only way that I know how
python download_latest_archivesspace.py


The first part of the setup_archivesspace.sh shell script is pretty straightforward. The script first installs the Ubuntu packages that will be used in provisioning ArchivesSpace: Java (required by ArchivesSpace), unzip (used to extract the downloaded ArchivesSpace release), and git (used to install plugins from the Bentley's GitHub repository). Then, the shell script calls a separate Python script, download_latest_archivesspace.py, which is used to locate and download the latest release of ArchivesSpace.

import requests
import json
import os
from os.path import join

latest_release_api = 'https://api.github.com/repos/archivesspace/archivesspace/releases/latest'
save_dir = '/home/vagrant'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)


def extract_release(zip_file):
    # Use os.system instead of the python zipfile library to preserve permissions
    os.system('unzip ' + zip_file + ' -d /home/vagrant/')


with requests.Session() as s:
    print "Finding the latest ArchivesSpace release"
    latest_release_json = s.get(latest_release_api).json()
    # Assumes the first asset attached to the release is the ArchivesSpace zip
    latest_release_name = latest_release_json['assets'][0]['name']
    latest_release_url = latest_release_json['assets'][0]['browser_download_url']
    print "Latest release url:", latest_release_url
    print "Latest release name:", latest_release_name
    zip_file = join(save_dir, latest_release_name)
    unzipped_file = join(save_dir, 'archivesspace')
    if not os.path.exists(zip_file) and not os.path.exists(unzipped_file):
        print "Downloading latest release"
        latest_release_zip = s.get(latest_release_url)
        with open(zip_file, 'wb') as outfile:
            print "Saving latest release to {0}".format(zip_file)
            outfile.write(latest_release_zip.content)
        print "Extracting latest release"
        extract_release(zip_file)
    elif os.path.exists(zip_file) and not os.path.exists(unzipped_file):
        print "Latest release downloaded but not extracted"
        print "Extracting..."
        extract_release(zip_file)
    else:
        print "Latest release already downloaded and extracted"


This Python script uses the Python Requests library and the GitHub API to find the URL for the latest official ArchivesSpace release, download it, and extract it to the guest machine's home directory.
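
The script above takes the first entry in the release's assets list. If a release ever carried more than one asset, a slightly more defensive lookup could select the asset by file extension instead. The following is a sketch under that assumption, written for Python 3; the helper name pick_zip_asset and the sample data are invented for illustration, and only the assets, name, and browser_download_url fields are taken from the script above.

```python
def pick_zip_asset(release):
    """Return (name, url) for the first asset whose name ends in .zip, else None."""
    for asset in release.get("assets", []):
        name = asset.get("name", "")
        if name.lower().endswith(".zip"):
            return name, asset["browser_download_url"]
    return None


# Made-up, trimmed-down response used only to exercise the helper
sample_release = {
    "assets": [
        {"name": "checksums.txt",
         "browser_download_url": "https://example.org/checksums.txt"},
        {"name": "archivesspace-v0.0.0.zip",
         "browser_download_url": "https://example.org/archivesspace-v0.0.0.zip"},
    ],
}

print(pick_zip_asset(sample_release))
```

Returning None when no zip is found lets the caller stop with a clear message instead of failing on a missing list index.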

# These will be used to edit the ArchivesSpace config file to use the correct database URL and setup our plugins
DBURL='AppConfig[:db_url] = "jdbc:mysql://localhost:3306/archivesspace?user=as\&password=as123\&useUnicode=true\&characterEncoding=UTF-8"'
PLUGINS="AppConfig[:plugins] = ['bhl-ead-importer','bhl-ead-exporter','container_management','aspace-jsonmodel-from-format']"
echo "Installing plugins"
cd /home/vagrant
echo "Installing container management"
# Grab a release instead of cloning the repo to make sure it's a version compatible with latest ArchivesSpace releases
wget https://github.com/hudmol/container_management/releases/download/1.1/container_management-1.1.zip
unzip container_management-1.1.zip -d /home/vagrant/archivesspace/plugins
echo "Installing BHL EAD Importer and Exporter"
cd archivesspace/plugins
git clone https://github.com/bentley-historical-library/bhl-ead-importer.git
git clone https://github.com/bentley-historical-library/bhl-ead-exporter.git
echo "Installing Mark Cooper's JSONModel from Format plugin"
git clone https://github.com/bentley-historical-library/aspace-jsonmodel-from-format.git
echo "Installing mysql java connector"
# http://archivesspace.github.io/archivesspace/user/running-archivesspace-against-mysql/
cd /home/vagrant/archivesspace/lib
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.37/mysql-connector-java-5.1.37.jar
echo "Editing config"
cd /home/vagrant/archivesspace/config
# Edit the config file to use the MySQL database and setup our plugins
# http://stackoverflow.com/questions/14643531/changing-contents-of-a-file-through-shell-script
sed -i "s@#AppConfig\[:db_url\].*@$DBURL@" config.rb
sed -i "s@#AppConfig\[:plugins\].*@$PLUGINS@" config.rb
echo "Setting up database and starting ArchivesSpace"
# First, make the setup-database.sh and archivesspace.sh scripts executable
cd /home/vagrant/archivesspace/scripts
chmod +x setup-database.sh
cd /home/vagrant/archivesspace
chmod +x archivesspace.sh
echo "Setting up database"
scripts/setup-database.sh
echo "Adding ArchivesSpace to system startup"
cd /etc/init.d
ln -s /home/vagrant/archivesspace/archivesspace.sh archivesspace
update-rc.d archivesspace defaults
update-rc.d archivesspace enable
cd /home/vagrant/archivesspace
echo "Starting ArchivesSpace"
./archivesspace.sh start
echo "Setting up ArchivesSpace defaults"
cd /vagrant
python archivesspace_defaults.py
echo "All done!"
echo "Point your host machine's browser to http://localhost:8080 to begin using ArchivesSpace"
echo "Use vagrant ssh to access the virtual machine"


After downloading and unzipping the latest version of ArchivesSpace, the setup_archivesspace.sh provisioning script sets variables for the database URL and plugins entries to be edited in the ArchivesSpace config file. Then, several plugins are downloaded to the ArchivesSpace plugins directory, including the latest version of the container management plugin, our own EAD importer and exporter plugins, and our slightly modified version of Mark Cooper's very handy aspace-jsonmodel-from-format plugin (used to convert our legacy EADs to ArchivesSpace JSONModel format before posting them via the API -- we'll blog about that at some point, but it makes error identification much easier). Next, the setup_archivesspace.sh script edits the ArchivesSpace config file, replacing the default database URL and plugins entries with the variables that we set up earlier. The script continues by running the ArchivesSpace setup-database.sh script, then configures ArchivesSpace to run at system start (so we won't have to access the virtual machine just to start the application), and starts ArchivesSpace. Finally, the provisioning script calls another Python script, archivesspace_defaults.py, to set up some of our default configurations.
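
To make the config-editing step more concrete, here is roughly what the two sed commands do, expressed as a Python 3 sketch. The function name set_config_entry and the sample config text are hypothetical (the placeholder values are not the real ArchivesSpace defaults); the idea is simply to replace a commented-out AppConfig line, from the # through the end of the line, with a new line.

```python
import re


def set_config_entry(config_text, key, new_line):
    """Replace a commented-out '#AppConfig[:key] ...' entry with new_line."""
    pattern = r"#AppConfig\[:" + re.escape(key) + r"\].*"
    # A function replacement keeps characters in new_line from being
    # interpreted as regex group references
    return re.sub(pattern, lambda match: new_line, config_text)


# Hypothetical excerpt standing in for config.rb; values are placeholders
sample_config = (
    "# sample config excerpt\n"
    '#AppConfig[:db_url] = "placeholder-default"\n'
    "#AppConfig[:plugins] = ['placeholder']\n"
)

db_url = ('AppConfig[:db_url] = "jdbc:mysql://localhost:3306/archivesspace'
          '?user=as&password=as123&useUnicode=true&characterEncoding=UTF-8"')
plugins = ("AppConfig[:plugins] = ['bhl-ead-importer','bhl-ead-exporter',"
           "'container_management','aspace-jsonmodel-from-format']")

edited = set_config_entry(sample_config, "db_url", db_url)
edited = set_config_entry(edited, "plugins", plugins)
print(edited)
```

Unlike the sed version, the ampersands in the database URL need no backslash escaping here, because the replacement is returned from a function rather than parsed as a replacement pattern.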

import requests
import time
import json

# This script is used to setup BHL ArchivesSpace defaults for running test migrations


def test_connection():
    try:
        requests.get('http://localhost:8089')
        print 'Connected!'
        return True
    except requests.exceptions.ConnectionError:
        print 'Connection error. Trying again in 10 seconds.'
        return False


# Wait for the ArchivesSpace backend to finish starting
is_connected = test_connection()
while not is_connected:
    time.sleep(10)
    is_connected = test_connection()

# Log in as the default admin user and keep the session token for later requests
auth = requests.post('http://localhost:8089/users/admin/login?password=admin').json()
session = auth['session']
headers = {'X-ArchivesSpace-Session': session}

bhl_repo = {
    'name': 'Bentley Historical Library',
    'org_code': 'MiU-H',
    'repo_code': 'BHL',
    'parent_institution_name': 'University of Michigan'
}
post_repo = requests.post('http://localhost:8089/repositories', headers=headers, data=json.dumps(bhl_repo)).json()
print post_repo

base_profile = {
    'name': '',
    'extent_dimension': 'height',
    'dimension_units': 'inches',
    'height': '0',
    'width': '0',
    'depth': '0'
}
profile_names = ['box', 'folder', 'volume', 'reel', 'map-case', 'panel', 'sound-disc', 'tube', 'item', 'object', 'bundle']
for profile_name in profile_names:
    # Copy the base profile so each post gets its own dictionary
    container_profile = dict(base_profile)
    container_profile['name'] = profile_name
    profile_post = requests.post('http://localhost:8089/container_profiles', headers=headers, data=json.dumps(container_profile)).json()
    print profile_post

# The URLs below assume that the repository created above is /repositories/2
mhc_classification = {'title': 'Michigan Historical Collections', 'identifier': 'MHC'}
uarp_classification = {'title': 'University Archives and Records Program', 'identifier': 'UARP'}
for classification in [mhc_classification, uarp_classification]:
    classification_post = requests.post('http://localhost:8089/repositories/2/classifications', headers=headers, data=json.dumps(classification)).json()
    print classification_post

# Add values to the subject sources enumeration
subject_sources = requests.get('http://localhost:8089/config/enumerations/23', headers=headers).json()
subject_sources['values'].extend(['lcnaf', 'lctgm', 'aacr2'])
update_subject_sources = requests.post('http://localhost:8089/config/enumerations/23', headers=headers, data=json.dumps(subject_sources)).json()
print update_subject_sources

# Add a value to the name sources enumeration
name_sources = requests.get('http://localhost:8089/config/enumerations/4', headers=headers).json()
name_sources['values'].append('lcnaf')
update_name_sources = requests.post('http://localhost:8089/config/enumerations/4', headers=headers, data=json.dumps(name_sources)).json()
print update_name_sources

repo_preferences = {
    'repository': {'ref': '/repositories/2'},
    'defaults': {'publish': True}
}
repo_preferences_post = requests.post('http://localhost:8089/repositories/2/preferences', headers=headers, data=json.dumps(repo_preferences)).json()
print repo_preferences_post


This script uses the ArchivesSpace API to set up some of the default values that we've been using for testing, including setting up a repository, container profiles, classifications, and repository preferences and editing the subject sources and name sources enumerations. While these are all configurations that can easily be set up using the ArchivesSpace staff interface, setting up some of these basic configurations in a provisioning script makes the process of starting and using an ArchivesSpace Vagrant instance that much faster.
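
The defaults script waits for the backend by retrying every 10 seconds for as long as it takes. If ArchivesSpace fails to start, that loop never ends, so one possible variation is to cap the number of attempts. The sketch below assumes Python 3 and uses only the standard library; the names wait_until_ready and backend_is_up, and the limit of 30 attempts, are illustrative choices rather than anything from the Archivagrant scripts.

```python
import time
import urllib.error
import urllib.request


def backend_is_up(url="http://localhost:8089"):
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        urllib.request.urlopen(url, timeout=5)
        return True
    except urllib.error.HTTPError:
        # The server responded, so it is up even if the status is an error
        return True
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, attempts=30, delay=10, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)
    return False


# Example use (commented out so that loading this file does not block):
# if not wait_until_ready(backend_is_up):
#     raise SystemExit("ArchivesSpace backend did not come up in time")
```

Passing the probe and the sleep function in as arguments keeps the retry logic separate from the network call, which also makes it easy to try out without a running ArchivesSpace instance.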

Now that we've written our Vagrantfile and associated provisioning scripts, the process of setting up a new ArchivesSpace instance for testing is as simple as doing the following:

  1. Clone the archivagrant GitHub repository (if we haven't already)
  2. Open a terminal application and change directories to the archivagrant directory
  3. Run vagrant up

The first time that we issue the vagrant up command, it provisions the virtual machine using the scripts detailed above. Once the provisioning process is complete, we can point our host machine's browser to http://localhost:8080 (to access the ArchivesSpace staff interface) and any scripts we have to http://localhost:8089 (to access the ArchivesSpace backend). If we need to gain command line access to the running virtual machine (to stop or restart ArchivesSpace, install any additional packages, mess around in an Ubuntu server without worrying that we're going to break everything, etc.), we can vagrant ssh into it. The virtual machine can be suspended using vagrant suspend, shut down using vagrant halt, and destroyed with vagrant destroy. If suspended or shut down, the virtual machine can be started back up again to its previous state with another vagrant up. If destroyed, a vagrant up will recreate the virtual machine from scratch, going through the entire provisioning process. For the pros and cons of each approach, check out the Vagrant teardown documentation. I use vagrant halt most of the time, but a vagrant destroy is easiest when a new version of ArchivesSpace is released or when I have messed everything up beyond salvation.


Finally, there may be times when we want to start over with a fresh ArchivesSpace database in an existing Vagrant virtual machine without going through the process of recreating the entire machine through a vagrant destroy. The script reset_archivesspace.sh can be run by doing a vagrant ssh into the guest machine and changing directories to the /vagrant directory (a shared folder set up by Vagrant that syncs the contents of the Vagrant project's directory on the host machine to the guest machine).

#!/usr/bin/env bash
echo "Stopping ArchivesSpace"
service archivesspace stop
echo "Dropping and recreating database"
mysql -uroot -prootpwd -e "drop database archivesspace"
mysql -uroot -prootpwd -e "create database archivesspace"
mysql -uroot -prootpwd -e "grant all on archivesspace.* to 'as'@'localhost' identified by 'as123'"
echo "Deleting indexer state"
cd /home/vagrant/archivesspace/data
rm -rf indexer_state
rm -rf solr_backups
rm -rf solr_index
echo "Setting up database"
cd /home/vagrant/archivesspace
scripts/setup-database.sh
echo "Starting ArchivesSpace"
service archivesspace start
echo "Applying ArchivesSpace defaults"
cd /vagrant
python archivesspace_defaults.py


The script sets up a clean MySQL database and our ArchivesSpace defaults without redownloading ArchivesSpace or reprovisioning the entire machine.

It looks like there are several other ArchivesSpace users who use a Vagrant ArchivesSpace configuration that might be worth checking out if the setup described above doesn't quite work for you or if you want to see how others are doing it. If you're using some other way to ease the pain of frequently installing ArchivesSpace test instances, let us know!




