|Git is not the same as GitHub. Also, I just learned that "git" is English slang for "unpleasant person."|
|GitHub is not the same as Git.  It turns out that GitHub is not a center for unpleasant people.|
The Problem: Version Control
so, it sounds like anonymous red lion fish's thing got added to real_masters_all.
that's probably my fault. if there are any big mistakes anonymous red lionfish can just fix those, maybe using a backup
has anonymous great white shark replaced the ead masters yet?
umm, yeah i dunno
because anonymous red lionfish was working form a copy anonymous red lionfish had made
anonymous great white shark has not replaced ead masters yet but anonymous goldband fusilier and i have probably made our own changes already
but maybe anonymous red lionfish could take a copy of just the things in a csv
and we could fold those back into the real masters. hopefully there won't be too much that needs to be fixed.
The problem was that there were too many people trying to do too many things at once to the same version (or two, or three) of our EADs; the problem was version control!
Even though, as I mentioned, we had been using GitHub for quite some time to showcase and share our custom ArchivesSpace EAD Importer and the tools we've developed to clean or prep our legacy EAD and MARC XML for migration to ArchivesSpace, as well as to make changes to the Archivematica documentation (yes, I'm rather proud of this and this contribution--thanks again for showing us the ropes, Justin and Sarah!), we hadn't been using Git and GitHub the way they were intended to be used: to solve the problem of version control when working in teams whose members may or may not be working right next to each other every day (or in our case, even on the same computers every day).
After some discussion about the suitability of GitHub for this project (while we know a number of libraries and archives use GitHub for a variety of purposes, we're still not sure if there is any precedent for putting EADs on GitHub--maybe we're the first!), we decided to move forward with creating a "repo" for our working copy of the EADs. To fit in with the A-Team theme, we went with the name vandura, after the model of the GMC van used in the show.
We even figured out how to add a picture to our README file in Markdown:
We decided to retain the "Real_Masters_all" directory name (because that is so different from "Real_Masters" and "FindingAids/EAD/Master"--all actual directory names!) for our EADs to serve as a reminder of those dark times, in the not too distant past, when things seemed simple, and when we just made changes to our version of record as we pleased, without thought to the hard work of our colleagues that we may or may not have been overwriting (because hey, we'll never know, and there would be no way to prove it anyway!).
Wait, I've Heard of GitHub...What's Git?
Before we go on...
If you're like me (an archivist, not a programmer!) you may or may not have known that Git and GitHub are actually two different things. Git is a distributed version control system (that is, it does not work like a shared network drive does--no copy of a project directory is any better or more 'authoritative' than any other, and team members collaborate on identical copies). GitHub is a web-based Git repository hosting service (which is why it is so popular with open source software like Archivematica and ArchivesSpace), which also offers its own features (like forks and pull requests). Git is a tool that you mostly use in the terminal on your local computer, while GitHub is a service that you mostly use with a graphical user interface on the Internet.
Why Use Git and/or GitHub?
So Git is a version control system, and GitHub is used in conjunction with it for work in teams. Why use them?
- Git and GitHub are not just for software, or for people with l337 h4x0r s|<1llz. In fact, both of these work extremely well for anything that is primarily text, whether that is your EADs in XML, your catalog records in MARC, your website in HTML or even your blog written in Markdown.
- All the cool kids are doing it. Whether it's companies like Artefactual Systems, Inc. (Archivematica) or Lyrasis (ArchivesSpace), or any of the institutions on this list, GitHub has become the place that open source software is shared with others.
- It's better than regular old backups. With Git, you make what are called "commits" (more on that later) with meaningful messages (e.g., "correcting spelling mistakes" or "changing id attribute to authfilenumber"). You can then go back and look at all of your commits, remember why you made a particular change you made, and even revert back to a version of a project before a particular commit. All of that is much more useful when looking back on the work you've done than seeing a backup of your project made at an arbitrary time by a computer.
- It is distributed. Everything is local. See comment above about difference between this process and using a shared network drive.
- Interns have a place where they can point to the work they've done. With GitHub, since interns have their own accounts and since there is an online, public record of every change they have ever made, interns can point to a place online where they can showcase their work for potential employers.
- You don't have to be at the Bentley or using any particular computer to do some work. That's handy.
- Everything that happens gets recorded. Check this out. That's right, all 418 changes we've made in the 27 days we've used Git and GitHub for our EADs. It's like an audit trail. And you know we digital preservation types like our audit trails.
- Management of the whole process is much easier. While there are many hands working on the same set of files, only a few hands get to accept and merge what are called "pull requests" (again, more on that later) into the Bentley's repository.
- GitHub will tell you when you're going to overwrite someone else's work! That's probably my favorite benefit. While this doesn't make the process of figuring out what to do about conflicts any easier, at least we know about them!
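To make the "better than regular old backups" and "audit trail" points a bit more concrete, here is a minimal sketch of looking back through commits and undoing one. The scratch directory, the file name sample.xml and the demo identity are all made up for illustration; only the git commands themselves are the point.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch area, not a real project
cd "$demo"
git init -q eads
cd eads
git config user.name "Demo User"          # illustrative identity for the demo commits
git config user.email "demo@example.org"

# First version of an (invented) finding aid
echo "<ead>original</ead>" > sample.xml
git add sample.xml
git commit -q -m "initial EAD"

# A later change we come to regret
echo "<ead>mistaken edit</ead>" > sample.xml
git commit -q -a -m "changing id attribute to authfilenumber"

# The audit trail: every commit with its meaningful message
git log --oneline

# Undo the most recent commit by recording a new commit that reverses it
git revert --no-edit HEAD
```

Note that the revert does not erase anything: the history now holds three commits, so the record of the mistake and of its correction both survive.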
Convinced? I am.
And the How: How We're Using Git and GitHub for Curation Workflows
While we haven't even begun to scratch the surface of all the different operations you could do with Git and GitHub, here's the handful that we've found helpful so far, broken down into three stages: 1) the initial, project, and daily setup; 2) the process for making changes; and 3) the process for merging those changes with the Bentley's version.
|Say what you want about my handwriting, but I think that's a pretty good rendering of a laptop, if I do say so myself.|
The Setup (with Git and GitHub)
Once Per Lifetime
If you haven't already, join GitHub. The instructions are here. If you're using Windows like us you'll also need to download and install the latest version of GitHub for Windows.
Once Per Project
Fork the vandura (or any other) repository to your account online. This basically means make a copy of the repository on your account. Note that "repo," which you'll hear people say sometimes, is short for "repository" and is just a fancy word for folder with files or other folders in it, or a project directory. On GitHub, you can do this by navigating to the repository you want to fork and clicking Fork in the top-right corner of the page.
Create a local clone of your fork on your computer. In other words, make a copy of the repository on your local computer. You can do this by navigating to your fork of the repository on GitHub and copying the HTTPS clone URL in the right sidebar to your clipboard. Then, open the Git Shell application and type:
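As a rough sketch of this step: on GitHub the command would take the HTTPS clone URL you just copied, something of the form git clone https://github.com/YOUR-USERNAME/vandura.git (YOUR-USERNAME is a placeholder). So that the example runs without a network connection, the version below uses local folders as stand-ins: "bentley" plays the Bentley's repository and "fork.git" plays the fork on your account. Those names and the sample file are assumptions for illustration only.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch area standing in for GitHub
cd "$demo"

# Stand-in for the Bentley's online repository, with one invented EAD in it
git init -q bentley
git -C bentley symbolic-ref HEAD refs/heads/master   # name the branch master, as in the post
git -C bentley config user.name "Demo User"
git -C bentley config user.email "demo@example.org"
mkdir bentley/Real_Masters_all
echo "<ead/>" > bentley/Real_Masters_all/sample.xml
git -C bentley add .
git -C bentley commit -q -m "initial EADs"

# Stand-in for clicking Fork: a copy of the repository on "your account"
git clone -q --bare bentley fork.git

# The step described above: clone your fork to your local computer
# (with GitHub, replace fork.git with the HTTPS clone URL of your fork)
git clone -q fork.git vandura
```

Afterwards there is a vandura folder on your computer containing the files plus the full history.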
Next you'll need to configure a remote for your fork (so it knows where it came from). Move into the project directory by typing:
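A sketch of what this typically looks like: cd into the project directory, then add a remote (conventionally named "upstream") that points at the repository you forked from. With GitHub the last argument would be the HTTPS URL of the original repository; here local stand-in folders ("bentley" and "fork.git", both invented for the example) are used so the commands can be tried offline.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch area standing in for GitHub
cd "$demo"

# Stand-ins: the Bentley's repository, your fork, and your local clone
git init -q bentley
git -C bentley symbolic-ref HEAD refs/heads/master
git -C bentley config user.name "Demo User"
git -C bentley config user.email "demo@example.org"
echo "<ead/>" > bentley/sample.xml
git -C bentley add sample.xml
git -C bentley commit -q -m "initial EADs"
git clone -q --bare bentley fork.git
git clone -q fork.git vandura

# Move into the project directory
cd vandura

# Tell your clone where the original repository lives
# (with GitHub, use the original repository's HTTPS URL instead of this path)
git remote add upstream "$demo/bentley"

# List the configured remotes: origin is your fork, upstream is the original
git remote -v
```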
Once (or Twice...) Per Shift (with Pictures!)
It starts with syncing your fork, ensuring that what you have on your local computer matches what the Bentley has online (which may have been updated since you last sat down to do some work). After ensuring that you're in the appropriate directory, you do this by...
|Using git merge upstream/master to merge the changes from upstream/master into your local master branch.|
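Put together, the sync usually comes down to three commands: fetch from upstream, make sure you are on your local master branch, and merge. The sketch below assumes a remote named "upstream" has been configured as described earlier, and again uses invented local stand-ins ("bentley", "fork.git", sample.xml) so it can run without GitHub.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch area standing in for GitHub
cd "$demo"

# Stand-ins: the Bentley's repository, your fork, and your local clone
git init -q bentley
git -C bentley symbolic-ref HEAD refs/heads/master
git -C bentley config user.name "Demo User"
git -C bentley config user.email "demo@example.org"
echo "<ead/>" > bentley/sample.xml
git -C bentley add sample.xml
git -C bentley commit -q -m "initial EADs"
git clone -q --bare bentley fork.git
git clone -q fork.git vandura
git -C vandura remote add upstream "$demo/bentley"

# Since you last sat down, a colleague's change has landed upstream
echo "<ead><!-- updated --></ead>" > bentley/sample.xml
git -C bentley commit -q -a -m "correcting spelling mistakes"

# The once-per-shift sync
cd vandura
git fetch -q upstream             # download what changed upstream
git checkout -q master            # make sure you are on your local master branch
git merge -q upstream/master      # bring those changes into local master
```

Because nothing was changed locally in the meantime, the merge simply fast-forwards your master to match upstream.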
Or, if changes were made to the upstream repository while you were making changes to your fork, you can apply those changes to your local version before applying your changes by...
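One common way to do this is a rebase, which the review step later in this post also mentions; treat the exact commands here as an assumption rather than a record of what we typed. The sketch replays a local commit on top of the newest upstream commit, using invented local stand-ins ("bentley", "fork.git") and made-up file names.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch area standing in for GitHub
cd "$demo"

# Stand-ins: the Bentley's repository, your fork, and your local clone
git init -q bentley
git -C bentley symbolic-ref HEAD refs/heads/master
git -C bentley config user.name "Demo User"
git -C bentley config user.email "demo@example.org"
echo "<ead/>" > bentley/sample.xml
git -C bentley add sample.xml
git -C bentley commit -q -m "initial EADs"
git clone -q --bare bentley fork.git
git clone -q fork.git vandura
git -C vandura remote add upstream "$demo/bentley"
git -C vandura config user.name "Demo User"
git -C vandura config user.email "demo@example.org"

# Your own local commit...
echo "<ead id='other'/>" > vandura/other.xml
git -C vandura add other.xml
git -C vandura commit -q -m "adding boxes to boxlist"

# ...while someone else's change is merged upstream
echo "<ead><!-- updated --></ead>" > bentley/sample.xml
git -C bentley commit -q -a -m "correcting spelling mistakes"

# Apply the upstream changes first, then replay your commit on top of them
cd vandura
git fetch -q upstream
git rebase -q upstream/master
```

After the rebase, your commit sits on top of the latest upstream history, which tends to make the eventual pull request easier to merge. If both sides had edited the same lines, Git would stop and ask you to resolve the conflict first.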
Making Changes (with Git)
Sometimes we make small changes (such as correcting spelling mistakes, or adding or deleting boxes from a boxlist, &c., all of which happen to a single XML file). After making changes to a single file, we snapshot that file by...
|Using git add [filename] to snapshot a single file in preparation for versioning.|
|Using git commit -m "[meaningful message]" to record file snapshots permanently in your version history.|
Note: These steps for making changes can be repeated ad nauseam. You make commits as often as you think you make a meaningful change (that you may want to go back to later). Also, those messages are important! "updates" is not nearly as helpful as "separated boxes for use with aeon".
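As a small sketch of that add-then-commit cycle, assuming a throwaway repository and an invented file name (sample.xml), with the commit message borrowed from the example above:

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch area, not a real project
cd "$demo"
git init -q eads
cd eads
git config user.name "Demo User"          # illustrative identity for the demo commits
git config user.email "demo@example.org"

# Starting point: one (invented) finding aid already under version control
echo "<ead><c01>Boxes 1-2</c01></ead>" > sample.xml
git add sample.xml
git commit -q -m "initial EAD"

# Make a small change to that single file
echo "<ead><c01>Box 1</c01><c01>Box 2</c01></ead>" > sample.xml

# See what changed before recording it
git status --short

# Snapshot the file, then record the snapshot with a meaningful message
git add sample.xml
git commit -q -m "separated boxes for use with aeon"
```

Running git log afterwards would show both commits, newest first, each with its message.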
The Finish (with GitHub)
For a Team Member
Upload all local commits to your account on GitHub in order to be able to merge them with the Bentley's account by...
|Using git push to "push" those commits to your online account.|
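A sketch of the push, once more with invented local stand-ins ("bentley" for the Bentley's repository, "fork.git" for the fork on your account) so that it runs offline. The example names the remote and branch explicitly; in a clone that already tracks your fork, a plain git push normally does the same thing.

```shell
set -e
demo=$(mktemp -d)                 # hypothetical scratch area standing in for GitHub
cd "$demo"

# Stand-ins: the Bentley's repository, your fork, and your local clone
git init -q bentley
git -C bentley symbolic-ref HEAD refs/heads/master
git -C bentley config user.name "Demo User"
git -C bentley config user.email "demo@example.org"
echo "<ead/>" > bentley/sample.xml
git -C bentley add sample.xml
git -C bentley commit -q -m "initial EADs"
git clone -q --bare bentley fork.git
git clone -q fork.git vandura

# A local commit that so far exists only on your computer
cd vandura
git config user.name "Demo User"
git config user.email "demo@example.org"
echo "<ead><!-- fixed --></ead>" > sample.xml
git add sample.xml
git commit -q -m "correcting spelling mistakes"

# Upload local commits to your fork (origin)
git push -q origin master
```

The push only updates your fork. The Bentley's repository stays untouched until a pull request is accepted.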
Finally, merge your account's version with the Bentley's version online by...
|Making a pull request using GitHub.|
For the Team
|Comparing the changes that need to be made. This is incredibly helpful.|
Based on that comparison, the few people who can merge pull requests either accept the changes or, if there is some sort of conflict, give Devon instructions (again, all online out in the open) to, for example, rebase to get the latest version of the EADs before making his pull request, and then accept...
|Devon's changes have been merged with the Bentley's account. Notice that we're told that the latest change was Dallas merging Devon's pull request, and his meaningful commit message is shown next to the Real_Masters_all folder.|
Kapow! Version controlled.
So Far, So Good
All that being said, we've experienced a few hiccups along the way and we're still working out our Git-flow. We'd love to hear what you're doing for version control or your experience with Git and/or GitHub. Let us know by leaving a comment or getting in touch via email or Twitter!
 "Git-logo" by Jason Long - http://git-scm.com/downloads/logos. Licensed under CC BY 3.0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:Git-logo.svg#/media/File:Git-logo.svg
 "GitHub logo 2013" by GitHub - https://github.com/logos. Licensed under Public Domain via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:GitHub_logo_2013.svg#/media/File:GitHub_logo_2013.svg
 "Puffer Fish DSC01257" by Brocken Inaglory - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:Puffer_Fish_DSC01257.JPG#/media/File:Puffer_Fish_DSC01257.JPG
 "Ocypode quadrata (Martinique)" by Free On Line Photos. Licensed under No restrictions via Wikimedia Commons - https://commons.wikimedia.org/wiki/File:Ocypode_quadrata_(Martinique).jpg#/media/File:Ocypode_quadrata_(Martinique).jpg
 Since these screenshots were done as Devon worked, they sometimes get a bit out of order...