Error: #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"]}}>
This is the error we got when we tried to import EADs into ArchivesSpace with extent statements that began with text, such as "ca." or "approx." So ArchivesSpace likes extent statements that begin with numbers. Fine. Easy fix. Problem solved.
And it was an easy fix... until we started getting curious.
The Extent (Get It!) of the Problem
As we did our original tests importing legacy EADs into ArchivesSpace (thanks, Dallas!), we started noticing that extents weren't importing quite the way we expected. As it turns out, ArchivesSpace imports the entire statement from EAD's <physdesc><extent> element as the "Whole" extent, with the first number in the statement imported as the "Number" of the extent and the remainder of the statement imported as the "Type":
An Extent in ArchivesSpace
This results in issues such as the one above, where the number imports fine but the type imports incorrectly. "linear feet and 7.62 MB (online)" is actually a Type plus a second extent statement with its own Number, Type and Container Summary. This would be more accurately represented by breaking the extent into two "Part" portions.
This also makes for a very dirty "Type" dropdown list:
I've highlighted the only type that should really be there.
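To make that behavior concrete, here is a rough Python sketch of what the importer appeared to do with our extents, plus a naive way to break a compound statement into parts. It is only an approximation based on what we observed, not ArchivesSpace's actual import code. The function names are hypothetical, and the leading "3" in the example statement is made up for illustration.

```python
import re

# Assumption: a leading integer or decimal becomes the "Number" and
# everything after it becomes the "Type". This mimics what we saw on
# import; it is not taken from the ArchivesSpace source.
LEADING_NUMBER = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(.*)$")


def split_like_importer(statement):
    """Return (number, type), or None if the statement does not start with a number."""
    match = LEADING_NUMBER.match(statement)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def split_into_parts(statement):
    """Naively split on ' and ' wherever the right-hand side starts with a digit."""
    pieces = re.split(r"\s+and\s+(?=\d)", statement)
    return [split_like_importer(piece) for piece in pieces]


statement = "3 linear feet and 7.62 MB (online)"
print(split_like_importer(statement))
print(split_into_parts(statement))
print(split_like_importer("ca. 5 linear feet"))
```

The first call leaves "linear feet and 7.62 MB (online)" as a single type, which is the dirty value that ends up in the dropdown. The second yields two number and type pairs, although a parenthetical like "(online)" would still need to be moved into a container summary. The third returns `None`, which corresponds to the kind of statement that triggered the validation error at the top of this post.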
Now, this isn't actually a problem for import to ArchivesSpace. But it is a problem. In the end, we decided to take a closer look at extents to clean them up. That's fun, right? In hindsight, our initial excitement about this was probably a little naive. We were dealing with 80 years of highly varied descriptive practices, after all.
Getting Extents
In his last post, Dallas started to detail how we "get" elements from EADs ("get" here means go through our EADs, grab the extent(s), and print them with their filename and location to a CSV for closer inspection and cleaning). In case you're wondering exactly how we got extents, here is our code (and feel free to improve it!):
bentley-historical-library/migration-tools
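For readers who just want the general shape of the approach, the following is a minimal sketch of the "get" step using only the Python standard library. It is not the script from the repository above. It assumes EAD files without an XML namespace, and the names `collect_extents` and `rows_to_csv` are hypothetical. The "location" here is a simple slash-separated tag path without positional indexes, so it is less precise than a full XPath.

```python
import csv
import io
import xml.etree.ElementTree as ET


def collect_extents(filename, ead_xml):
    """Return (filename, path, text) rows for every <extent> directly inside a <physdesc>."""
    root = ET.fromstring(ead_xml)
    rows = []

    def walk(element, path):
        for child in element:
            child_path = f"{path}/{child.tag}"
            if element.tag == "physdesc" and child.tag == "extent":
                # Collapse internal whitespace so each extent fits on one CSV line.
                text = " ".join("".join(child.itertext()).split())
                rows.append((filename, child_path, text))
            walk(child, child_path)

    walk(root, f"/{root.tag}")
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filename", "path", "extent"])
    writer.writerows(rows)
    return buffer.getvalue()


sample = (
    "<ead><archdesc><did><physdesc>"
    "<extent>ca. 5 linear feet</extent>"
    "</physdesc></did></archdesc></ead>"
)
print(rows_to_csv(collect_extents("example.xml", sample)))
```

In practice you would loop over a directory of EAD files, read each one, and write all of the rows to a single CSV file for review in a spreadsheet or OpenRefine.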
The Long Tail(s) of Extents
Whoa.
Whoa-ho-hoa.
How We're Thinking About Fixing Extents (How Comes Later)
Human Readable vs. Machine-Actionable Extents
Why Are We Recording This Information Again?
The Solution
I know you'd really like to know our solution. Well, we've taken care of the easy ones:
Just cleaned the easy extents for import into @ArchivesSpace. 1503 of them in one fell swoop. Now for those errors... pic.twitter.com/aGqArWfclx
— UM BHL Curation (@UMBHLCuration) May 6, 2015
46,085 component-level extents down. 12,584 to go. Thanks to @ThePSF and @OpenRefine--@ArchivesSpace, here we come! pic.twitter.com/2ym5pzdIRa
— UM BHL Curation (@UMBHLCuration) May 8, 2015
Other than the easy ones, however, progress is slow. We're continuing to try to create user stories to inform our thinking, to create a short list of extent types, and to make plans for addressing common extent type issues.
A future post will detail some of the OpenRefine magic we're doing to clean up extents, and another will explain exactly how we're handling these issues and reintegrating them back into the original EADs, code snippets and all. Stay tuned!
In the meantime, why not leave a comment and let us know how and why you use extents!
Have you considered doing this kind of cleanup using XSLT? XSLT (especially v2) is well suited to the job.
That's a great suggestion! And timely! Mike actually just finished up an ARL and DLF workshop entitled "Transforming Library Metadata with XSLT": http://www.arl.org/events/upcoming-events/event/133#.VV_GfPlViko
We're eager to put what he's learned to work transforming our XML. Do you have any resources you'd recommend for those of us just getting started with XSLT?