In my last blog post, I talked a bit about why ArchivesSpace is so central and essential to all of the work that we do. And part of the reason why we’re being so careful with our migration work is not just because our data is important, but also because there’s a LOT of it. Just at Manuscripts and Archives (the department in which I work at Yale, which is one of many archival repositories on campus), we have more than 122,000 distinct containers in Archivists’ Toolkit.
With this scale of materials, we need efficient, reliable ways of keeping physical control of the materials in our care. After all, if a patron decides that she wants to see material in one specific container among the more than 122 thousand, we have to know where it is, what it is, and what its larger context is in our collections.
Many years ago, when Manuscripts and Archives adopted Archivists’ Toolkit (AT) as our archival management system, we developed a set of ancillary plug-ins to help with container management. Many of these plug-ins became widely adopted in the greater archival community. I’d encourage anyone interested in this functionality to read this blog post, written at the time of development, as well as other posts on the AT@Yale blog (some things about the plug-in look marginally different today, but the functions are more or less the same).
In short, our AT plug-in did two major things.
- It lets us manage the duplication of information between AT and our ILS
At Yale, we create a finding aid and a MARC-encoded record for each collection*. In the ILS, we also create “item records” for each container in our collection. That container has an associated barcode, information about container type, and information about restrictions associated with that container.
All of this information needs to be exactly the same across both systems, and should be created in Archivists’ Toolkit and serialized elsewhere. Part of our development work was simply to add fields so that we could keep track of the record identifier in our ILS that corresponds to the information in AT. - It let us assign information that was pertinent to a single container all at once (and just once).
In Archivists’ Toolkit (and in ArchivesSpace too, currently), the container is not modeled. By this I mean that if, within a collection, you assign box 8 to one component and also box 8 to another component, the database has not declared in a rigorous way that the value of “8” refers to the same thing. Adding the same barcode (or any other information about a container) to every component in box 8 introduces huge opportunities for user error. Our plug-in for Archivists’ Toolkit did some smart filtering to create a group of components that have been assigned box 8 (they’re all in the same collection, and in the same series too, since some repositories re-number boxes starting with 1 at each series), and then created an interface to assign information about that container just once. Then, in the background, the plug-in duplicated that information for each component that was called box 6.
This wasn’t just about assigning barcodes and Voyager holdings IDs and BIBIDs — it also let us assign a container to a location in an easy, efficient way. But you’ll notice in my description that we haven’t really solved the problem of the database not knowing that those box 8’s are all the same thing. Instead, our program just went with the same model and did a LOT of data duplication (which you database nerds out there know is a no-no).
Unfortunately, ArchivesSpace doesn’t yet model containers, and as it is now, it’s not easy to declare facts about a container (like its barcode or its location) just once. Yale has contracted with Hudson Molonglo to take on this work. Anyone interested in learning more is welcome to view our scope of work with them, available here — the work I’m describing is task 6 in this document, and I look forward to describing the other work they will be doing in subsequent blog posts. We’ve also parsed out each of the minute actions that this function should be able to do as a set of user stories, available here. Please keep in mind that we are currently in development and some of these functions may change.
Once this work is completed, we plan to make the code available freely and with an open source license, and we also plan to make the functions available for any repository that would like to use them. Please don’t hesitate to contact our committee of you have questions about our work.
_________________________________________
* We (usually/often/sometimes depending on the repository) create an EAD-encoded finding aid for a collection at many levels of description, and also create a collection-level MARC-encoded record in a Voyager environment. This process currently involves a lot of copying and pasting, and records can sometimes get out of sync — we know that this is an issue that is pretty common in libraries, and we’re currently thinking of ways to synchronize that data.