Managing Content, Managing Containers, Managing Access

In my last blog post, I talked a bit about why ArchivesSpace is so central and essential to all of the work that we do. And part of the reason why we’re being so careful with our migration work is not just because our data is important, but also because there’s a LOT of it. Just at Manuscripts and Archives (the department in which I work at Yale, which is one of many archival repositories on campus), we have more than 122,000 distinct containers in Archivists’ Toolkit.

With this scale of materials, we need efficient, reliable ways of keeping physical control of the materials in our care. After all, if a patron decides that she wants to see material in one specific container among those 122,000, we have to know where it is, what it is, and what its larger context is in our collections.

Many years ago, when Manuscripts and Archives adopted Archivists’ Toolkit (AT) as our archival management system, we developed a set of ancillary plug-ins to help with container management. Many of these plug-ins became widely adopted in the greater archival community. I’d encourage anyone interested in this functionality to read this blog post, written at the time of development, as well as other posts on the AT@Yale blog (some things about the plug-in look marginally different today, but the functions are more or less the same).

In short, our AT plug-in did two major things.

  1. It let us manage the duplication of information between AT and our ILS.
    At Yale, we create a finding aid and a MARC-encoded record for each collection*. In the ILS, we also create “item records” for each container in our collections. Each container has an associated barcode, information about its container type, and information about any restrictions associated with it.
    All of this information needs to be identical across both systems, and should be created in Archivists’ Toolkit and serialized elsewhere. Part of our development work was simply to add fields so that we could keep track of the record identifier in our ILS that corresponds to the information in AT.
  2. It let us assign information pertinent to a single container all at once (and just once).
    In Archivists’ Toolkit (and in ArchivesSpace too, currently), the container is not modeled. By this I mean that if, within a collection, you assign box 8 to one component and also box 8 to another component, the database has not declared in any rigorous way that the value “8” refers to the same thing. Adding the same barcode (or any other information about a container) to every component in box 8 by hand introduces huge opportunities for user error. Our plug-in for Archivists’ Toolkit did some smart filtering to create a group of components that had been assigned box 8 (all in the same collection, and in the same series too, since some repositories re-number boxes starting with 1 at each series), and then presented an interface for assigning information about that container just once. Then, in the background, the plug-in duplicated that information for each component assigned to box 8.
    This wasn’t just about assigning barcodes and Voyager holdings IDs and BIBIDs; it also let us assign a container to a location in an easy, efficient way. But you’ll notice in my description that we haven’t really solved the problem of the database not knowing that those box 8s are all the same thing. Instead, our program just went with the same model and did a LOT of data duplication (which you database nerds out there know is a no-no).
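The group-then-duplicate approach described in step 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not the plug-in’s actual code: the record structure and field names here are hypothetical, and the real plug-in worked against the AT database rather than against in-memory dictionaries.

```python
from collections import defaultdict

def group_by_container(components):
    """Group component records by (collection, series, box indicator).

    Box numbers are only meaningful within a collection -- and within a
    series, since some repositories restart box numbering at 1 for each
    series -- so all three values form the grouping key.
    """
    groups = defaultdict(list)
    for c in components:
        key = (c["collection_id"], c.get("series_id"), c["box"])
        groups[key].append(c)
    return groups

def assign_container_info(components, collection_id, series_id, box,
                          **container_fields):
    """Enter barcode/location/etc. once, then copy those values onto
    every component sharing the same box -- the duplication the plug-in
    performed behind the scenes."""
    groups = group_by_container(components)
    for c in groups[(collection_id, series_id, box)]:
        c.update(container_fields)

# Example: two components in box 8 of series I get one barcode entry;
# box 8 of series II is a different container and is left untouched.
components = [
    {"collection_id": "MS 100", "series_id": "I",  "box": "8", "title": "Letters, 1901"},
    {"collection_id": "MS 100", "series_id": "I",  "box": "8", "title": "Letters, 1902"},
    {"collection_id": "MS 100", "series_id": "II", "box": "8", "title": "Diaries"},
]
assign_container_info(components, "MS 100", "I", "8", barcode="39002012345678")
```

Note that the barcode is entered once by the user but physically stored on every matching component record, which is exactly the redundancy that a container model would eliminate.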

Unfortunately, ArchivesSpace doesn’t yet model containers, and as it is now, it’s not easy to declare facts about a container (like its barcode or its location) just once. Yale has contracted with Hudson Molonglo to take on this work. Anyone interested in learning more is welcome to view our scope of work with them, available here — the work I’m describing is task 6 in this document, and I look forward to describing the other work they will be doing in subsequent blog posts. We’ve also parsed out each of the minute actions that this function should be able to do as a set of user stories, available here. Please keep in mind that we are currently in development and some of these functions may change.

Once this work is completed, we plan to make the code freely available under an open source license, and we also plan to make the functions available for any repository that would like to use them. Please don’t hesitate to contact our committee if you have questions about our work.


* We (usually/often/sometimes depending on the repository) create an EAD-encoded finding aid for a collection at many levels of description, and also create a collection-level MARC-encoded record in a Voyager environment. This process currently involves a lot of copying and pasting, and records can sometimes get out of sync — we know that this is an issue that is pretty common in libraries, and we’re currently thinking of ways to synchronize that data.

Making Our Tools Work for Us

Metadata creation is the most expensive thing we do.

I hear myself saying this a lot lately, mostly because it’s true. In the special collections world, everything we have is unique or very rare. And since we’re in an environment where patrons who want to use our materials can’t just browse our shelves (and since the idea of making meaning out of stuff on shelves is ludicrous!), we have to tell them what we have by creating metadata.

Creating metadata for archival objects is different from creating it for a book — a book tells you more about itself. From a book’s title page, one can discern its title, its author, and its publisher. Often, an author will even write an abstract of what happens in the book, and someone at the Library of Congress will have done work (what we call subject analysis) to determine its aboutness.

In archives, none of that intellectual pre-processing has been done for us. Someone doing archival description has to answer a set of difficult questions in order to create high-quality description — who made this? Why was it created? What purpose did it serve for its creator? What evidence does it provide about what happened in the past? And the same questions have to be addressed at multiple levels — what is the meaning behind this entire collection? What does it tell us about the creator’s impact on the world? What is the meaning behind a single file collected by the creator? What purpose did it serve in her life?

Thus, the metadata we create for our materials is also unique, rare, intellectually-intensive, and essential to maintain.

Here, as part of a planning session, Mary and Melissa are talking through which tasks need to be performed in sequence.

Currently, we use a tool called Archivists’ Toolkit to maintain information about our holdings, and this blog is about our process of migrating to a different tool, called ArchivesSpace. Because, as I said above, this data is expensive and unique, we’ve taken a very deliberative and careful approach to planning for migration.

We’re lucky to have a strong group with diverse backgrounds. Mary Caldera and Mark Custer are our co-chairs, and have strong management and metadata expertise between them. Melissa Wisner, our representative from Library IT, has a background in project management and business analysis. She walked us through our massive project plan, and helped us make sense of the many layers of dependencies and collaborations that must all be executed properly for this project to succeed. Others on the group include experts in various archival functions and standards. And beyond this, we have established a liaison system between ourselves on the committee and archivists from other repositories at Yale, so we can make sure that the wisdom of our wider community is being harnessed and the transition to this new system is successful for all of us.

Anyone interested in viewing our project timeline is welcome to see it here. We know that other repositories are also involved in transition to ArchivesSpace, and we would be happy to answer questions you may have about our particular implementation plan.

Yale Library IT Supports ArchivesSpace

The implementation of ArchivesSpace is a collaborative effort among archives and special collections units and Library IT. This project is exciting because ArchivesSpace is to special collections what Voyager is to YUL’s general collections. Library IT has a long-standing relationship with Voyager as an enterprise-level application, providing server support, coordinating upgrades, managing custom development, and encouraging an ideology of systems and data integration.

The implementation work for ArchivesSpace (AS) will run a similar gamut of standard IT support: supporting three instances of AS (dev, test, and production), configuring an LDAP server and Active Directory group for authentication, assisting with data analysis and export, participating in a series of sprints with a third-party vendor to develop custom plug-ins for AS, and managing some in-house development to integrate AS with Voyager.

While these tasks are typical of an IT project, success depends on the collaborative relationship IT develops with archival and special collections staff. Library IT is investing time to learn the current and expected workflows AS will support, and why the tool is critical to the daily operations of archivists and special collections professionals. IT is also learning the lexicon employed by special collections (one day I hope to fully understand what a container is), and what archivists geek out on. The ultimate success of a new systems implementation depends on both its technical and its social elements.