Cleaning Data to Enhance Access and Standardize User Experience, Part I: Planning and Prioritization

Hi! This is Alicia Detelich, archivist at Manuscripts and Archives, and Christy Tomecek, project archivist at the Fortunoff Video Archive for Holocaust Testimonies. We are co-leaders of the ArchivesSpace Public User Interface (PUI) implementation project’s Data Cleanup and Enhancements Workgroup. Today we’re going to share a little bit about the initial planning efforts that our group has relied upon to guide us through this project.

Our five-member group has a variety of tasks to accomplish before the PUI “goes live” early next year, including:

  • Reviewing data across Yale’s 14 ArchivesSpace repositories
  • Determining the extent of cleanup and normalization required, and the approach(es) to making changes
  • Working with repository liaisons to ready data for ArchivesSpace publication
  • Creating, testing, and executing cleanup and normalization scripts
  • Performing quality control on updated data

Data cleanup and normalization is a full time job for some archivists, and we all have other responsibilities. Because of our time constraints and the enormous quantities of metadata produced by Yale’s special collections repositories, we had to start by doing some hard thinking about which data issues would have the greatest impact on the security of our records and on the experience of our users, and to limit our work to just those areas. Keeping our expectations realistic and avoiding overcommitting has been an important part of this process, and has almost certainly kept us from going insane (for the most part)

During an early brainstorming session in which we identified a laundry list of potential data issues to address, we debated which of these issues would be “show-stoppers,” which ones should be prioritized in order to enhance access, and which we could afford to deal with at a later date. By the end of the meeting, we had settled on the following areas of focus:

Publication status

In our current system, YFAD, finding aids are published once a week after the documents are proofread for content and for correct EAD encoding. This process also includes using an XSLT transformation which, among other things, has certain stopgaps in place for suppressing data that may have been accidentally published or is restricted from researchers while still necessary for staff. In the ArchivesSpace PUI, our finding aids will now be published instantly and will not have these XSLT suppressions. This will require us to review our current finding aid data to ensure that nothing that is confidential is accidentally made available, such as student or patient names. Repositories may also wish to unpublish records for collections that are still in process and that they cannot make available to researchers. To that end, we will work with representatives of each repository to identify any necessary changes to the current publication status of all resources, archival objects, accessions, and notes, and make these changes prior to the official launch date.

Date normalization

The ArchivesSpace PUI is more dynamic than our current system, and provides many more opportunities for filtering and faceting data. Because of this, it is much more important for us to have structured data that can be read and manipulated by machines. For instance, the PUI allows users to search for materials created during a given date range. This functionality requires that dates entered into ArchivesSpace be machine-readable. During previous migrations, date information was often added to the expression field, but not parsed into ArchivesSpace’s machine-readable beginning and end dates. Additionally, practices for formulating dates have varied widely among repositories – there are almost too many variations to count. While we may not be able to fix every single date issue, the more we can accomplish before launch, the more effective the PUI will be for our users.

How many ways can you say ‘undated’?

Machine-actionable restrictions

Our conditions governing use and conditions governing access data are also top candidates for normalization. Since 2015, it has been possible for repositories to add structured restriction information, such as end dates or condition notes, to our resource and archival object records. Aeon is able to act upon these restrictions, letting users and staff know if an item is restricted, and allowing for appeal by researchers and review by repositories.

A great deal of work has already been done to add machine-actionable dates and local access restriction types to the records of some repositories, but there are still a number of resources and archival objects which could benefit. The impending integration of Aeon with ArchivesSpace, and the potential impact of this functionality on staff and on users indicated to us that it should be one of our top priorities for clean-up and enhancement.

Note Labels

Yale’s special collections repositories have traditionally operated independently of one another, and so over time developed different policies and systems for doing descriptive work. This has resulted in a wide array of standards, jargon, and even grammar choices in our finding aids. One example is the all kinds of variability in the way that descriptive notes are labeled. While this may not seem all that consequential, labels can affect a user’s experience quite drastically. It might not be clear that a “Summary” or “Description of the Papers” is the same as a “Scope and Content” note. It is important that the same type of note be called the same thing, no matter which repository a user is searching.

In our current YFAD setup, the display of labels is suppressed by the above-mentioned XSLT, but this will no longer be the case once the PUI is implemented. This necessitates a thorough evaluation of our note label usage, and an eventual policy decision about how notes should be labeled – across all repositories – going forward.


Our repositories have long been adding URLs to note fields or digital object records. Unfortunately, we’ve been doing this so long that some of these links are likely broken. Directing users to a 404 page is never ideal, so we took this project as an opportunity to review our links to determine how many are broken. Though we aren’t necessarily testing the accuracy of the links – whether they direct the user to the intended web page – we just want to know (for now) if the links actually work.


With the exception of Manuscripts and Archives, most repositories at Yale are still using two systems to manage their descriptive and collection control metadata. YFAD and Aeon pull in descriptive information from ArchivesSpace, and container and location data comes from our ILS, Voyager.

Integration with Aeon is a major part of the PUI implementation project, and having complete and accurate container data in ArchivesSpace is a necessary part of the integration work. In order to ensure the accuracy of our top container data, we will need to compare what is in ArchivesSpace with what is in Voyager. Any discrepancies, particularly where there is data in Voyager but not in ArchivesSpace, will need to be resolved in conjunction with the repositories which created the data.

Shared records and controlled value lists

One interesting side effect of importing EAD into ArchivesSpace is that any controlled values that are present in the EAD file can be added to the enumeration values list in ArchivesSpace. This has left us with some very messy controlled value data. For instance:


Normalizing these shared lists – most importantly those related to container and extent types – will present users with a more unified experience, and facilitate robust searching and faceting in the PUI. Removing duplicative or erroneous values will also help prevent messy data issues from recurring in the future.

The (nearly completed!) project to clean up our name and subject records that Alison mentioned in her last post also dovetails nicely with this work. The addition of Library of Congress URIs, and the eventual removal of duplicate records will greatly enhance the functionality of the PUI, and will remove duplicative or incorrect values that may confuse users.

All this in six months? No problem!

Current mood.

Since deciding on the scope of the work, we’ve undertaken the fun and slightly intense task of auditing our data in each of these areas. We’ll be back in another post to talk a bit about our data auditing process and reveal some of our most interesting results. For now though, we’re excited to do our part to help make the ArchivesSpace PUI more useful for staff and researchers.

Movin’ Around with Reorder Mode in Archives Space 2.2.0

Yale will be soon upgrading PROD to version 2.2.0. This is quite exciting as this is the foundation for our migration from YFAD to the Public User Interface. It also has some updates on commonly used features. I wanted to talk about one in particular—Reorder Mode. If you want to get a head start on learning how this feature works in version 2.2.0, you can try it out in our TEST instance of ArchivesSpace, which has already been upgraded.

Currently in PROD and in older versions of ArchivesSpace, if one needs to reorder their resource record, it was done using drag-and-drop techniques in the tree. One clicked and held the record and dragged it to where they wanted it in the tree. If it’s at the same level as where it previously was, just in a different place, you would hold it between the ‘sibling’ components where you wanted it; if it’s supposed to be a child of a currently sibling component, you can drop it ‘on top’ of the sibling component. A small arrow would should where it was placed in the hierarchy, and a green check box would appear if ArchivesSpace recognized it as a legal move. Let’s look at some screenshots to make that word soup clearer.

Here is a snippet from HVT-0036’s tree, which is in its original order, from PROD.

Here’s me moving the Restoration submaster component to become the sibling between Duplicate and Restoration master. Note the small black arrow between the Duplicate and Restoration master components showing where I’m placing it and the green checkmark by the floating Restoration submaster component (Ghost component?)

And we have the sibling in place!

Now I’m moving the Restoration submaster component to become a child of the Restoration master. I have moved it so it’s on top on the Restoration master—you can see the black arrow resting on the same line of the Restoration master in the tree. Green checkmark of the ghost component shows we’re ready to go.


And we now have a child. Please note that if there were other children of the Restoration master component, it would have ended up on the top of the list if that branch of the tree was not open. If the tree was open so you could see the children, you would be able to drag between its soon-to-be siblings.

While this is not unintuitive, it also sets up users for errors. One can accidentally reorder things without trying to since the tree is always open and the ability to reorder is always ‘on.’ The precision of the dragging and dropping is also not always easy to handle. It’s not uncommon when moving a component elsewhere on the same hierarchical level to accidentally make it a child of a would-be sibling, and vice versa.

Therefore, ArchivesSpace has a new feature- Reorder Mode.

Let’s take a look at HVT-0036, this time in TEST.

The tree looks bigger and has larger markings to denote the ‘line’ of the component. Also, there is a new button, which I outlined in red, titled “Enable Reorder Mode.” If I try to move anything without that button being clicked, nothing’s going to happen with the tree, except maybe opening the component below the tree to edit. So, what do I do to move my Restoration submaster?

First, I’m clicking on the button. This turns the button green and changes the label to “Reorder Mode Active.” It also shows two new buttons: cut, which I outlined in green, and paste, which I outlined in purple. The record view below also disappears below the tree.

I can simply move the component by hovering the cursor (which has turned into a four-direction symbol) over the appropriate white box with gray dots to the far left of the component, clicking, and dragging and dropping it. After I move the component to where I want to, a new dropdown appears, asking me where exactly I want to put it. Since I want the Restoration submaster to come before the Restoration master, I’m clicking “Add Items Before.”

It’s now where I want it.


You can also move multiple components this way! If you click on the white boxes with the gray dots to the far left of the tree while holding the CTRL key on your keyboard, the boxes will turn blue, and numbers will appear to show the order it will be in when you start dragging the components. (i.e. 1 means the first component in the list, 2 is the second, etc.) This has the bonus possibility of changing the order of a level of an inventory on the fly by changing how you click the components.

So here I have selected these three components to move. I want the order to be Restoration master, Master, and Use copy.


At this point, I am dragging the components. ArchivesSpace is kind enough to list the order for me so I know how it’s being moved.

Like with a single component, it will ask me to select whether these components are to be added before a selected component (in this case, Duplicate), after the selected component, or as children of the component. I selected children and they are now children of Duplicate.

Another way to move a component, if you are concerned about not being able to drag it where you want it due to mouse issues or the like, is to click on the name of the component. ASpace will outline it in blue and a move button, which I outlined in orange, will appear.

Clicking on the Move drop down will show the choice to move the component up, down, or down into, meaning within a component as a child. Hovering the cursor over the down into option will open a new drop down to show you where you could possibly put the component.

Since I want this component to be a child of the Restoration master component, I will select that option. It then moves where I want it to be.

Please note that the move button does not work with moving multiple components. If you want to move several components at the same time, you’ll have to try another method.

A final way to move a component, which can be incredibly handy for extremely long inventories, is using cut and paste. This is best done when you want a component to become a child of another component rather than a sibling elsewhere on the list. There is a workaround to use cut and paste for creating siblings, although it’s not ideal. But I will cover that too.

Let’s say I want to make the Duplicate a child of the Master component. First, I select it so it’s outlined in blue. Then I will click Cut. The Cut button will turn gray and the component bar will turn a darker shade of gray.

Then I select the component that will become the new parent, Master. IMPORTANT: you need to click directly on the link, i.e. on “Master.” Simply clicking on the box will not work! The Master component will turn blue.

Click Paste. It will turn gray and the screen will display a loading wheel. Then the child will appear where you selected it, and the Paste button will gray out.

Cut and paste also works for multiple components—again, you select all the components you need to move while holding the CTRL key before clicking cut and going to the new parent. But what if you have an extremely long inventory and you want to make the component a sibling?

Follow the directions for cutting and pasting. When you paste, however, you want to make this component a child of the component directly above or below where you ultimately want to move the original component.

In this case, I wanted Digital copy to be a sibling between Licensing copy and Master. So now that I made Digital copy a child of Master, I can drag and drop it above Master.

Final note/warning: while it is tempting to just go, “hey, let me use the move function,” using it to tell it to go up a level will move the component to the top of the inventory. This is likely what you don’t want to do.

While this is a bit of a change, it does allow for more reliability of reordering and I hope this tutorial is helpful for you! For those who prefer watching this in action, I will be posting a screencast next week explaining all of these methods.