Cleaning Data to Enhance Access and Standardize User Experience, Part I: Planning and Prioritization

Hi! This is Alicia Detelich, archivist at Manuscripts and Archives, and Christy Tomecek, project archivist at the Fortunoff Video Archive for Holocaust Testimonies. We are co-leaders of the ArchivesSpace Public User Interface (PUI) implementation project’s Data Cleanup and Enhancements Workgroup. Today we’re going to share a little bit about the initial planning efforts that our group has relied upon to guide us through this project.

Our five-member group has a variety of tasks to accomplish before the PUI “goes live” early next year, including:

  • Reviewing data across Yale’s 14 ArchivesSpace repositories
  • Determining the extent of cleanup and normalization required, and the approach(es) to making changes
  • Working with repository liaisons to ready data for ArchivesSpace publication
  • Creating, testing, and executing cleanup and normalization scripts
  • Performing quality control on updated data

Data cleanup and normalization is a full-time job for some archivists, and we all have other responsibilities. Because of our time constraints and the enormous quantities of metadata produced by Yale’s special collections repositories, we had to start by doing some hard thinking about which data issues would have the greatest impact on the security of our records and on the experience of our users, and to limit our work to just those areas. Keeping our expectations realistic and avoiding overcommitting has been an important part of this process, and has almost certainly kept us from going insane (for the most part).

During an early brainstorming session in which we identified a laundry list of potential data issues to address, we debated which of these issues would be “show-stoppers,” which ones should be prioritized in order to enhance access, and which we could afford to deal with at a later date. By the end of the meeting, we had settled on the following areas of focus:

Publication status

In our current system, YFAD, finding aids are published once a week after the documents are proofread for content and for correct EAD encoding. This process also includes an XSLT transformation which, among other things, has certain stopgaps in place for suppressing data that may have been accidentally published, or that is restricted from researchers while still necessary for staff. In the ArchivesSpace PUI, our finding aids will be published instantly and will not have these XSLT suppressions. This will require us to review our current finding aid data to ensure that nothing confidential, such as student or patient names, is accidentally made available. Repositories may also wish to unpublish records for collections that are still in process and that they cannot make available to researchers. To that end, we will work with representatives of each repository to identify any necessary changes to the current publication status of all resources, archival objects, accessions, and notes, and make these changes prior to the official launch date.

Date normalization

The ArchivesSpace PUI is more dynamic than our current system, and provides many more opportunities for filtering and faceting data. Because of this, it is much more important for us to have structured data that can be read and manipulated by machines. For instance, the PUI allows users to search for materials created during a given date range. This functionality requires that dates entered into ArchivesSpace be machine-readable. During previous migrations, date information was often added to the expression field, but not parsed into ArchivesSpace’s machine-readable beginning and end dates. Additionally, practices for formulating dates have varied widely among repositories – there are almost too many variations to count. While we may not be able to fix every single date issue, the more we can accomplish before launch, the more effective the PUI will be for our users.
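To make the idea concrete, here is a rough Python sketch of how a free-text date expression might be parsed into machine-readable begin and end values. It is an illustration only, not the script our group uses: the function name `parse_expression`, the list of "undated" spellings, and the two patterns it recognizes are all assumptions, and real data would need many more cases.

```python
import re
from typing import Optional, Tuple

# Hypothetical spellings of "undated"; a real audit would build this list from the data.
UNDATED = {"undated", "n.d.", "nd", "no date", "date unknown"}

# Optional "circa"/"ca." prefix, then either a year range or a single year.
YEAR_RANGE = re.compile(r"^(?:circa |ca\.? ?)?(\d{4})\s*[-–]\s*(\d{4})$")
SINGLE_YEAR = re.compile(r"^(?:circa |ca\.? ?)?(\d{4})$")


def parse_expression(expression: str) -> Optional[Tuple[str, str]]:
    """Return (begin, end) years for a date expression, or None.

    None means either "undated" or "not recognized"; in both cases the
    expression is left for a person to review rather than guessed at.
    """
    text = expression.strip().lower()
    if text in UNDATED:
        return None
    match = YEAR_RANGE.match(text)
    if match:
        begin, end = match.group(1), match.group(2)
        # Reject reversed ranges instead of silently swapping them.
        return (begin, end) if begin <= end else None
    match = SINGLE_YEAR.match(text)
    if match:
        return match.group(1), match.group(1)
    return None
```

Anything the sketch does not recognize (a season, a month name, a decade) comes back as `None`, which keeps the script from writing a wrong date into a structured field.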

How many ways can you say ‘undated’?

Machine-actionable restrictions

Our conditions governing use and conditions governing access data are also top candidates for normalization. Since 2015, it has been possible for repositories to add structured restriction information, such as end dates or condition notes, to our resource and archival object records. Aeon is able to act upon these restrictions, letting users and staff know if an item is restricted, and allowing for appeal by researchers and review by repositories.

A great deal of work has already been done to add machine-actionable dates and local access restriction types to the records of some repositories, but there are still a number of resources and archival objects which could benefit. The impending integration of Aeon with ArchivesSpace, and the potential impact of this functionality on staff and on users, indicated to us that it should be one of our top priorities for clean-up and enhancement.

Note Labels

Yale’s special collections repositories have traditionally operated independently of one another, and so over time developed different policies and systems for doing descriptive work. This has resulted in a wide array of standards, jargon, and even grammar choices in our finding aids. One example is the variability in the way that descriptive notes are labeled. While this may not seem all that consequential, labels can affect a user’s experience quite drastically. It might not be clear that a “Summary” or “Description of the Papers” is the same as a “Scope and Content” note. It is important that the same type of note be called the same thing, no matter which repository a user is searching.

In our current YFAD setup, the display of labels is suppressed by the above-mentioned XSLT, but this will no longer be the case once the PUI is implemented. This necessitates a thorough evaluation of our note label usage, and an eventual policy decision about how notes should be labeled – across all repositories – going forward.


Our repositories have long been adding URLs to note fields or digital object records. Unfortunately, we’ve been doing this so long that some of these links are likely broken. Directing users to a 404 page is never ideal, so we took this project as an opportunity to review our links to determine how many are broken. Though we aren’t necessarily testing the accuracy of the links – whether they direct the user to the intended web page – we just want to know (for now) if the links actually work.
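A link check of this kind might be sketched as below, using only the Python standard library. This is an assumption-laden illustration rather than our actual script: the function names, the URL pattern, and the choice to treat any status of 400 or above (or no response at all, recorded as 0) as "broken" are ours for the example. The checker takes the fetching function as a parameter so it can be tried out without touching the network.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List

# Rough pattern for URLs embedded in note text or markup (an assumption, not exhaustive).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(note_text: str) -> List[str]:
    """Pull URLs out of a note, trimming trailing sentence punctuation."""
    return [url.rstrip(".,;") for url in URL_PATTERN.findall(note_text)]


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a URL, or 0 if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404
    except (urllib.error.URLError, TimeoutError, ValueError):
        return 0  # DNS failure, refused connection, timeout, malformed URL


def find_broken(urls: Iterable[str],
                fetch: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Map each failing URL to its status (0 or >= 400)."""
    statuses = {url: fetch(url) for url in urls}
    return {url: code for url, code in statuses.items() if code == 0 or code >= 400}
```

Some servers refuse `HEAD` requests, so a link flagged here would still deserve a second look (for example, a retry with `GET`) before anyone edits the record. As in the paragraph above, this only tells us whether a link responds, not whether it points to the intended page.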


With the exception of Manuscripts and Archives, most repositories at Yale are still using two systems to manage their descriptive and collection control metadata. YFAD and Aeon pull in descriptive information from ArchivesSpace, and container and location data comes from our ILS, Voyager.

Integration with Aeon is a major part of the PUI implementation project, and having complete and accurate container data in ArchivesSpace is a necessary part of the integration work. In order to ensure the accuracy of our top container data, we will need to compare what is in ArchivesSpace with what is in Voyager. Any discrepancies, particularly where there is data in Voyager but not in ArchivesSpace, will need to be resolved in conjunction with the repositories which created the data.
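The comparison described above boils down to set arithmetic once both systems’ container data are reduced to comparable keys. The Python sketch below assumes, purely for illustration, that each container can be identified by a (call number, box indicator) pair and that the two exports differ only in spacing, letter case, and leading zeros; the real exports and matching rules may well differ.

```python
from typing import Dict, Iterable, Set, Tuple

ContainerKey = Tuple[str, str]  # hypothetical key: (call number, box indicator)


def normalize_key(call_number: str, indicator: str) -> ContainerKey:
    """Fold spacing, case, and leading zeros so equivalent keys compare equal."""
    call = " ".join(call_number.upper().split())
    box = indicator.strip().lstrip("0") or "0"
    return call, box


def compare_containers(aspace: Iterable[ContainerKey],
                       voyager: Iterable[ContainerKey]) -> Dict[str, Set[ContainerKey]]:
    """Report which containers appear in only one system, and which in both."""
    in_aspace = {normalize_key(*row) for row in aspace}
    in_voyager = {normalize_key(*row) for row in voyager}
    return {
        "voyager_only": in_voyager - in_aspace,  # the discrepancies of most concern
        "aspace_only": in_aspace - in_voyager,
        "in_both": in_aspace & in_voyager,
    }
```

The `voyager_only` set is the list to bring to the repository that created the data, since those are containers that would be missing after integration.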

Shared records and controlled value lists

One interesting side effect of importing EAD into ArchivesSpace is that any controlled values that are present in the EAD file can be added to the enumeration values list in ArchivesSpace. This has left us with some very messy controlled value data. For instance:


Normalizing these shared lists – most importantly those related to container and extent types – will present users with a more unified experience, and facilitate robust searching and faceting in the PUI. Removing duplicative or erroneous values will also help prevent messy data issues from recurring in the future.
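One way to find candidate duplicates in a controlled value list is to group values that collapse to the same key once case, punctuation, and simple plurals are ignored. The Python sketch below is a hypothetical illustration: the folding rules are deliberately naive, the sample values in the usage note are made up, and any merge it suggests should be confirmed by a person before values are changed.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def fold(value: str) -> str:
    """Reduce a value to a comparison key: lowercase, no punctuation, naive singular."""
    key = re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()
    if key.endswith("es") and key[:-2].endswith(("x", "s", "ch", "sh")):
        key = key[:-2]   # "boxes" -> "box"
    elif key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]   # "folders" -> "folder"; "glass" is left alone
    return key


def find_variants(values: Iterable[str]) -> Dict[str, List[str]]:
    """Return only the keys that have more than one distinct spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fold(value)].add(value)
    return {key: sorted(spellings) for key, spellings in groups.items()
            if len(spellings) > 1}
```

Given a list such as `["box", "Box", "boxes", "folder", "reel"]`, the function would report the three spellings of "box" as one group and leave the values that have a single spelling out of the report.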

The (nearly completed!) project to clean up our name and subject records that Alison mentioned in her last post also dovetails nicely with this work. The addition of Library of Congress URIs, and the eventual removal of duplicate records will greatly enhance the functionality of the PUI, and will remove duplicative or incorrect values that may confuse users.

All this in six months? No problem!

Current mood.

Since deciding on the scope of the work, we’ve undertaken the fun and slightly intense task of auditing our data in each of these areas. We’ll be back in another post to talk a bit about our data auditing process and reveal some of our most interesting results. For now though, we’re excited to do our part to help make the ArchivesSpace PUI more useful for staff and researchers.

What Could Possibly Go Wrong? More on Project Management Themes as we Implement the ArchivesSpace PUI

Failure. What don’t I love about failure? Yes, it’s true, my Netflix queue is full of disaster-recovery documentaries, and for *fun* I read case studies about large-scale failures (Note to self: Get dictionary for Christmas). And I love it not for the reasons you might think. I don’t like wreckage, rubbernecking, or even commotion. I don’t ever want people to be in pain. I mean, I cover my eyes while watching Law & Order! I like examining failure because it’s an opportunity to learn how people think, and the assumptions they make. It’s human to fail, but from a young age we are socially conditioned to avoid it at all costs, which, ironically, often leads to failure.

I have a very high comfort level with failure. Always have. Hello, old friend. I’m naturally a very serious person. Not that I don’t have fun, or at least my version of it, but in general I can think too much, worry a bit too much, and when I’m all up in it, I can sometimes forget the standard social norms and graces, like polite nodding and small talk. Over time I’ve learned to provide folks a heads up about it. I might tell them not to look directly at my face when I’m thinking, because you’ll think I’m trying to kill you, but I’m not! I just have a terrible thinking face, because I might let everything physical go slack, so I can direct all my internal resources to my sequence of thought. Some of this comes with the job of being a PM. You are trained to look for potential sources of risk and failure, the one thing that everyone else might miss. So you wander a little further into the rain sometimes. What you’re doing though, is trying to protect everyone. That’s how I feel about it. It sounds strange, but by thinking about failure you are trying to bolster their success. I might issue a preface such as, “I’m going to be me for a moment here, and do six months of thinking in the next 3 minutes, so buckle up.” Sometimes I feel like I’m sticking out in the day-to-day, and making folks a bit uncomfortable with my serious outlook. So inevitably when a failure comes, or an emergency, or crisis, I don’t mind as much, because suddenly I become the most normal person in the room. When everything is going up in flames, I’m just like, “Dude, bring it!”

It’s interesting to try and pinpoint where failure begins. Failure can be cumulative or discrete. One of the most referenced examples of failure is the Challenger explosion. I have discussed this case in a group I participate in called #fail4lib, a small subset of the #code4lib group. It’s sort of a support group for people like me! Most folks are familiar with the events of that morning, however, what may be less known, is that the failure began more than a year before that fateful morning. More than a year before the scheduled launch, engineering tests and launch simulations clearly demonstrated the O-ring materials did not consistently perform well. More than a year before that clear, blue, cold morning, a person with experience, integrity, and valid concerns, spoke up and said there is a problem, but he was told by management to back down. Most people view the Challenger explosion as the result of something that went wrong that day, that moment. It’s understandable, the need to see something so horrific as a total accident. And it isn’t something anyone ever wanted to happen. Not ever. But this wasn’t accidental. This was a cumulative failure, not discrete. And the man who spoke up was certainly not the only point of failure/course correction, but I suppose for various reasons his story stands out to me. So what did we learn from this? What did NASA? To this day their procedures reflect the lessons learned from this event. I think an important lesson is if someone raises a concern in their area of expertise or exposure, follow it through. Walk their beat with them, see it as close to their perspective as possible. Have a risk registry for the project and add the raised concern. The risk registry would document the stated issue, the likelihood of it occurring, and a value for what the impact would be. The risk registry also contains your organization’s strategy for dealing with it, whether Avoid, Control, Transfer, or Accept. 
No one individual or event can control failure, but there are methods to mitigate or better prepare for it to reduce impact. The playwright Arthur Miller is also very reflective about the inevitability of failure in some form, and how as individuals, or organizations, we can better understand the value it can hold.
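For anyone who wants to see what a risk registry entry might look like in practice, here is a minimal Python sketch. It holds the fields described above (the issue, its likelihood, its impact, and the chosen strategy). The 1-to-5 scales and the likelihood-times-impact score used for sorting are common conventions that I am assuming for the example, not something prescribed here, and the field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Strategy(Enum):
    AVOID = "Avoid"
    CONTROL = "Control"
    TRANSFER = "Transfer"
    ACCEPT = "Accept"


@dataclass
class Risk:
    issue: str
    likelihood: int      # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int          # assumed scale: 1 (minor) to 5 (severe)
    strategy: Strategy
    raised_by: str = ""  # who voiced the concern, so it can be followed up

    def __post_init__(self) -> None:
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        """Assumed convention: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def prioritize(registry: List[Risk]) -> List[Risk]:
    """Return the risks with the highest scores first."""
    return sorted(registry, key=lambda risk: risk.score, reverse=True)
```

The point of writing a concern down this way is that it stays visible: it gets a score and a strategy, and it cannot quietly disappear because someone was told to back down.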

There is another event a few years back that is a more discrete failure, the crash and sinking of the cruise ship Costa Concordia. First things first, 33 souls were lost: 32 during the sinking, and 1 during recovery. And you can never take that lightly. The captain (who went to trial for manslaughter and abandoning ship) strayed off the charted course at the last minute. Well, then came the recovery effort, led by a project manager from Italy. I think for a PM, getting the call to manage a project like this is like going to the show. You’ve been called up to the majors. As PM on such an effort, you have a real countdown clock, each day of the plan is costing hundreds of thousands of dollars, you have unprecedented magnitude of risk, environmental variables you cannot control, and the whole world waiting as a stakeholder. You have to be on your game. You have to think everything through to the greatest extent. You have to imagine multiple outcomes, and prepare for all of them. Managing a recovery effort, or the result of a failed project, means even less tolerance for any additional errors. Costa Concordia was two times the size of the Titanic! It carried 100 thousand gallons of fuel, and it carried supplies and chemicals for 4200 people. It sank in a fragile eco-system where a rare breed of mussel lived. Each diver working on the recovery could only stay down for 45 minutes. Talk about a project sprint! This was a half-billion dollar cruise ship, and the recovery effort exceeded that price tag three times over. The force required to right the ship was one and a half times that required for a space shuttle takeoff! It took over 7 hours to right it 1/6th of the way up, and once they started, they couldn’t stop, so they worked even in the dark of night. Can you imagine getting to be the project manager on this one? I’m so nerdy about it, I think he should write a book and go on tour, and I’d be standing first in line to get his autograph.

In my experience, most projects will fail at the beginning or at the end. Why? This is when the thinking and planning require more precision and thoroughness. You must begin and end with input from more than just the project team. People may under-plan, or misunderstand the project’s purpose. People may also assume the project is an isolated event in an organization, or they may not think through what a post-rollout requires. A good PM knows the finish line is never where you think it is going to be. Post-rollout is such an important phase of the project lifecycle. Whether you scheduled a soft launch, or just a straight cut-over to the new implementation, post-rollout still requires a plan. Ideally a smaller “release team” is prepped to monitor production for a two-to-four-week period, and also resolve and archive the project artifacts. Always remember the project doesn’t end the day you go into production. And don’t wait for failure to happen before talking about it. Make it part of your Before Action Review environmental scan. Ask straight up, “How might we fail?” “What could go wrong?” Don’t make it a taboo subject. Don’t assume you won’t be able to know how you might fail before you get started on a project. One of my earliest lessons in enterprise systems administration was to always test for failure. A colleague from IBM taught me that. When you are testing something, don’t limit scenarios to the “happy path” or the regular use of the system/application. Create tests that you know will fail/not pass go, and see what the outcome is. Did you receive an error message? Did it go into the void? Did that test item get processed anyway, even though you thought it would fail? Failure should be part of your design thinking, and project scenarios. 
Just as professional sports teams hold practice games in manufactured conditions to simulate potential game-day variables (practicing with blaring music, heckling, artificial rain/snow/cold), organizations could include failure or chaos simulations during their own testing/pre-production efforts. Quite simply, before implementation try to break stuff. You’ll gain more insight into your product than you may anticipate.

It is important not to see success and failure as binaries. An organization’s culture, or philosophy about failure, contributes to a project’s overall success. I’d love to encourage more open conversation about failure, and help organizations define their relationship to it and tolerance for it. There are degrees of failure. Some failure is unavoidable due to complexity, other failure is entirely preventable. There is new thinking on what is called “Intelligent Failure.” These failures are largely experimental in nature. They occur as an organization, or a team, are moving through an ideation/development stage of a product or project design. Some businesses encourage their development staff to “fail faster” or to “fail often.” Some of it depends on the type of industry you work in, and how risk-tolerant you are as an organization. For example, failing often isn’t going to work in hospitals or healthcare. There is a spectrum though. I believe every organization could, and should, have several published methods and tools in place for how to locate failure, discuss it, and ideally, learn from it. There are two interesting areas that often govern attitude towards failure. One is the “sunk-cost fallacy.” This is when someone, or an organization, decides to keep pursuing their course of action or strategy, because they have already invested time and money into it. Even if signs, or instincts, or pure facts are presented, psychologically a decision is made to stick with it and hope for the best outcome. Another is an assumption about consent or control. Sometimes only the most vocal person in the room is listened to. Others may be sitting there, but not speaking up, even if they are holding strong opinions about whether or not to continue down a stated path. I mentioned that every organization should have tools and methods in place for helping manage failure. 
In this scenario, some helpful methods would include having an agreed upon set of decision making criteria in advance of the project. Setting decision criteria in advance will make it less emotional, if/when a time comes to make a hard choice to evaluate if a project or decision is still working. I also think it helps to document decision making, so it can be referred to uniformly and objectively throughout the project lifecycle.

As we implement the new search and discovery platform for special collections and archives, I have several areas of potential failure in my mind. One of the most common is not having enough time. Everyone working on the project has full time commitments within their regular job. In estimating how much time anything will take, I try to be deeply thorough in my environmental scan, making notes and considerations for other major projects and events within the same time span, including a move back-in after a major renovation in Manuscripts and Archives. Not making preferred deadlines, or the slow shifting of the milestones is always a big concern for me. As a PM, your role is to keep everything moving forward. The work, the teams, the ideas, and conversations. Sometimes you are also managing external parties and vendors, who have their own objectives and deadlines in addition to yours. This project has several technical dependencies within other areas of our infrastructure, and making sure they are also completing successfully is another area of risk. Everyone involved with the project will perceive failure in their own way, just as they’ll perceive project success in their own way. Expectation is another major area where there will be degrees of success and failure. As the PM, you must do your best to be honest about potential points of risk and failure, but also not become paralyzed by them. There is no such state as perfect. There will always be someone in the room who isn’t happy, and seated directly across from them will be someone who is absolutely delighted over your hard work. You can’t take too much of any of it to heart. Just always stick to the philosophy of bringing as much logic and kindness to every situation as possible. In closing, I should note that as much as I am lunch buddies with failure, I spend a good amount of time on the science and art of happiness too! 
Whatever level you are at fighting the good fight, here is some general advice from my favorite good-hearted failure:

And always remember to put cameras in the back!


Movin’ Around with Reorder Mode in ArchivesSpace 2.2.0

Yale will soon be upgrading PROD to version 2.2.0. This is quite exciting, as this is the foundation for our migration from YFAD to the Public User Interface. It also has some updates to commonly used features. I wanted to talk about one in particular: Reorder Mode. If you want to get a head start on learning how this feature works in version 2.2.0, you can try it out in our TEST instance of ArchivesSpace, which has already been upgraded.

Currently in PROD, and in older versions of ArchivesSpace, reordering a resource record is done by dragging and dropping in the tree. You click and hold the record and drag it to where you want it in the tree. If it belongs at the same level as before, just in a different place, you hold it between the ‘sibling’ components where you want it; if it’s supposed to become a child of a current sibling component, you drop it ‘on top’ of that sibling. A small arrow shows where it will be placed in the hierarchy, and a green checkmark appears if ArchivesSpace recognizes it as a legal move. Let’s look at some screenshots to make that word soup clearer.

Here is a snippet from HVT-0036’s tree, which is in its original order, from PROD.

Here’s me moving the Restoration submaster component to become the sibling between Duplicate and Restoration master. Note the small black arrow between the Duplicate and Restoration master components showing where I’m placing it and the green checkmark by the floating Restoration submaster component (Ghost component?)

And we have the sibling in place!

Now I’m moving the Restoration submaster component to become a child of the Restoration master. I have moved it so it’s on top of the Restoration master; you can see the black arrow resting on the same line as the Restoration master in the tree. The green checkmark on the ghost component shows we’re ready to go.


And we now have a child. Please note that if there were other children of the Restoration master component, it would have ended up on the top of the list if that branch of the tree was not open. If the tree was open so you could see the children, you would be able to drag between its soon-to-be siblings.

While this is not unintuitive, it also sets up users for errors. One can accidentally reorder things without trying to since the tree is always open and the ability to reorder is always ‘on.’ The precision of the dragging and dropping is also not always easy to handle. It’s not uncommon when moving a component elsewhere on the same hierarchical level to accidentally make it a child of a would-be sibling, and vice versa.

Therefore, ArchivesSpace has a new feature: Reorder Mode.

Let’s take a look at HVT-0036, this time in TEST.

The tree looks bigger and has larger markings to denote the ‘line’ of the component. Also, there is a new button, which I outlined in red, titled “Enable Reorder Mode.” If I try to move anything without that button being clicked, nothing’s going to happen with the tree, except maybe opening the component below the tree to edit. So, what do I do to move my Restoration submaster?

First, I’m clicking on the button. This turns the button green and changes the label to “Reorder Mode Active.” It also shows two new buttons: cut, which I outlined in green, and paste, which I outlined in purple. The record view below also disappears below the tree.

I can simply move the component by hovering the cursor (which has turned into a four-direction symbol) over the appropriate white box with gray dots to the far left of the component, clicking, and dragging and dropping it. After I move the component to where I want to, a new dropdown appears, asking me where exactly I want to put it. Since I want the Restoration submaster to come before the Restoration master, I’m clicking “Add Items Before.”

It’s now where I want it.


You can also move multiple components this way! If you click on the white boxes with the gray dots to the far left of the tree while holding the CTRL key on your keyboard, the boxes will turn blue, and numbers will appear to show the order the components will be in when you start dragging them (i.e., 1 means the first component in the list, 2 is the second, etc.). This has the bonus possibility of changing the order of a level of an inventory on the fly by changing the order in which you click the components.

So here I have selected these three components to move. I want the order to be Restoration master, Master, and Use copy.


At this point, I am dragging the components. ArchivesSpace is kind enough to list the order for me so I know how it’s being moved.

Like with a single component, it will ask me to select whether these components are to be added before a selected component (in this case, Duplicate), after the selected component, or as children of the component. I selected children and they are now children of Duplicate.

Another way to move a component, if you are concerned about not being able to drag it where you want it due to mouse issues or the like, is to click on the name of the component. ASpace will outline it in blue and a move button, which I outlined in orange, will appear.

Clicking on the Move drop down will show the choice to move the component up, down, or down into, meaning within a component as a child. Hovering the cursor over the down into option will open a new drop down to show you where you could possibly put the component.

Since I want this component to be a child of the Restoration master component, I will select that option. It then moves where I want it to be.

Please note that the move button does not work with moving multiple components. If you want to move several components at the same time, you’ll have to try another method.

A final way to move a component, which can be incredibly handy for extremely long inventories, is using cut and paste. This is best done when you want a component to become a child of another component rather than a sibling elsewhere on the list. There is a workaround to use cut and paste for creating siblings, although it’s not ideal. But I will cover that too.

Let’s say I want to make the Duplicate a child of the Master component. First, I select it so it’s outlined in blue. Then I will click Cut. The Cut button will turn gray and the component bar will turn a darker shade of gray.

Then I select the component that will become the new parent, Master. IMPORTANT: you need to click directly on the link, i.e. on “Master.” Simply clicking on the box will not work! The Master component will turn blue.

Click Paste. It will turn gray and the screen will display a loading wheel. Then the child will appear where you selected it, and the Paste button will gray out.

Cut and paste also works for multiple components—again, you select all the components you need to move while holding the CTRL key before clicking cut and going to the new parent. But what if you have an extremely long inventory and you want to make the component a sibling?

Follow the directions for cutting and pasting. When you paste, however, you want to make this component a child of the component directly above or below where you ultimately want to move the original component.

In this case, I wanted Digital copy to be a sibling between Licensing copy and Master. So now that I made Digital copy a child of Master, I can drag and drop it above Master.

Final note/warning: while it is tempting to just go, “hey, let me use the move function,” using it to tell the component to go up a level will move it to the top of the inventory. This is likely not what you want to do.

While this is a bit of a change, it does allow for more reliability of reordering and I hope this tutorial is helpful for you! For those who prefer watching this in action, I will be posting a screencast next week explaining all of these methods.



Meeting User Needs via Improvements to the ArchivesSpace Public User Interface

Hello, everyone! This is Alison Clemens, archivist at Manuscripts & Archives, member of the Yale Archival Management Systems Committee, and team leader of the ArchivesSpace Public User Interface (PUI) Settings & Enhancements Workgroup. Our workgroup is charged with reviewing and documenting any default changes we might want to make to the public user interface, and collecting and maintaining a list of possible future interface changes and enhancements. I’m pleased to give you an overview of some of our workgroup’s initial planning as we prepare to implement the ArchivesSpace PUI here at Yale.

Before I dive into our workgroup’s goals and progress, I’d like to emphasize that lots of behind-the-scenes data cleanup and enhancement work has been and will be instrumental in making the project successful. For example, we did a big project to clean up our people, organization, and subject records in ArchivesSpace, and we literally exorcised some ghosts in the process (no, really — did you know that the Library of Congress Name Authority File includes spirits?). But our ongoing data work will be the subject of a future blog post.

This post will focus on our shared raison d’être: our users, and ensuring that we are providing the best possible services and platforms to meet their needs. I’ll note here that as we consider how to serve our users, we’re thinking about both external users (i.e. patrons) and internal users (i.e. library staff).

Yale’s special collections comprise a fairly large data universe, and figuring out how best to serve the diverse user constituencies of our 10 campus repositories is a challenge. We’re lucky, though, that we’ve got a good team on the case. Our workgroup members include: Stephanie Bredbenner, Processing Archivist at Beinecke Library; Anna Franz, Assistant Head of Access Services at Beinecke Library; and Jonathan Manton, Music Librarian for Access Services at the Music Library. Each of us brings specific skills and perspectives to our work, and we share a focus on taking a user-centered approach to figuring out how to determine and prioritize settings and enhancements for the new PUI.

I’ll pause here to talk a little bit about the logistics of how we’re accomplishing our work. We’re using four platforms to communicate and coordinate: Google Drive, Asana, Slack, and in-person and virtual meetings. These tools are invaluable to us as a workgroup, and especially to us as a larger team: there are 30 team members, working from a variety of physical locations, involved in the six workgroups that comprise the PUI project, and communicating about dispersed goals and tasks with a group of that size is a challenge. Fortunately, our virtual communication platforms have made this process easier to manage. We’re also providing ample opportunity for in-person communication via regular workgroup meetings, biweekly workgroup leaders meetings, and monthly open forums.

Now, back to the workgroup’s goals and tasks. Given our desire to put the user first, we naturally concluded that much of our work depends on the data gathered by the Usability & Accessibility (U&A) Workgroup. Although our workgroup members certainly have a sense of user needs, we want to be sure that we take every opportunity to actually hear directly from users about their needs and preferences. Working closely with the U&A Workgroup is instrumental in assuring that we’re successful in doing so.

The activities of the U&A Workgroup will be the focus of a future blog post, but in the meantime, I’ll mention that the U&A Workgroup has already conducted about a dozen user interviews with user constituencies and is in the process of designing and conducting user testing.

This brings us to the PUI Settings & Enhancements Workgroup’s starting point.

Essentially, our job is to take the out-of-the-box PUI (see screenshot below) and turn it into something that’s as responsive as possible to user needs. In order to do this, we revisited the Before Action Review (discussed by Melissa in our last blog post) and considered what we were trying to accomplish, what we thought would change, and how we might know we were successful. In this discussion, we agreed that the two most essential goals for our workgroup are to improve user experience (as demonstrated through metrics provided by the U&A Workgroup) and ensure broad and thoughtful staff input and buy-in.

Screenshot of the Yale ArchivesSpace PUI, titled “ArchivesSpace at Yale DEV INSTANCE,” with a search box and an upper navigation menu with "Repositories," "Collections," "Digital Objects," "Unprocessed Material," "Subjects," "Names," "Classifications," and a small magnifying glass listed as navigation options.
Screenshot of the out-of-the-box Yale ArchivesSpace PUI

To accomplish our goals, we’ve done a few things (so far, and more to come).

Our first task was to address some of the most obvious issues with the current PUI. We’ve discussed a lot of potential improvements, but two general categories of changes have risen to the top. First, we’ve identified some basic improvements we might like to make to the front page. Most of these things are small improvements to make the page clearer and more intuitive; we’re waiting for the U&A Workgroup user testing feedback before we ask for more major changes. We have, though, been examining some other examples of finding aid databases to see what features and framings we like (shoutout to the New York Public Library!), and we’re looking forward to digging more into how to make the PUI’s front page more compelling and user-friendly.

Our second task was to figure out how to address issues with search relevance. Relevance ranking in search results is a known issue in the ASpace PUI, and we wanted to get a sense of exactly how the search should be operating. To assess this, we solicited search use cases from all of the repositories via an online form. The form asked staff members to identify any two of their repository’s significant collections and answer questions about how they might search for known items from those collections. Specifically, we asked respondents to indicate a) what search terms they would use to find materials in their collections and b) what search results they would expect to see at the top of the search results list, based on their identified terms. We’re now in the process of analyzing this data. Hearing directly from our colleagues about their search cases and expected outcomes will help us determine a set of use cases to send out for development work.
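One way to work with responses like these is to treat each one as a small test case: a search term plus the record the respondent expects to see first. The sketch below is an illustration only, not the workgroup's actual analysis. The `UseCase` fields, the sample titles, and the `search` callable are all hypothetical stand-ins for whatever search interface is being assessed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class UseCase:
    """One staff-submitted search use case (hypothetical structure)."""
    repository: str
    query: str
    expected_top: str  # title the respondent expects first in the results


def rank_of_expected(case: UseCase, search: Callable[[str], List[str]]) -> Optional[int]:
    """Return the 1-based position of the expected title, or None if it is absent."""
    for position, title in enumerate(search(case.query), start=1):
        if title == case.expected_top:
            return position
    return None


def summarize(cases: List[UseCase], search: Callable[[str], List[str]]) -> Dict[str, Optional[int]]:
    """Map each query to the rank at which its expected result appeared."""
    return {case.query: rank_of_expected(case, search) for case in cases}


if __name__ == "__main__":
    # Stand-in search function with made-up results, for illustration only.
    def fake_search(query: str) -> List[str]:
        canned = {
            "example papers": ["Example Family Papers", "Example Photographs"],
            "sample records": ["Unrelated Collection", "Sample Society Records"],
        }
        return canned.get(query, [])

    cases = [
        UseCase("Repository A", "example papers", "Example Family Papers"),
        UseCase("Repository B", "sample records", "Sample Society Records"),
    ]
    print(summarize(cases, fake_search))
```

A rank of 1 means the search behaved as the respondent expected. Anything lower, or a missing result, marks a use case that may be worth passing along for development work.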

Although the PUI Settings & Enhancements Workgroup has already done a lot of work, we still have a couple of months of the PUI project left, and our work has really just begun. We look forward to continuing our work and are particularly excited to review the Usability & Accessibility Workgroup’s user testing results and accessibility audit and determine how to prioritize additional changes based on that key data. Stay tuned — we look forward to keeping you updated!

Implementing the ArchivesSpace PUI: A Before Action Review

Plato’s Ship of State metaphor postulates, “A true pilot must of necessity pay attention to the seasons, the heavens, the stars, the winds, and everything proper to the craft.” Accounting for all possible variables, risks, pluses, and minuses is also the mainstay of project management. And, if Plato were alive today, you can bet he’d make you put all of it into the project charter!

Before any project begins, the Project Manager (PM) should initiate a Before Action Review, to identify as much of the current environment as possible. This will include talking to people who have expressed the business need, talking to stakeholders, looking at proposed/active concurrent projects, and remembering to reflect on not just what is presented, but what isn’t (more on that in an upcoming post). Often a project is initially presented as either a loosely defined idea, or a shoot-the-works scenario wherein everyone is promised a pony. Then the PM should figure out the ideal, sustainable middle ground within an organization’s current capacity, and manage the expectations sure to follow that recommendation.

Next the PM is going to look at the calendar. (How I do love calendars! I think they are a fascinating combination of math, language, and social construct.) The calendar is either the one she carries in her head, the one drawn with dry-erase markers on a whiteboard, or the one on her mobile. (Another nerdy confession: I love smartphones because it means I always have a calendar within reach.) Time is everything in project management, and calendars are how we identify units of time at work. Mondays, how many business days, a work week, the month of January. The PM asks other time-centered questions: Do you have enough? Do you have work estimates? Have you adequately expressed time as a direct cost? What about time as momentum? And time not just at the party, but the setup and breakdown of the party? Time is always on my mind. That look Steelsen gives me when I know March 27, 2018 is a Tuesday without having to check. During the ArchivesSpace crisis, my dad gave me a replica of the time turner necklace Professor McGonagall gave Hermione, which enabled her to be in two places at once and get all her work done. That last week in January, I never took it off, and I still cry every time they go back to save Buckbeak.

As Yale begins the work of implementing the new public user interface for ArchivesSpace, we took time to conduct a Before Action Review. A BAR is not as typical as an After Action Review (AAR), which is when the work is completed and you reflect on what went well and what could have gone better. But I like the beginnings and endings of projects the best, because they can be the areas that receive the least amount of planning and time. They are often underestimated for their overall influence on project success and stakeholder satisfaction, and ensuring your work and choices are directly addressing the business need. I like this image for what it conveys about taking time to think things through.

I typically ask three questions during a BAR: What are we trying to accomplish? What do we think will change? How will we know if we are successful? These three simple questions generate invaluable insight into what people are thinking, what assumptions exist, what folks are scared about, or excited about, and how we characterize success. The PM should then continue to use the feedback to shape elements of the project. For example, here are some of the responses to “What are we trying to accomplish in the PUI project?”

  • Roll out a holistic and focused service, fully integrated with production
  • An improved user experience for discovery and access of archival materials

These are important objectives. They express the aging technical environment for archives and special collections at Yale, and they also express our professional and organizational values. They are aspirational, but still possible, if given the right amount of time. As a project team, including stakeholders and sponsors, we need to keep defining and refining what we mean when we say “holistic” and “focused.” Those words capture ideas and feelings about what we want, and a PM should help break them out into S.M.A.R.T. style goals. Reviewing the point about an improved user experience leads to a need to examine current metrics on what the user experience is for discovery and access of archival materials. As we roll into production, we can’t qualitatively, or quantitatively, measure improvements if we don’t know where we started, and if we don’t define “improved.” If the project team states this as something we want to accomplish, the PM should revisit the project plan to include specific tasks and resources to address it.

Taking that a step further, the PM also must determine what’s in scope, and outline recommendations for what improvements can be made now, and which ones need to wait for post-rollout. The PM might use the project scorecard method to manage any objectives. This involves stating an objective, something like improved user experience for discovery, and identifying measures, targets, and either tasks or program initiatives, depending on the scale of the objective. A measure in this context is just as it sounds, an observable parameter, such as the number of monthly service desk questions concerning user difficulty in locating materials. Then you set several targets that are reasonable for the scope and time frame of the project. Maybe the phase 1 target is a 5% reduction in this category of question, and a longer-term target is a 15% reduction. (These are examples.) Finally, the project plan would include tasks to initiate this improvement, or if the effort is on a greater scale, perhaps a program initiative is set up to increase end-user training and conduct more frequent usability reviews.
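To make the scorecard arithmetic concrete, here is a minimal sketch of how a measure and its targets might be checked. The 5% and 15% reductions are the example targets mentioned above. The baseline and observed monthly counts are invented purely for illustration, and the `Target` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Target:
    label: str
    reduction_pct: int  # percent reduction relative to the baseline measure


def target_value(baseline: int, reduction_pct: int) -> float:
    """Largest monthly count that still meets the given reduction target."""
    return baseline * (100 - reduction_pct) / 100


def targets_met(baseline: int, observed: int, targets: List[Target]) -> Dict[str, bool]:
    """Report, per target, whether the observed count is at or below the target value."""
    return {t.label: observed <= target_value(baseline, t.reduction_pct) for t in targets}


if __name__ == "__main__":
    baseline = 40  # hypothetical monthly service desk questions before rollout
    observed = 37  # hypothetical monthly count after rollout
    targets = [Target("phase 1", 5), Target("longer term", 15)]

    for t in targets:
        print(t.label, "target:", target_value(baseline, t.reduction_pct))
    print(targets_met(baseline, observed, targets))
```

With these made-up numbers, the phase 1 target would be met and the longer-term one would not. Neither result can be computed without a recorded baseline, which is why the starting point needs to be measured first.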

The “what do we think will change” question is particularly important to me. Sometimes I observe organizations discussing change on a meta-scale, such as what will libraries be like in 50 years? But this question in a BAR has a more immediate time frame of around 6 months, and the process and outcomes of change are likely to be felt more acutely by people here now. Earlier, I mentioned that the BAR could help suss out what folks might be scared about, or anxious, or unclear in a pending project. And that may not be the best business language to use, but it gets to the heart of our work and our feelings about our place in it. There’s lots I feel anxious about at work. Will we make our preferred upgrade deadline? Am I smart enough to be of service? What if I never fully understand EAD? And don’t even get me started on how CAS works! I know there are tokens involved, but not the kind you can use to buy pizza. If people are given an environment to state their concerns, and be heard, then the fear of change immediately begins to dissipate. If a colleague offers that they are concerned about having to learn a whole new way to process collections, recognize that staff training is an important step, and the PM should build time into the project for it. Make conversation about change a part of the project, the stakeholder registry, and communications plan. This also applies to assumptions. Asking folks what they think will change is a bullet train to uncovering assumptions. Find out what they are early in the process, while there is time for relevant course correction.

An interesting outcome of the “what do we think will change” question for the PUI is how much feedback there was on anticipated changes for staff workflows. When implementing a public user interface, it’s possible that we could minimize the amount of focus on staff workflow, and think primarily about the user’s experience. 66% of the name is external facing (“public” and “user”), and for legitimate reason: it is a public user interface. The “what are we trying to accomplish” question generated more user-based objectives, like the improved experience for discovery and access. I think it will be critical to balance the needs in both areas, within our preliminary project time frame. As Melissa Barton noted, “We can’t underestimate the effort required to teach users about what finding aids are.” I’m eager to continue this discussion with stakeholders, to see where there is alignment between these two aspects of the project.

As to the last question regarding, how will we know if we are successful, well that’s easy!




Reports, in three phases

After attending this month’s ArchivesSpace Hackathon (which was an amazing event, by the way, which deserves its own blog post or two!), I started mucking about with the new reporting features in ArchivesSpace. So far, I’ve been tackling reports in three ways:

  1. Adding new reports to ArchivesSpace that I authored in JasperSoft Studio
  2. Editing prepackaged ArchivesSpace reports with JasperSoft Studio
  3. Adding a print.css stylesheet to ArchivesSpace to emulate those “print screen” reports from Archivists’ Toolkit

Long story short, it was actually a lot easier than I expected to add new reports to ArchivesSpace (tested on versions 1.4.x).  If you can write a new report with a tool like JasperSoft Studio, then you’ll be able to upload it to ArchivesSpace with ease.  That said, I still wonder what other changes might be required to load new reports into ArchivesSpace that are parameterized (and so, require additional user interactions before running), but I hope that those features will be added to new releases of the ArchivesSpace reports module if they aren’t already there.  In any event, those types of reports will be my next steps of exploration.  In the meantime, this post is about what I have up and running right now.


Link to Training Session #1

Training Session #1: Getting Started in ASpace is coming up on Thursday at 10am. If you would like to join us from afar, use this link to access the session:

We will be using a web-based videoconferencing service called Zoom to broadcast the session. When you click on the link, Zoom will quickly install on your computer before joining the session – see this video for instructions. Note that you will need headphones or speakers to hear the session. The session will also be recorded – we will send out a link after uploading.

Also, please note that info about further trainings will be posted on our Libguide, rather than on this blog. Thank you!



Software Performance Testing Or: How to Tell If You’ve Got Bad Blood





Sometimes a girl just needs to see a specialist. Arsyn and Catastrophe (played here by Selena Gomez and Taylor Swift) used to be besties, but a betrayal results in an apparent demise and a lot of bad blood. However, all is not lost #revengegoals. We all know band-aids don’t fix bullet holes, so what’s a girl to do? With the expert advice of consultants and a little re-engineering, our protagonists reunite for a final showdown.

In the same way a person in discomfort would seek a specialist to help determine what’s wrong, YUL sought similar diagnostics to suss out the root causes of ArchivesSpace performance problems. We went live with ASpace in early June 2015; however, almost immediately the application became unusable due to timeouts, system crashes, or records that took so long to render you wondered if it wasn’t too late for law school while contemplating the status bar. A battery of diagnostic tests and tools helped pinpoint the source of ASpace’s woes.

There are many tools available (commercial, free, or locally developed) to conduct performance testing.  They range from simple to sophisticated and platform dependent or independent. Generally speaking though, software performance testing is an approach to testing that utilizes defined or prerecorded actions that:

    • Simulate known or anticipated user behavior in an application
    • Validate business requirements to be performed by the application
    • Help pinpoint where performance breakdowns are occurring, or performance could be optimized
    • Report a set of results and measurements for comparison, troubleshooting, and benchmarking system performance
    • Can be executed automatically by a timer or cron job, or on demand
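As a very small illustration of the idea in this list, the sketch below runs a predefined action several times and reports measurements that can be compared between runs. It is a generic timing harness, not one of the tools used at Yale. `time_action` and `simulated_user_action` are hypothetical names, and the action is a placeholder for a real user behavior such as searching for and opening a record.

```python
import statistics
import time
from typing import Callable, Dict


def time_action(action: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run a zero-argument callable several times and summarize elapsed seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    def simulated_user_action() -> None:
        # Placeholder workload; a real test would exercise the application.
        sum(range(100_000))

    print(time_action(simulated_user_action))
```

Saving the results from each run gives a baseline to compare against after a configuration change or upgrade.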

In my opinion, testing software during development and during implementation is as important as tasting your food as you prepare it. Think of the list of ingredients as your recipe’s functional requirements. Does it need more salt? If the addition of an ingredient causes your sauce to break, do you start again or serve it as is? What if you over-engineer the cream and are stuck with butter? (I think that may be referred to as “re-branding.”)

Software performance testing is critical to any development project, whether an open-source, or vendor-developed application. This methodical approach to product testing provides an IT department with a consistent review of core functions measured throughout a product life cycle. The typical software development life cycle places heaviest testing activities during the programming/development phase. Before staff training. Before production. It is a necessary step towards final user acceptance of the new or modified application. But I also encourage ongoing testing as functional or user requirements evolve, and as significant events occur in your application environment, such as network changes or application upgrades. Post-production, testing helps with ongoing capacity planning (data or users) and in this way it reveals itself as a useful tool not only for diagnostics, but also for systems management.

There are several types of performance tests, including unit, smoke, peak, load, and soak. I think peak and load are most common, used to measure heavy use of the application, but I love the imagery conjured by smoke and soak. Back in the day, smoke testing was quite literal–did it start on fire when you first turned it on? If not, you were good to go. (BTW, I love that this continues my cooking analogies from earlier). These types of tests provide controlled opportunities to view system performance under a range of conditions, and provide project lead-time to tune the infrastructure, software, and attendant services involved with your business process. But let’s not overlook the old eyeball test. In other words, if you see something, say something! Is the system performing as expected? Does it seem slow, sluggish, inconsistent? Front of the house is often where many non-functional requirements are measurable or observed, such as data accuracy, general usability, or system fail-over measures.

While the range of measurement tools is incredibly helpful, software can’t do everything. Knowledge of the application and user behavior falls outside the scope of these tools. We need people for that. Outlining the set of behaviors or actions to test, also people-driven. Interpreting and resolving the test results…you get where I’m going.

Five Hundred Twenty Five Thousand Six Hundred Minutes, or how do you measure a workflow?

Using one typical staff workflow (search & edit an accession record) in ASpace, we recorded these measurements:

  • ArchivesSpace backend: 6 seconds to fetch the record from the database and produce the JSON representation
  • ArchivesSpace frontend: 16 seconds to produce the HTML page for the edit form
  • User’s web browser: 2+ minutes to render the HTML page and to run the JavaScript required to initialize the edit form

Each of these is a step in the process from the moment the user initiates a search in ASpace until the application renders the requested result. The first two steps are not entirely visible to the end user and represent performance on the back end.  What the user is painfully aware of is the 2+ minutes it takes in the browser (their client) to help them to the next step, getting their work done.
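The measurements above can be added up to show where the time goes. The sketch below uses the recorded 6 and 16 seconds as given and treats “2+ minutes” as a lower bound of 120 seconds. That lower bound is an assumption, so the totals are minimums rather than exact figures.

```python
# Measurements from the accession record workflow, in seconds.
BACKEND_SECONDS = 6        # fetch the record and produce JSON
FRONTEND_SECONDS = 16      # produce the HTML page for the edit form
BROWSER_SECONDS_MIN = 120  # "2+ minutes", taken as a lower bound (assumption)


def server_side_seconds() -> int:
    """Time spent before the page reaches the user's browser."""
    return BACKEND_SECONDS + FRONTEND_SECONDS


def minimum_total_seconds() -> int:
    """Shortest end-to-end wait consistent with the measurements."""
    return server_side_seconds() + BROWSER_SECONDS_MIN


def browser_share() -> float:
    """Fraction of the minimum total wait spent in the browser."""
    return BROWSER_SECONDS_MIN / minimum_total_seconds()


if __name__ == "__main__":
    print(f"server side: {server_side_seconds()} s")
    print(f"end to end (at least): {minimum_total_seconds()} s")
    print(f"browser share (at least): {browser_share():.1%}")
```

On these figures the two server-side steps account for 22 seconds, while the browser step alone makes up well over four-fifths of the wait. Because the share grows as browser time increases, the true proportion can only be higher than this estimate.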

Each of these measured steps is a jumping-off point for further analysis by IT or the developers of the software. Ultimately, some MySQL InnoDB buffer adjustments brought the first two steps (22 seconds) down to 5-6 seconds. A new release of the software interface introduced additional response time improvements. Now when we discuss response time in any tally of seconds, should anyone be fussing over that? Yeppers. When you enter a search in Google, how long do you expect to wait for search results to start filling in? If you search an OPAC or Library Discovery layer, same question. When the app has a multi-stop itinerary, each step should be as efficient as possible. These are standard user expectations for modern web-based tools.

In the local case, henceforth known as “Nancy Drew and The Mystery at the Moss-Covered Mansion,” we used JMeter and Chrome Developer Tools to measure ASpace performance back to front. JMeter provided the first two measurements noted earlier with the accession record example. Chrome Developer Tools provided the third measurement for the accession record workflow. A sample test run in JMeter is configured with variables such as threads (number of “users” to simulate), ramp-up (the time JMeter takes to start all of the threads, so that each thread begins a little after the previous one), and loop (how many times the test should be repeated). All of these values are configurable for the type of test you need to run (peak, soak, etc.), and the test can be directed at your dev, test, or prod instance of a service. Using Chrome Developer Tools, you can capture time to complete browser-based actions such as loading, scripting, rendering, and painting.
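To show how the threads, ramp-up, and loop settings interact, here is a small sketch that works out when each simulated user would start and how many passes through the test would run in total. It does not call JMeter. It only models the Thread Group behavior as I understand it from the JMeter documentation, where the ramp-up period is spread evenly across the threads. The function names and sample values are hypothetical.

```python
from typing import List


def thread_start_offsets(threads: int, ramp_up_seconds: float) -> List[float]:
    """Seconds after the test begins at which each simulated user starts.

    Assumes the ramp-up period is divided evenly, so each thread starts
    ramp_up_seconds / threads after the previous one.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    interval = ramp_up_seconds / threads
    return [i * interval for i in range(threads)]


def total_iterations(threads: int, loops: int) -> int:
    """Total passes through the test plan across all simulated users."""
    return threads * loops


if __name__ == "__main__":
    # Hypothetical settings: 10 users started over 100 seconds, 5 loops each.
    print(thread_start_offsets(10, 100))
    print(total_iterations(10, 5))
```

A short ramp-up with many threads approximates a peak test, while a long-running loop count at moderate load is closer to a soak test.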

I was fortunate to present this work this summer at the ArchivesSpace Member Meeting during the Society of American Archivists annual conference. Although the audience was clearly peppered with Justin Bieber fans, I think the general idea that if T-Swift can be re-engineered, so can an ArchivesSpace implementation was understood.

Nearly 600 million people have watched the Bad Blood video. If you are not one of them, you probably have a library card. But for those of us alumnae from Upstairs Hollywood Medical College, this song was my summer jam.

Training Session #1: Getting Started in ASpace

Our first ArchivesSpace training session for Yale users will take place from 10-11am on Thursday, November 5th in SML Lecture Hall (and online)

Session I: Getting Started in ASpace
How to add users
Where to find documentation & resources
How to stay in the loop

All training sessions will be live streamed and recorded.


Questions about the training sessions? Contact

Upcoming Training Sessions

The ArchivesSpace Training Subcommittee is pleased to announce “Phase I” of a series of training sessions for YUL staff, beginning the first week in November. Stay tuned for days & times!

All training sessions will be live streamed and recorded.

Session I: Getting Started in ASpace
How to add users
Where to find documentation & resources
How to stay in the loop

Session II: Navigating the Resource Record
What is a Resource Record in ASpace?
Required and Optimum description requirements
How required fields are indicated
DACS prompts in the Resource Record
The Notes Field

Questions about the training sessions? Contact