Software Performance Testing Or: How to Tell If You’ve Got Bad Blood





Sometimes a girl just needs to see a specialist. Arsyn and Catastrophe (played here by Selena Gomez and Taylor Swift) used to be besties, but a betrayal results in an apparent demise and a lot of bad blood. However, all is not lost #revengegoals. We all know band-aids don’t fix bullet holes, so what’s a girl to do? With the expert advice of consultants and a little re-engineering, our protagonists reunite for a final showdown.

In the same way a person in discomfort would seek a specialist to help determine what’s wrong, YUL sought similar diagnostics to suss out the root causes of ArchivesSpace performance problems. We went live with ASpace in early June 2015, however almost immediately the application became unusable due to timeouts, system crashes, or records that took so long to render you wondered if it wasn’t too late for law school while contemplating the status bar. A battery of diagnostic tests and tools helped pinpoint the source of ASpace’s woes.

There are many tools available (commercial, free, or locally developed) to conduct performance testing.  They range from simple to sophisticated and platform dependent or independent. Generally speaking though, software performance testing is an approach to testing that utilizes defined or prerecorded actions that:

    • Simulate known or anticipated user behavior in an application
    • Validate business requirements to be performed by the application
    • Help pinpoint where performance breakdowns are occurring, or performance could be optimized
    • Report a set of results and measurements for comparison, troubleshooting, and bench marking system performance
    • Can be executed automatically by a timer, crontab, or on-demand

In my opinion, testing software during development and during implementation is as important as tasting your food as you prepare it. Think of the list of ingredients as your recipe’s functional requirements. Does it need more salt? If the addition of an ingredient causes your sauce to break, do you start again or serve it as is? What if you over-engineer the cream and are stuck with butter? (I think that may be referred to as “re-branding”).

Software performance testing is critical to any development project, whether an open-source, or vendor-developed application. This methodical approach to product testing provides an IT department with a consistent review of core functions measured throughout a product life cycle. The typical software development life cycle places heaviest testing activities during the programming/development phase. Before staff training. Before production. It is a necessary step towards final user acceptance of the new or modified application. But I also encourage ongoing testing as functional or user requirements evolve, and as significant events occur in your application environment, such as network changes or application upgrades. Post-production, testing helps with ongoing capacity planning (data or users) and in this way it reveals itself as a useful tool not only for diagnostics, but also for systems management.

There are several types of performance tests, including unit, smoke, peak, load, and soak. I think peak and load are most common, used to measure heavy use of the application, but I love the imagery conjured by smoke and soak. Back in the day, smoke testing was quite literal–did it start on fire when you first turned it on? If not, you were good to go. (BTW, I love that this continues my cooking analogies from earlier). These types of tests provide controlled opportunities to view system performance under a range of conditions, and provide project lead-time to tune the infrastructure, software, and attendant services involved with your business process. But let’s not overlook the old eyeball test. In other words, if you see something, say something! Is the system performing as expected? Does it seem slow, sluggish, inconsistent? Front of the house is often where many non-functional requirements are measurable or observed, such as data accuracy, general usability, or system fail-over measures.

While the range of measurement tools is incredibly helpful, software can’t do everything. Knowledge of the application and user behavior falls outside the scope of these tools. We need people for that. Outlining the set of behaviors or actions to test, also people-driven. Interpreting and resolving the test results…you get where I’m going.bartgetshitbyacar9

Five Hundred Twenty Five Thousand Six Hundred Minutes, or how do you measure a workflow? Using one typical staff workflow (search & edit an accession record) in ASpace, we recorded these measurements:

  • ArchivesSpace backend 6 seconds to fetch the record from the database and produce the JSON representation
  • ArchivesSpace frontend 16 seconds to produce the HTML page for the edit form
  • User’s web browser 2+ minutes to render the HTML page and to run the JavaScript required to initialize the edit form

Each of these is a step in the process from the moment the user initiates a search in ASpace until the application renders the requested result. The first two steps are not entirely visible to the end user and represent performance on the back end.  What the user is painfully aware of is the 2+ minutes it takes in the browser (their client) to help them to the next step, getting their work done.

Each of these measured steps are jumping off points for further analysis by IT or the developers of the software. Ultimately, some MySQL innodb buffer adjustments brought the first two steps (22 seconds) down to 5-6 seconds. A new release of the software interface introduced additional response time improvements. Now when we discuss response time in any tally of seconds, should anyone be fussing over that? Yeppers. When you enter a search in Google, how long do you expect to wait for search results to start filing in? If you search an OPAC or Library Discovery layer, same question. When the app has a multi-stop itinerary, each step should be as efficient as possible. These are standard user expectations for modern web-based tools.

In the local case henceforth known as, “Nancy Drew and The Mystery at the Moss-Covered Mansion”, we used JMeter and Chrome Developer tools to measure ASpace performance back to front. JMeter provided the first two measurements noted earlier with the accession record example. Chrome developer tools provided the third measurement for accession record workflow.  A sample test run in JMeter is configured with variables such as threads (number of “users” to simulate), ramp up (the time to wait between the first thread and starting subsequent threads), and loop (how many times this should be repeated). All configurable values for the type of test you need to run (peak, soak, etc.), and directed at your dev, test, or prod instance of a service. Using Chrome Developer tools, you can capture time to complete browser-based actions such as loading, scripting, rendering, and painting.

I was fortunate to present this work this summer at the ArchivesSpace Member Meeting during the Society of American Archivists annual conference. Although the audience was clearly peppered with Justin Bieber fans, I think the general idea that if T-Swift can be re-engineered, so can an ArchivesSpace implementation was understood.

Nearly 600 million people have watched the Bad Blood video. If you are not one of them, you probably have a library card. But for those of us alumnae from Upstairs Hollywood Medical College, this song was my summer jamTaylor_Swift_Bad_Blood