Data by the Foot

by David Cirella

Tapes, Tapes, Tapes

There is no shortage of ways in which digital objects find their way into our collections. From the various types of network-based transfers to CD-Rs and floppy disks tucked into boxes of paper records, working out the processes around transferring data from one place to another is an everyday task. While the most common methods of transfer have tried-and-true solutions, legacy media formats, such as data tapes, present a need for new and custom solutions (and often some detective work).

Tape?!

As a medium, tape (magnetic tape) is something that nearly everyone has had some exposure to. From the ardent mix-tape makers of yesteryear to those more recent devotees to the format, tape is, or has been, a common item in many industries and households alike.

In addition to audio and video applications, magnetic tape has been widely used for data storage, with a multitude of different formats coming in and out of common use in enterprise and academic computing since the 1950s. While the set of data tape formats is diverse, enterprise-grade tape-based storage generally provides a mechanically robust and error-resistant storage option. Other attractive qualities of tape storage include the increased stability that comes with an off-line (or near-line) format that protects data-at-rest from unintentional changes (accidental deletion, modifications, virus/malware), lower cost relative to hard drives of the same capacity, and longevity of up to 30 years.

9 Track tape

Risks

Despite these positive qualities, as with any physical media, tape is susceptible to degradation over time. Environmental factors, such as relative humidity, can affect the robustness of data. Temperature and tension also have an effect on the health of tape (and data stored on it).

Many of the risk factors affecting tape are difficult to assess for the media we receive that are targeted for preservation. Specifically, the environmental factors that most affect tape can be very difficult to ascertain on tapes that have not been held in library storage facilities.

Recovery Workflow

Given the wide time-frame during which data tape formats were in use, coupled with the prevalence of risk factors affecting media of that age, data tapes have become a regular target for the recovery of digital content. Over the past year in the Digital Preservation Unit, I have worked on the recovery of data from tapes written from the 1970s to the early 2000s, in various formats including SDLT, Data8, QIC-80, and 9 Track tape.

SDLT tape

One unique aspect of data tape is the diversity of physical formats that have come in and out of use over the past 50 to 70 years. This is especially evident in contrast to the relative stabilization of other physical formats (e.g., only two common sizes of floppy disk), which has made it possible to recover disks written in many different formats with a small number of physical drives and a Kryoflux. While the level of complexity varies with the format’s age, prevalence in the marketplace, and reliance on standards, each tape format requires access to the full stack of hardware and software needed to read the data. Each format of tape that is received kicks off a series of steps to identify and acquire the technology needed to begin saving the data within.

Data8 Tape

The high-level goal of the recovery process is to move data into its long-term home in our Digital Preservation System. The process for working with tapes is detailed in the largely contiguous steps below.

Tape in hand:

The kickoff of any recovery is receiving a tape. This step begins the detective work of obtaining a working knowledge of both the tape format and the specific tape itself. Most useful at this step are any markings or labeling on the physical item or case, and/or other accompanying material. Typically there will be some marking of the make and model of the tape itself (like those found on a blank audio cassette tape).

Next is turning to the internet to find as much as possible about the format. The goal is to determine the era of the tape, find any manufacturer documentation, grab specification/standards documentation (if possible), and download software or drivers for related hardware. With some basic info gathered, next is determining what hardware will be needed to access and read the data.

Tape Drive:

Finding the proper drive for reading the tape in hand involves identifying compatible drives and coming up with a list of manufacturers and models. One consideration is the various generations that the tape format may have progressed through over its lifetime; this dictates drive compatibility.

In the case of a recent collection of tapes, the SDLT format is part of a family of 10 different types of 1/2-inch data tape cartridges, beginning with the introduction of CompactTape in 1984 and ending with DLT VS1 in 2005. In the best case, information identifying which drive in the DLT family will read this specific generation of tape has been pulled out of the documentation and listed in one place (like Wikipedia in the case of SDLT). Other cases require seeking out the documentation for various drives to confirm compatibility with the tape in hand.

After determining the model of the ideal drive(s), the next step is to find one! In some cases, we already have a compatible drive on hand in the di Bonaventura Family Digital Archaeology and Preservation Lab. Other times we turn to eBay to seek out and acquire what we need.

For tape, specifically for the DLT family, there were a couple of manufacturers of each generation of drive mechanism, and OEM resellers that would use that mechanism in their products. While not a huge issue, it can take some extra translation between the OEM labeling of the drive product and the model name of the tape drive.

“New” SDLT Tape Drive

The ideal scenario is finding a drive in new old-stock condition, that is, still in its original packaging and completely untouched. This is particularly important for tape drives given the wear and tear caused by regular use, which is amplified when degrading or dirty tapes are read. A blank tape and a cleaning tape are also important to grab for testing and maintenance purposes.

Host system – Operating environment:

Next we turn to actually using the drive. As with any peripheral, a host system is needed that is able to:

  • provide a physical interface with the drive (often via an additional card)
  • run drivers for the interface and tape drive
  • run software to control the drive
  • run software to read (and write) data to media in the drive

Some of these functions are found combined in a single application.

My first approach is generally to get access to, or recreate, a host system that is as close as possible in all aspects to the system that would originally have been used with the drive and tape. The host system stack includes the hardware (workstation, interface cards) and software (operating system, drivers, applications). Ideally everything is available and ‘just works’ when combined. Most often each part requires some detective work to track down documentation, drivers, and old software, along with a fair amount of troubleshooting to solve the ‘old problems’ that occur when using legacy technology.

By the end of this step we are able to successfully read data off a tape, indicating that all parts of the stack are operating and interacting successfully. With this success we turn to finding any optimizations we can make to exploit modern technologies allowing us to increase the efficiency of working with legacy media and systems.

Optimization:

When working with legacy hardware and software, the greatest optimizations come from swapping out parts of the stack for modern technology. The modern equivalents of each component will most often provide improvements in reliability, speed, usability, and connectivity, each of which can make working with tapes more efficient and pleasant. The optimization process is similar to the three steps above, beginning with exploring alternatives for the hardware interfaces, workstation hardware, the operating system, and software applications for control of the device and data transfer. In the ideal case a legacy drive can be connected using a physical adapter to a modern interface, with modern hardware, running a current operating system, and operated with standards-based applications. The scripting possibilities and network connectivity enabled by these substitutions greatly increase the number of tapes we can process.
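As an illustration of the kind of scripting a modern host makes possible, here is a minimal Python sketch that copies one file from a tape to a disk image while computing a checksum. It is not the unit's actual tooling. The function name copy_stream, the 64 KiB read size, and the device path /dev/nst0 (a common name for a non-rewinding tape device on Linux) are all assumptions made for the example.

```python
import hashlib
from typing import BinaryIO, Tuple


def copy_stream(src: BinaryIO, dst: BinaryIO,
                block_size: int = 64 * 1024) -> Tuple[int, str]:
    """Copy src to dst in reads of at most block_size bytes.

    Returns (bytes_copied, sha256_hex). The name and the default
    block size are hypothetical choices for this sketch.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        block = src.read(block_size)
        if not block:
            # An empty read means end of file; on a tape device this
            # is typically how a filemark shows up to the reader.
            break
        dst.write(block)
        digest.update(block)
        total += len(block)
    return total, digest.hexdigest()


# Example use against a tape device (assumed path; adjust for the host):
#
#   with open("/dev/nst0", "rb", buffering=0) as tape, \
#           open("tape_file_0.img", "wb") as image:
#       size, checksum = copy_stream(tape, image)
#       print(size, checksum)
```

Recording the byte count and checksum at the moment of transfer gives a fixity value that can travel with the image into long-term storage. On a real drive the read size generally needs to be at least as large as the block size used when the tape was written, so the default here may need adjusting per tape.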

The Joy of Tape

Interspersed in all of these steps is testing and troubleshooting. Relying on legacy systems requires troubleshooting the full stack, turning back the clock on decades of technological and usability improvements. While the process can sometimes be arduous, the rush of joy that comes from hearing a tape spin up for the first time in decades, followed by seeing the bits of data that would be otherwise inaccessible, makes working with tape a wonderful experience.