Anyone with a pre-MP3 music collection has seen this icon before.
It’s possible you didn’t notice it, lightly printed in small text and tucked next to the band name or track listing, but it’s there. Looking through my personal music collection, I was surprised to find it on nearly every album, from the Velvet Underground to Britney Spears.
This icon identifies optical media as Compact Disc Digital Audio, or CD-DA, a format created specifically for storing digital audio. Philips and Sony defined the format in their 1980 “Red Book” specification, which was standardized as IEC 60908 in 1987. Most mass-produced music in the ’90s was released on CD-DA, in part because the format fits more data on a disc than a standard data CD-ROM, so more musical tracks fit on a single disc.
Why does this matter to our born digital archives, and specifically for our born digital accessioning service?
Because the CD-DA format trades error correction for capacity. A CD-DA sector devotes all of its 2,352 bytes to audio, while a standard CD-ROM sector reserves part of that space for an extra layer of error detection and correction. The result is more music on a disc, but a higher error rate than a standard CD-ROM. According to “An Introduction to Optical Media Preservation” by AVPreserve, standard commercial hardware is only 95% accurate at the track level. This means that up to 5% of the content could be lost if we approached CD-DAs with the same disk imaging workflow we use for other digital media.
Our original workflow for optical media had the service create an ISO image using Forensic Toolkit. Given the high error rate on this type of disc, however, we need to change our workflow for CD-DAs. A disk image is made by reading the data on the disc only once, so any read errors on a CD-DA end up in the image. To avoid losing up to 5% of the relevant data, we’ve switched from creating a disk image to copying the audio tracks with a tool made specifically for CD-DAs. Following the workflow suggested by AVPreserve, we’ve adopted Exact Audio Copy (EAC), a software tool created specifically for extracting high-quality audio tracks. EAC reads each sector of the disc multiple times to detect errors. If an error is detected, EAC rereads that sector 16 times and considers it error-free only if at least 8 of those 16 attempts returned the same data. If fewer than 8 reads match, EAC notifies the user and reports the time position of the error.
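To make the capacity tradeoff concrete, here is a small sketch of the arithmetic. It compares a raw CD-DA sector (2,352 bytes of audio) with a CD-ROM Mode 1 sector (2,048 bytes of user data, the rest spent on sync, header, and extra error correction), assuming a 74-minute disc read at the standard 75 sectors per second. The constants come from the CD specifications; the function name is just for illustration.

```python
# Sector geometry of a CD (Red Book audio vs. Yellow Book CD-ROM Mode 1).
SECTORS_PER_SECOND = 75   # a CD is read at 75 sectors per second at 1x speed
RAW_SECTOR_BYTES = 2352   # a CD-DA sector uses all 2,352 bytes for audio
MODE1_USER_BYTES = 2048   # CD-ROM Mode 1 keeps 2,048 bytes for user data;
                          # the remaining 304 bytes hold sync, header, EDC/ECC

# Red Book audio: 44,100 samples/s x 2 channels x 2 bytes per sample.
AUDIO_BYTES_PER_SECOND = 44_100 * 2 * 2


def capacity_bytes(minutes: int, bytes_per_sector: int) -> int:
    """Total payload for a disc of the given playing time (hypothetical helper)."""
    sectors = minutes * 60 * SECTORS_PER_SECOND
    return sectors * bytes_per_sector


if __name__ == "__main__":
    # One second of stereo audio fills exactly 75 raw sectors.
    assert AUDIO_BYTES_PER_SECOND // SECTORS_PER_SECOND == RAW_SECTOR_BYTES
    for label, size in [("CD-DA", RAW_SECTOR_BYTES), ("CD-ROM Mode 1", MODE1_USER_BYTES)]:
        total = capacity_bytes(74, size)
        print(f"{label}: {total:,} bytes ({total / 2**20:.0f} MiB)")
```

For a 74-minute disc this works out to 783,216,000 bytes (about 747 MiB) of audio versus 681,984,000 bytes (about 650 MiB) of CD-ROM data: the extra space on a CD-DA is exactly the space a CD-ROM spends on its additional error correction.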
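The rereading rule described above can be modeled in a few lines. This is a simplified sketch of the behavior as we understand it, not EAC’s actual code (EAC is closed-source); `read_sector` is a hypothetical stand-in for a drive read, and the rule that a tie between two different results counts as unverified is our own assumption.

```python
from collections import Counter
from typing import Callable, Optional


def secure_read(read_sector: Callable[[int], bytes], lba: int,
                retries: int = 16, required_matches: int = 8) -> Optional[bytes]:
    """Simplified model of a secure-mode sector read.

    read_sector is a hypothetical callable returning the bytes read for a
    sector address. Returns the verified data, or None when the sector could
    not be verified (EAC would report the error's time position instead).
    """
    # First pass: read the same sector twice; agreement means no error detected.
    first, second = read_sector(lba), read_sector(lba)
    if first == second:
        return first
    # Mismatch: reread the sector `retries` times and tally identical results.
    ranked = Counter(read_sector(lba) for _ in range(retries)).most_common(2)
    data, count = ranked[0]
    # Assumption: two different results seen equally often are unverified.
    tied = len(ranked) > 1 and ranked[1][1] == count
    if count >= required_matches and not tied:
        return data
    return None
```

Feeding it a fake reader that returns a mismatch on the first pass and then 10 identical rereads out of 16 yields the majority data; 7 matches out of 16 yields `None`.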
But how do we know for sure whether the disc in front of us is a CD-DA, a CD-ROM, or a DVD? Although the icon discussed above is a strong indicator of a CD-DA, it isn’t always reliable: some CDs are formatted as CD-DA but lack the printed icon. The tool IsoBuster lets us review the properties of a disc and determine whether it is a CD-DA.
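For CDs, the audio/data distinction comes from the disc’s table of contents (TOC): each track’s control field has a bit (0x04) that is set for data tracks and clear for audio tracks. The sketch below assumes we already have the control value for each track, for example as read from the TOC by a disc utility; `classify_disc` is a hypothetical helper, not part of IsoBuster, and the check does not apply to DVDs.

```python
from typing import Sequence

# Bit 2 (0x04) of a track's TOC control field is set for data tracks
# and clear for audio tracks.
DATA_TRACK_FLAG = 0x04


def classify_disc(control_fields: Sequence[int]) -> str:
    """Return "CD-DA", "data", or "hybrid" from each track's TOC control field."""
    if not control_fields:
        raise ValueError("table of contents lists no tracks")
    is_data = [(ctrl & DATA_TRACK_FLAG) != 0 for ctrl in control_fields]
    if not any(is_data):
        return "CD-DA"   # every track is audio: copy with EAC
    if all(is_data):
        return "data"    # e.g. a CD-ROM: disk imaging workflow applies
    return "hybrid"      # audio and data tracks on the same disc
```

Other bits in the control field (such as pre-emphasis or copy-permitted flags) do not affect the result, since only the data-track bit is tested.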
Now that we’ve identified the right tools for identifying CD-DAs and capturing their audio, we need to decide which configurations will work best for our purposes. Here are a few of the areas we’re considering as we test our EAC configurations. Most of our research on configurations came from the EAC wiki in the Hydrogenaudio Knowledgebase and from testing in our lab.
The main benefit of Exact Audio Copy for archives is its secure mode for copying audio media. This mode ensures that each sector is read at least twice, and up to 16 times if an error is detected in the first two reads. Secure mode is enabled in the drive options, which also ask for a few details about your optical drive. If you don’t know whether your drive caches audio data or can retrieve C2 error information, that’s okay! EAC can determine which features your drive supports: insert a disc into the drive and click “Detect Read Features…” to apply the correct settings automatically.
Silent Blocks between Tracks:
Since the gaps are part of how the record creator worked with the material, we have decided to preserve these silent blocks. To do so, we first detect the gaps using the “Action” menu. After copying the disc, you can confirm that the silent blocks were preserved by reading the log file.
EAC has three options for copying media: Copy Selected Tracks Uncompressed, Copy Selected Tracks Compressed, and Copy Image and Cue Sheet. We decided not to use the compressed option because the tool’s documentation doesn’t make clear how much loss compression would incur.
Copy Selected Tracks Uncompressed produces a WAV file for each selected track on the disc. This lets the archivist make appraisal decisions and capture only selected tracks rather than the entire disc. If a track has an associated name in EAC, whether embedded on the disc by the creator, entered into EAC by the archivist, or identified from a connected database, that name is used as the filename. Otherwise, the WAV files are named 01 Track01.wav, 02 Track02.wav, and so on. This option does not create a cue sheet. A log file can be created after copying, and it contains all of the same information except the amount of silent time between tracks: instead of recording these gaps as timestamps in a cue sheet, this option appends each silent block to the preceding track.
Both uncompressed options, Copy Selected Tracks Uncompressed and Copy Image and Cue Sheet, let the user create a log file once copying has completed. The log file records information about both the disc and the copying process, including the selected read mode, the settings used, whether silent blocks on the disc were deleted, the table of contents extracted from the CD, and other technical details. It also reports any errors that occurred.
Copy Image and Cue Sheet produces a single WAV file for the entire disc and a .cue text file. These files are named from the metadata fields filled in by the archivist, following the convention “CD Artist – CD Title”. If those fields are blank, the default name is “Unknown Artist – Unknown Title”. Because every other output of the digital accessioning service is named with the media number, this option requires the archivist to enter the media number in the CD Title field and delete the artist information from the filename. And since the whole disc is combined into one WAV file, embedded track names are no longer attached to individual audio files; instead, they are listed in the log file and cue sheet.
The .cue file records the absolute position of every track marker, including where the silent gaps between tracks begin and how long they last. Because the tracks are combined into a single WAV file during copying, preserving this track information is important. Each track entry includes the track name, the composer and performer information entered in the metadata fields, and two index points: Index 00, marking where the gap before the track begins, and Index 01, marking where the track itself begins. If Index 00 and Index 01 have the same time, there was no gap before the track.
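Because cue sheet times are written as MM:SS:FF, where FF counts frames at 75 per second, gap lengths can be read straight out of the .cue file. The sketch below is a minimal parser assuming standard cue syntax (`TRACK nn TYPE` and `INDEX nn MM:SS:FF` lines) and a single FILE entry; the function names are hypothetical.

```python
import re
from typing import Dict

FRAMES_PER_SECOND = 75  # cue sheet times are MM:SS:FF, with 75 frames per second

TRACK_RE = re.compile(r"^\s*TRACK\s+(\d+)\s+\S+")
INDEX_RE = re.compile(r"^\s*INDEX\s+(\d+)\s+(\d+):(\d{2}):(\d{2})\s*$")


def to_frames(mm: str, ss: str, ff: str) -> int:
    """Convert an MM:SS:FF cue timestamp to an absolute frame count."""
    return (int(mm) * 60 + int(ss)) * FRAMES_PER_SECOND + int(ff)


def read_indexes(cue_text: str) -> Dict[int, Dict[int, int]]:
    """Map each track number to {index number: position in frames}."""
    tracks: Dict[int, Dict[int, int]] = {}
    current = None
    for line in cue_text.splitlines():
        track = TRACK_RE.match(line)
        if track:
            current = int(track.group(1))
            tracks[current] = {}
            continue
        index = INDEX_RE.match(line)
        if index and current is not None:
            tracks[current][int(index.group(1))] = to_frames(*index.group(2, 3, 4))
    return tracks


def gap_seconds(cue_text: str) -> Dict[int, float]:
    """Gap before each track in seconds; 0.0 if Index 00 is absent or equals Index 01."""
    gaps: Dict[int, float] = {}
    for number, indexes in read_indexes(cue_text).items():
        if 1 not in indexes:
            continue  # malformed entry without an Index 01
        start = indexes[1]
        gaps[number] = (start - indexes.get(0, start)) / FRAMES_PER_SECOND
    return gaps
```

A track with `INDEX 00 03:10:00` and `INDEX 01 03:12:00` has a 2.0-second gap; one whose two indexes share a time has none.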
We decided to proceed with the Copy Image and Cue Sheet option. It produces a single WAV file rather than one file per track, which is considered best practice for preservation and matches our practice for preservation copies of digitized audio. The audio file can be split into track files later using the information in the cue sheet. We anticipate that the split track files will be used for access copies, and possibly by archivists managing arrangement and description.
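Splitting an access copy can be done with Python’s standard `wave` module, since each cue frame corresponds to 588 sample frames of 44.1 kHz audio (44,100 / 75). This is a sketch under stated assumptions: `track_starts` holds each track’s Index 01 position in cue frames (as listed in the cue sheet), track 1 is cut from the very start of the file, each gap stays at the end of the preceding track, and the output naming pattern is hypothetical. The preservation WAV is only read, never modified.

```python
import wave
from typing import List

SAMPLES_PER_CD_FRAME = 44_100 // 75  # 588 sample frames per cue-sheet frame


def split_wav(image_path: str, track_starts: List[int], out_pattern: str) -> List[str]:
    """Split a single-file disc image into per-track WAV access copies.

    track_starts: Index 01 positions in cue frames, in track order.
    out_pattern: hypothetical naming pattern such as "track{:02d}.wav".
    """
    written = []
    with wave.open(image_path, "rb") as src:
        params = src.getparams()
        total = src.getnframes()
        # Cut points in sample frames: file start, each later Index 01, file end.
        bounds = [0] + [s * SAMPLES_PER_CD_FRAME for s in track_starts[1:]] + [total]
        for number, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1):
            src.setpos(start)
            data = src.readframes(end - start)
            name = out_pattern.format(number)
            with wave.open(name, "wb") as dst:
                dst.setparams(params)  # same channels, sample width, and rate
                dst.writeframes(data)  # frame count in the header is corrected on close
            written.append(name)
    return written
```

Keeping each gap with the preceding track mirrors what Copy Selected Tracks Uncompressed does; if arrangement and description calls for a different convention, only the cut points need to change.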
When working with material from a manuscript repository, the standard music metadata schema provided in EAC is rarely applicable. EAC provides fields for “Artist”, “Performer”, “Composer”, and “Genre”, most of which will not be usable in our context. However, whatever is entered in these fields is preserved in the log file, so it may be worthwhile to consider crosswalking our existing metadata schema to this music-based schema.
For now, this metadata crosswalk is only an idea. The service uses only the “CD Title” and “CD Artist” fields in EAC, because these fields form the filename of the log file. If they are left blank, the log file is named “Unknown Artist – Unknown Title.log”. For the service, we enter the media number as the CD Title and delete “Unknown Artist” from the log filename after it is created. This is consistent with the naming convention for log files created by the service’s other capture processes.
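The manual rename step could be scripted. The sketch below assumes the file is named “Unknown Artist”, a separator, then the media number; since the separator may appear as a hyphen or an en dash depending on how the name was rendered, it accepts either (an assumption). The media number in the usage example, `RL.01234`, is hypothetical.

```python
from pathlib import Path

# Prefix EAC writes when the CD Artist field is blank; we accept either a
# hyphen or an en dash as the separator (assumption).
PREFIXES = ("Unknown Artist - ", "Unknown Artist \u2013 ")


def strip_unknown_artist(path: Path) -> Path:
    """Rename 'Unknown Artist - <media number>.log' to '<media number>.log'."""
    for prefix in PREFIXES:
        if path.name.startswith(prefix):
            target = path.with_name(path.name[len(prefix):])
            if target.exists():
                raise FileExistsError(target)  # never overwrite an existing file
            return path.rename(target)
    return path  # name did not carry the default artist prefix; leave it alone
```

For example, `strip_unknown_artist(Path("Unknown Artist - RL.01234.log"))` renames the file to `RL.01234.log`; the same call works for the matching .wav and .cue files.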
Hybrid CDs: The Next Problem
We’ve recently become aware of a new potential problem with optical media: hybrid CDs. These discs contain both digital audio and other types of data, and are formatted partly as a digital audio disc and partly as a data disc. One commercial example might be a CD-DA that also contains a bonus music video. The service hasn’t yet come across this type of hybrid disc in our collections, but we’re researching how to handle it so we’ll be ready.
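Hybrid discs come in more than one layout, and the layout will likely shape how we capture them. Two common ones are the Mixed Mode CD, where a data track comes first and audio tracks follow in a single session, and the Enhanced CD (CD Extra), where the audio sits in a first session and the data in a second session; discs with bonus videos are often Enhanced CDs. The sketch below assumes we can read each track’s session number and TOC control field in disc order; `hybrid_layout` is a hypothetical helper, and anything it cannot match is flagged for manual review.

```python
from typing import List, Tuple

DATA_TRACK_FLAG = 0x04  # TOC control bit set for data tracks, clear for audio


def hybrid_layout(tracks: List[Tuple[int, int]]) -> str:
    """Guess a CD's layout from (session number, control field) pairs in disc order."""
    is_data = [(ctrl & DATA_TRACK_FLAG) != 0 for _, ctrl in tracks]
    if all(is_data) or not any(is_data):
        return "not hybrid"
    sessions = {session for session, _ in tracks}
    if is_data[0] and len(sessions) == 1:
        return "mixed mode (data track first, then audio)"
    if not is_data[0] and is_data[-1] and len(sessions) > 1:
        return "enhanced CD (audio session, then data session)"
    return "other hybrid layout: inspect manually"
```

On an Enhanced CD, for instance, EAC could capture the audio session while the data session is imaged separately; which combination of tools works best is still part of our research.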