Email Task Force Report

In early FY19 the Born Digital Archives Working Group (BDAWG) convened an Email Archiving task force to explore email archiving at Yale. The task force included archivists and librarians from throughout the Yale University Libraries and Museums (YUL/M) and produced the Born Digital Archives Working Group (BDAWG) Email Archiving Task Force: Final Report. The report analyzes current email archiving tools and workflows from across the field, identifies requirements, and explores workflow and tool combinations for use by units at Yale. Of specific interest was determining which existing tools and approaches units within YUL/M could adopt and readily integrate with existing tools and services.

With a focus on the areas of pre-acquisition, acquisitions, accessioning, and preservation, the task force began gathering information about current tools and processes via an environmental scan. The scan included interviews with those currently involved with email archiving, both within and outside the institution. The gathered information highlighted a diverse set of tools, with a subset of the most commonly used tools emerging across the responses. The need for well-documented, iterative testing of such tools was also expressed.

The elicitation of core requirements began with the creation of user stories, outlining the actions of key personas in each area of focus.  Through discussion around these predicted tasks and summary of user interactions, the group identified 30 in-scope core requirements across the categories of pre-acquisition, acquisitions, accessioning, preservation, and general requirements.  With the requirements in hand, we turned to the formation of actionable workflows to satisfy each.

Parallel to the requirements elicitation process, and building on the product of the environmental scan, a summary examination of tools suited to various aspects of email archiving was compiled. With a base knowledge of the existing tools and their functionality, the group assessed each tool against its core requirements, with the goal of identifying tools that together would satisfy the full set of requirements and would then be subject to in-depth testing.

A small working group was charged with further evaluating the ePADD, Forensic Toolkit (FTK), and Aid4Mail applications. Based on the workflows observed in the environmental scan, these tools were identified as well suited to handling the flow of data through each stage of the process. Following additional testing, the group formulated process workflow diagrams modeling how a staff member might undertake pre-acquisition, acquisitions, accessioning, and preservation in a manner that adheres to the core requirements.

To best facilitate the testing of identified tools and processes, the task force will continue to meet to discuss real-world examples from within the institution’s collections.  Towards providing a consistent and accessible set of tools, work on the creation of a centrally supported suite of software for staff working on born-digital collections has commenced with task force members and LibraryIT.  The full details of our processes and findings are available in the full report.

 

Emulating Amnesia

By Alice Prael and Ethan Gates

In 1986 the science fiction author Thomas M. Disch published the text-based video game “Amnesia.” The game begins when the player’s character awakens in a hotel room in midtown Manhattan with no memory. The character must reveal his own life story in order to escape an attacker and prove he never killed anyone in Texas. The game was produced for IBM PC, Apple II, and Commodore 64.

[Image: Cover of the original Amnesia video game. Man in white tuxedo with confused expression stands in front of bright city background of billboards and shops.]

In 1991 the Beinecke Rare Book and Manuscript Library acquired the papers of Thomas M. Disch, including his writings, correspondence, and ten 5.25-inch floppy disks containing multiple versions of the video game “Amnesia”.

In 2019 the Digital Archivist for Yale Special Collections, that’s me, Alice Prael, was searching for born digital archival material to test emulating legacy operating systems – like IBM PC, Apple II and Commodore 64. Funnily enough, the collection of born digital material I immediately remembered was titled “Amnesia”.  

This fascinating game preserves a moment in video game development from the mid-1980s and presents an accurate reflection of 1986 midtown Manhattan, complete with shop names and correct opening and closing times. The production of the game for three different platforms makes it a great example for testing emulation capabilities. Fortunately, the content from these floppy disks had already been captured by the Digital Accessioning Support Service (DASS) in 2016. Unfortunately, the initial content capture was not entirely successful. The DASS captured the Kryoflux stream files, and when disk imaging failed twice, moved on to the next disk.

Quick Jargon Check: A disk image is a file that contains the contents and structure of a disk; it’s an exact copy of the disk without the physical carrier. When disk imaging is successful, the image can be mounted on your computer and opened like an attached flash drive to view the file system and contents.

Kryoflux stream files capture the magnetic flux on a floppy disk, which can then be interpreted into one of the 29 disk image formats that Kryoflux supports. The stream files cannot be mounted and viewed like a file system; they can only be interpreted through the Kryoflux software. However, once Kryoflux interprets the stream files into the correct image format, that disk image can then be mounted to view the files. Now back to our story.

Since the stream files serve as a preservation copy, the DASS only tries two disk image formats before moving on. In order to use Amnesia as a test case, the stream files had to be re-interpreted into the one correct disk image format out of the 29 formats supported by Kryoflux – but which one? I started with the Commodore 64 version of the game. Fourteen Kryoflux disk image formats begin with CBM (Commodore Business Machines), so I started there. After some initial research to learn the history of image formats like “CBM V-MAX!” and “CBM Vorpal”, I decided it would be much faster to try them all and see which ones worked. I created 14 disk images and attempted to mount each one to view the contents. Thirteen of them were mountable disk images. The game’s reliance on legacy operating systems makes it an ideal case for access via emulation, but that also means that the content isn’t readable like a normal file system full of text files. When I loaded the disk images I couldn’t make out full sentences, but a few of the mounted disk images revealed fully formed words like “hat”, “hamburger”, and “umbrella” – already proving more successful than the initial disk imaging in 2016.
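For anyone who wants to automate the "try them all and look for words" step, here is a minimal Python sketch. It is not the process used for this test case, which relied on mounting each image and viewing it by eye. The function names, the folder layout, and the word list are all hypothetical. It also assumes text is stored as plain ASCII-range bytes, which is often not true for Commodore software (PETSCII, or track data that is still encoded), so zero hits do not prove an image is bad.

```python
import re
from pathlib import Path


def find_words(data: bytes, min_len: int = 3) -> list[str]:
    """Return runs of ASCII letters at least min_len long, lowercased.

    This is a rough stand-in for "fully formed words"; it knows nothing
    about PETSCII or any disk encoding (an assumption of this sketch).
    """
    pattern = re.compile(rb"[A-Za-z]{%d,}" % min_len)
    return [m.group().decode("ascii").lower() for m in pattern.finditer(data)]


def rank_images(folder: str, known_words: set[str]) -> list[tuple[str, int]]:
    """Score each file in a (hypothetical) folder of candidate disk images
    by how many of the expected words appear in its raw bytes."""
    results = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            words = set(find_words(path.read_bytes()))
            results.append((path.name, len(words & known_words)))
    # Highest-scoring candidates first.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling something like `rank_images("candidate_images", {"hat", "hamburger", "umbrella"})` would put the images containing the most expected words at the top of the list. That only narrows the field; the real test is still whether an emulator can run the image.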

From here I handed the disk images off to the Software Preservation Analyst, Ethan Gates, so I’ll let him tell the rest of the story.

 

Since I was largely unfamiliar with Commodore computing before this test case, I was slightly intimidated by the number of even partially-mountable images to test. But I had the same realization as Alice – rather than diving straight into the deep end of trying to understand each image format, it was faster to just try to plug each image into an emulator and see if the program could narrow the field for us. (Emulators are applications that mimic the hardware and software of another computer system – they can let you run Windows 95 on a Mac, or an Atari on your Intel PC, or much, much more.)

So, in a testing session with Claire Fox (a student in NYU’s Moving Image Archiving and Preservation M.A. program and our summer intern in Digital Preservation Services), we fired up VICE, an open source Commodore 64 emulator that we also use for the EaaSI project. When “attaching” a disk image (simulating the experience of inserting a floppy disk into an actual Commodore computer), VICE automatically gives a sense of whether the emulator can read the contents of that image:

[Image: Screenshot of emulator]

Out of all the disk images Alice provided, VICE only seemed able to see the “Amnesia” program on 3 of them (“Amnesia” was distributed by Electronic Arts, hence the labeling). One (“CBM DOS”) simply froze on an image of the EA logo when attached and run. Two others –  both flavors of “CBM GCR” – successfully booted into the game.

[Image: Screenshot of the Amnesia game introduction page]

We proceeded a ways into the game (until getting stumped by the first puzzle, at least) in order to be confident that the content and commands were working, and to compare whether the two images seemed to behave the same way. They did, which meant it was time to finally do some proper research and figure out the difference between these two formats that Kryoflux offered, and which one we should move forward with using for emulation.

Per the Kryoflux and VICE manuals, we learned that the “CBM GCR” (or “G64”) disk image format was originally designed specifically for use with Commodore emulators by the teams behind VICE and CCS64 (another popular application). It is a flexible, “polymorphic” format whose main benefit is that it can help foil a number of copy protection methods – tricks that publishers like EA used to prevent users from copying their commercial floppies over to blank disks – the 1980s version of digital rights management (DRM), essentially. The second CBM GCR option is the same format “plus mastering data needed for rewriting” – as near as I can tell, this is only necessary for writing the disk image back out to a “new” 5.25-inch floppy, which I doubt will be in Yale’s use case. We’ll proceed with our first CBM GCR disk images for offering access to the Commodore 64 version of “Amnesia”.
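As a quick sanity check that a file really is in G64 format, the header can be inspected directly. The sketch below is an illustration and was not part of our testing. It assumes the header layout described in the VICE documentation: the 8-byte signature "GCR-1541", a version byte, a track count, and a two-byte little-endian maximum track size. The function name `read_g64_header` is hypothetical.

```python
import struct

# Signature that opens a G64 file, per the VICE documentation.
G64_SIGNATURE = b"GCR-1541"


def read_g64_header(data: bytes):
    """Return basic G64 header fields, or None if the bytes do not look like G64.

    Assumed layout: 8-byte signature, 1-byte version, 1-byte track count,
    2-byte little-endian maximum track size.
    """
    if len(data) < 12 or data[:8] != G64_SIGNATURE:
        return None
    version, track_count, max_track_size = struct.unpack_from("<BBH", data, 8)
    return {
        "version": version,
        "track_count": track_count,
        "max_track_size": max_track_size,
    }
```

In practice one would pass in the first few bytes of the image, for example `read_g64_header(open(path, "rb").read(12))`, where `path` points at the image Kryoflux produced. A `None` result means the file does not begin with the G64 signature.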

This is very exciting progress, and we have been able to run “Amnesia” in a web browser using VICE in the Emulation-as-a-Service platform as well. Part of the fun moving forward will be deciding exactly what it should look like when presented to Beinecke patrons: VICE can actually recreate not just the Commodore 64, but a large range of other 8-bit Commodore models, as well as a number of aesthetic tweaks recreating a CRT display (brightness, contrast, scan lines, etc.) all of which can slightly alter the game’s appearance (OK, the difference is very slight with a text-based game, but still). VICE’s default options clearly do the heavy lifting to bring Disch’s work to life, but how important are these choices for design and context?

A further challenge will be working with the versions of “Amnesia” for systems beyond the Commodore. Kryoflux’s available formats for IBM PC and Apple II disk images do not handle EA’s copy protection schemes as well as their Commodore options, and so far we have not been able to create a usable disk image for either. It would be fascinating to be able to jump back and forth between multiple versions of the game in emulation to see how the text may have subtly changed, but that will require more investigation into properly converting emulatable copies from the preservation stream files.