Before and After: Digital Media Accessioning Service

by Alice Prael

Before started in 2016 with a plan for centralizing born digital media accessioning.  There’s already a blog post about before, so I’ll skip ahead. “After” started in 2022 with a weird post-it map of born digital archiving workflows. Many thanks to my colleagues for bearing with those colorful semi-organized thoughts – at least there were fun illustrations! 

Blue, red, green, yellow and pink post-its with small writing and drawings related to born digital archives (e.g. "Born Digital on Media in Rare Books" with an image of a book open to a CD sleeve, or "Email" with the letter icon and a computer screen open)

A sample of the colorful post-its with illustrations from the Born Digital Archiving Mind Map

Some of these thoughts remained on a post-it, some were written into long term goals, and a few turned into work plans, projects, and documentation. One such post-it (okay, a few post-its) turned into a set of recommendations for the Digital Media Accessioning Service.  

In 2023 I worked with Mike Rush in Beinecke Operations and Richard Lynch in Library IT to develop SharePoint infrastructure to manage the service and track submissions of media.  The Born Digital Archives Working Group (now Advisory Group) and submitters to the Service provided feedback and recommendations. And now I’m working with my colleagues in Digital Preservation Services to identify areas of collaboration and capacity building for the service. 

A lot has changed, so here’s the before-and-after overview for the Digital Media Accessioning Service:

  • Before (2016): A Digital Accessioning Support Service for backlogs of digital media. 
  • After (2024): A Digital Media Accessioning Service that includes submitters to the service as collaborators and addresses bottlenecks to increase our overall capacity.
  • Before: By default, the service creates a disk image for all submitted media.  Submitters can request file transfers. 
  • After: By default, the service does a logical file transfer for most submitted media.  The exceptions are floppy disks (due to the KryoFlux capture process) and Macintosh computers manufactured prior to 2000 (due to known complications with the file system), which are still disk imaged.  Submitters may still request a disk image.
    • Impact: Logical file transfers do not include system files or deleted files, which we cannot provide access to and do not intend to preserve but which cannot be excluded from a disk image capture.  As a result, less digital storage is spent on files that should not be acquired.
    • Impact: Logical file transfers can more easily be characterized and managed in Preservica. 
    • Impact: In some unique cases, logical file transfers might break software dependencies required for long-term access via emulation. The Software Preservation and Emulation team works closely with Digital Preservation Services to connect with staff who might benefit from requesting disk imaging.
  • Before: Media label must be transcribed to the Title field in the metadata spreadsheet. 
  • After: If the Title field is left blank, the service will transcribe the media label.  If staff are reviewing media prior to Aspace/Preservica ingest, they will have the opportunity to review and change Titles.
    • Impact: lower barrier to submission, additional service labor 
  • Before: Required each piece of tangible media to be labeled with the Component Unique ID (CUID) 
  • After: The service will label media with the CUID 
    • Considerations: The service confirms the number of media in a box and labels each piece of media with a CUID that matches the media type.  If the service is transcribing titles and labeling media with CUIDs, the submitter cannot control the order of CUIDs within a box.  If this is important to the submitter, they must either transcribe media labels to the Title field in the spreadsheet or label each piece of media with the CUID.
    • Impact: Lower barrier to submission, additional service labor
  • Before: All media is photographed and scanned for Personally Identifiable Information (PII) unless requested otherwise.  
  • After: Photographing media and scanning for PII is now a choice in the submission survey.  This decision is applied to all media in that submission. 
    • Considerations: How will the PII scan and photograph be used?  Will the collection receive further processing, including more collection-specific searching?
    • Considerations: Can the file formats present be scanned for PII?  For example, PII scans will not identify personal information in audio files.
    • Impact: Service saves time and digital storage by not creating unnecessary files.
  • Before: Required the ArchivesSpace URL in submitted metadata spreadsheet (either parent archival object URL for the service to create child item records OR existing archival object to update) 
  • After: ArchivesSpace URL is not required until the spreadsheet is ready for Aspace import.  If submitter does not plan to review data before import, then Aspace URLs are required in the submitted spreadsheet. 
    • Impact: Staff can review files before deciding on descriptive structure, resulting in more integrated description of born digital materials.
  • Before: Submitter uses online survey to send the spreadsheet, then a working copy is saved in network storage until the data is imported to Aspace. 
  • After: Spreadsheet is submitted via SharePoint survey and is immediately stored in a Document Library, where it is available to both the BDS staff and the submitter.  Submitters can also add other library staff as “watchers”, which provides edit access to the spreadsheet.
    • Impact: Increased transparency and collaboration as data can be shared and updated collectively.  Also removes confusion of multiple copies held by each department. The spreadsheet in the SharePoint Document Library is the working copy until the service exports data as CSV and imports to ArchivesSpace.   
  • Before: Submitter does not have access to files until after ArchivesSpace description is updated and files are ingested to Preservica – unless special arrangements were requested in the submission survey or via email. 
  • After: When the submitter is notified that the submission is ready for their review, they can either 1) respond to the notification email and request download access to the entire submission, or to a specific set of CUIDs, to be delivered via OneDrive or 2) schedule a lab appointment to review materials on a machine in C131.
    • Consideration: This workflow assumes that material is usually deaccessioned at the media level.  If file-level appraisal becomes more common, we will need to automate this process. 
    • Impact: More opportunity for appraisal of born digital material before ingest to Preservica. 
  • Before: Submissions were tracked in a Box spreadsheet, where I manually created a new row for each box submitted to the service, including CollectionName_BoxNumber, SpreadsheetName, number of media and the date of submission.  I updated this Box Tracking Spreadsheet with dates for: files captured, media photographed, scanned for PII, spreadsheet imported to Aspace, files packaged for Preservica ingest, and Preservica ingest complete.
  • After: Each submitted spreadsheet in the SharePoint Document Library has a Status that can be updated by any staff with access to that submission.  When a status is updated, SharePoint triggers a notification email to the submitter and DPS. 
    • Impact: The Box Tracking Spreadsheet provided an easy way to calculate the number of media and track how quickly materials were accessioned by the service.  The new model relies on Preservica and ArchivesSpace for this type of reporting.  
    • Impact: Submitter receives automatic notifications when a major workflow step is complete (received, in progress, ready for submitter review, ready for ArchivesSpace import, ready for Preservica packaging and ingest). Improved transparency and communication. 
  • Before (as of 2018): My position, Digital Archivist for Yale Special Collection, was in the Manuscript Unit, Beinecke Library Technical Services.  I spent 90% of my time on born digital materials in Beinecke Library, 10% on digital media accessioning for other repositories.
  • After: As of September 2023, my position is Born Digital Specialist in Digital Preservation Services.  We are still identifying capacity limits and determining the best way for my position as Born Digital Specialist to support Yale Libraries.   
    • Impact: Digital Preservation Services is oriented towards staff training and system maintenance, which aligns with the Born Digital Specialist’s role.  
    • Impact: Better coverage for born digital support.  When the Born Digital Specialist is unavailable, staff can request support from digitalpreservation@yale.edu.   
    • Impact: Lauren Work serves as DPS Liaison for Media Accessioning Service, packaging and ingesting to Preservica for backlog and new acquisitions. 

 

Bonus Round: No Before, Just After 

These are new processes that the service can support but are not part of the standard workflow.

  • Instead of 1 row = 1 piece of media, do you want 1 row = 1 folder or 1 file?  We can use the Notes column in the spreadsheet to document the path to file or folder. 
    • Consideration: This is possible right now but may require development to become a standard workflow option.   
    • Deep-ish dive, feel free to skip: I’m using Excel’s CONCATENATE function to create TeraCopy command line scripts, using the file/folder list from the Notes column.  The TeraCopy command line doesn’t include a reporting function, so a log file is only created if an error or failed transfer occurs.  So far, the requests have been simple enough to manually separate the File Manifest (created by running Siegfried over the files on original media).  For more complex separation, I could run Siegfried on the post-transfer files in the Content folders instead.

 

  • If staff need access to born digital content that can’t be accurately rendered using standard tools for legacy files (e.g. FTK Imager, FTK, QuickViewPlus), Digital Preservation Services has support from the Software Preservation and Emulation team to emulate the computing environment needed to view the content.  The Emulation-as-a-Service Infrastructure (EaaSI) system can be used to provide staff with access to born digital files in an emulated computing environment with software dependencies installed.
    • Consideration: Requesting emulation will often require additional research and trial-and-error testing to identify the correct combination of OS and software to support the content. 
    • Consideration: Although the EaaSI system is user-friendly, the original software presented in emulation may not be.  For example, if you request access to database files created in MS-DOS, plan for a learning curve in working with MS-DOS database software.
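
For anyone curious about the file/folder-level transfer described in the deep-ish dive above, here is a minimal sketch of the same idea in Python rather than Excel: take the paths listed in the Notes column and build one TeraCopy command line per path. This is an illustration under assumptions, not the service's actual script. The function name, the example paths and the target folder are all hypothetical, and the "TeraCopy.exe Copy <source> <target>" form should be checked against the documentation for your installed version before running anything.

```python
def build_teracopy_commands(sources, target, options=(), exe="TeraCopy.exe"):
    """Return one TeraCopy command line (as a string) per source path.

    sources: file or folder paths, e.g. copied from the Notes column
    target:  destination folder for the transfer
    options: extra command line switches, passed through unchanged
             (assumption: check which switches your TeraCopy version supports)
    """
    commands = []
    for source in sources:
        source = source.strip()
        if not source:
            # Skip blank Notes cells rather than emitting a broken command
            continue
        # Quote both paths so spaces in file or folder names survive
        parts = [exe, "Copy", f'"{source}"', f'"{target}"', *options]
        commands.append(" ".join(parts))
    return commands


# Hypothetical example values, not real collection paths
notes_column = [
    r"E:\Correspondence\1998",
    "",
    r"E:\Drafts\chapter one.doc",
]

for line in build_teracopy_commands(notes_column, r"D:\Transfers\Content"):
    print(line)
```

The printed lines can be pasted into a batch file and reviewed before running, which keeps a human check between the spreadsheet and the actual copy.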