In October we completed the ingest of digitized materials for the Henry Kissinger project into Hydra, checking off a major milestone for the project. Ingest began in September 2014 and took 249 days to complete. For many weeks the ingest process ran 24/7 and required close monitoring.
The ingest process began with creating metadata records in Ladybird from the original EAD files for the Kissinger collection (MS 1981 and MS 2004), which amounted to 16,161 Ladybird objects. As each of the 85 hard drives returned from the vendor, its contents were validated through an automated quality control process and then transferred to temporary, network-accessible storage for manual quality control. Once the digital files passed quality control, they were matched with the corresponding Ladybird object to create the complex parent/child relationship, essentially combining the metadata record with the digital files. The match was made by extracting parts of each TIF image's file name and comparing them to the Ladybird record. Once a match was made, we imported the TIF and its associated OCR file into Ladybird to create the ingest package to send to Hydra. Each ingest package contained the original TIF image, the OCR file, a derivative JP2, and a derivative JPG. Five metadata files were also attached, which together make up the Hydra object.
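The file-name matching step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the naming scheme `<collection>_b<box>_f<folder>_<sequence>.tif`, the `FolderKey` type, and the `match_files` helper are all hypothetical, since the post does not describe the real file-name layout or the Ladybird interface.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple

# Hypothetical naming scheme (an assumption, not the project's real one):
#   <collection>_b<box>_f<folder>_<sequence>.tif
FILENAME_PATTERN = re.compile(
    r"^(?P<collection>ms\d{4})_b(?P<box>\d+)_f(?P<folder>\d+)_(?P<seq>\d+)\.tif$",
    re.IGNORECASE,
)


class FolderKey(NamedTuple):
    """Identifies the parent (folder-level) metadata record."""
    collection: str
    box: int
    folder: int


def parse_key(filename: str) -> Optional[Tuple[FolderKey, int]]:
    """Extract the parent key and page sequence from a TIF file name."""
    m = FILENAME_PATTERN.match(filename)
    if m is None:
        return None
    key = FolderKey(m["collection"].lower(), int(m["box"]), int(m["folder"]))
    return key, int(m["seq"])


def match_files(
    filenames: List[str], records: Dict[FolderKey, int]
) -> Tuple[Dict[int, List[Tuple[int, str]]], List[str]]:
    """Group images under their parent record id.

    Returns (matched, unmatched): matched maps a parent record id to its
    child images in page order; unmatched lists names needing manual review.
    """
    matched: Dict[int, List[Tuple[int, str]]] = {}
    unmatched: List[str] = []
    for name in filenames:
        parsed = parse_key(name)
        if parsed is None or parsed[0] not in records:
            unmatched.append(name)
            continue
        key, seq = parsed
        matched.setdefault(records[key], []).append((seq, name))
    for children in matched.values():
        children.sort()  # order children by page sequence
    return matched, unmatched
```

Collecting unmatched names in a separate list, rather than raising an error, leaves any file that fails to match available for manual follow-up.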
After completing ingest into Hydra, we performed two independent audits to confirm that the file counts matched and that each file’s checksum matched both the original checksum and the checksums calculated along the way, ensuring file integrity.
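An audit of this kind might look something like the sketch below, which checks both file presence and checksums against a manifest. The manifest format (relative path mapped to hex digest), the `audit` function, and the choice of SHA-256 are assumptions for illustration; the post does not say which checksum algorithm or tooling was used.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFs never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root: Path, manifest: Dict[str, str], algorithm: str = "sha256") -> Dict[str, List[str]]:
    """Compare files under root against a manifest of expected checksums.

    manifest maps a relative POSIX path to its expected hex digest
    (a hypothetical format chosen for this sketch).
    """
    report: Dict[str, List[str]] = {"missing": [], "mismatched": [], "unexpected": []}
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    for rel, expected in sorted(manifest.items()):
        if rel not in on_disk:
            report["missing"].append(rel)  # count check: file never arrived
        elif file_checksum(root / rel, algorithm) != expected.lower():
            report["mismatched"].append(rel)  # integrity check: content changed
    report["unexpected"] = sorted(on_disk - set(manifest))
    return report
```

An empty report for all three categories would indicate that both the counts and the checksums agree with the manifest.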
Combining the file counts for MS 1981 and MS 2004 gives the following totals:
|Measure|Value|
|---|---|
|Folders with digitized content|15,710|
|Folders containing Audio/Video|157|
|Total TIF Images|1,530,433|
|Total OCR Files|1,202,920|
|Total Ladybird objects|1,546,594|
|Total Files Ingested into Hydra|13,542,899|
|Approximate checksums calculated|39,268,398|
|Estimated size of collection|95 Terabytes|
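
As a quick consistency check on these figures, the Ladybird object total equals the 16,161 metadata records created from the EAD files plus one object per TIF image. That each TIF became exactly one child object is an inference from the numbers, not something stated explicitly here, and the variable names below are illustrative only.

```python
# Figures taken from the totals above and from the ingest description.
parent_objects = 16_161              # Ladybird objects created from the EAD files
tif_images = 1_530_433               # total TIF images
reported_ladybird_objects = 1_546_594
ocr_files = 1_202_920

# Assumption: one child Ladybird object per TIF image.
derived_ladybird_objects = parent_objects + tif_images
print(derived_ladybird_objects == reported_ladybird_objects)

# Share of TIF images that have an associated OCR file.
ocr_coverage = ocr_files / tif_images
print(f"{ocr_coverage:.1%}")
```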
The following chart illustrates the growth by month from September 2014 through October 2015.