Completion of Kissinger Ingest

In October we completed the ingest of digitized materials for the Henry Kissinger project into Hydra, checking off a major milestone for the project. Ingest began in September 2014 and took 249 days overall. For many weeks the ingest process ran 24/7 and required close monitoring.

The ingest process began with creating metadata records in Ladybird from the original EAD files for the Kissinger collection (MS 1981 and MS 2004). This amounted to 16,161 Ladybird objects. As each of the 85 hard drives returned from the vendor, its contents were validated through an automated quality control process and then transferred to temporary, network-accessible storage for a manual quality control process. Once the digital files passed the quality control phase, they were matched with the corresponding Ladybird object to create the complex parent/child relationship, essentially combining the metadata record with the digital files. The match was made by extracting parts of each TIF image’s file name and comparing them to the Ladybird record. Once a match was made, we imported the TIF and its associated OCR file into Ladybird to create the ingest package to send to Hydra. Each ingest package contained the original TIF image, the OCR file, a derivative JP2 and a derivative JPG. Five metadata files, which make up the Hydra object, were also attached.
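The file-name matching step can be sketched roughly as follows. The post does not give the actual naming convention, so the pattern below (collection, box, folder, image sequence) is a hypothetical stand-in, as are the function names and the record identifiers.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

FolderKey = Tuple[str, int, int]

# Hypothetical naming convention, e.g. "ms1981_b012_f0345_0001.tif".
# The real Kissinger file names are not described in the post.
TIF_NAME = re.compile(
    r"^(?P<collection>ms\d{4})_b(?P<box>\d+)_f(?P<folder>\d+)_(?P<seq>\d+)\.tif$",
    re.IGNORECASE,
)


def folder_key(file_name: str) -> Optional[FolderKey]:
    """Extract (collection, box, folder) from a TIF file name, or None."""
    m = TIF_NAME.match(file_name)
    if m is None:
        return None
    return (m["collection"].lower(), int(m["box"]), int(m["folder"]))


def match_files(
    file_names: Iterable[str], records: Dict[FolderKey, str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group TIF names under their parent record id; collect the rest."""
    children: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for name in sorted(file_names):
        key = folder_key(name)
        record_id = records.get(key) if key is not None else None
        if record_id is None:
            unmatched.append(name)  # needs manual review
        else:
            children.setdefault(record_id, []).append(name)
    return children, unmatched
```

Keeping the unmatched names in a separate list, rather than skipping them silently, gives a manual review step something concrete to work from.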

After completing ingest into Hydra, we performed two independent audits to confirm that the file counts matched and that each file’s checksum matched both the original checksum and the checksums calculated along the way, ensuring file integrity.
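An audit of this kind can be sketched as a comparison between a manifest of expected checksums and the files actually present. This is a minimal illustration, not the audit tooling used for the project: the checksum algorithm (SHA-256 here), the manifest shape and the function names are all assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path, algorithm: str = "sha256",
                  chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large TIFs never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: Dict[str, str], root: Path) -> Dict[str, List[str]]:
    """Compare a {relative path: expected hex digest} manifest to disk."""
    root = Path(root)
    missing: List[str] = []
    mismatched: List[str] = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif file_checksum(target) != expected.lower():
            mismatched.append(rel_path)
    # Files on disk that the manifest does not mention affect the counts too.
    on_disk = {p.relative_to(root).as_posix()
               for p in root.rglob("*") if p.is_file()}
    unexpected = sorted(on_disk - set(manifest))
    return {"missing": missing, "mismatched": mismatched,
            "unexpected": unexpected}
```

An audit passes when all three lists come back empty, which covers both the quantity check and the checksum check in one pass.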

Combining the file counts for MS 1981 and MS 2004, the end result is:

Total Folders: 16,161
Folders with digitized content: 15,710
PDF files: 15,710
Folders containing Audio/Video: 157
Total TIF Images: 1,530,433
Total OCR Files: 1,202,920
Total Ladybird objects: 1,546,594
Total Files Ingested into Hydra: 13,542,899
Approximate checksums calculated: 39,268,398
Estimated size of collection: 95 Terabytes
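As a quick consistency check on these figures, the Ladybird object total equals the folder count plus the TIF count. Reading that as one object per folder plus one per image is an inference from the numbers, not something the post states; the variable names are illustrative.

```python
# Figures copied from the totals above.
total_folders = 16_161
folders_with_content = 15_710
pdf_files = 15_710
tif_images = 1_530_433
ladybird_objects = 1_546_594

# Assumption: one Ladybird object per folder plus one per TIF image.
folder_plus_image_objects = total_folders + tif_images
folders_without_content = total_folders - folders_with_content

print(folder_plus_image_objects == ladybird_objects)
print(pdf_files == folders_with_content)
print(folders_without_content)
```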

 

The following chart illustrates the growth by month from September 2014 through October 2015.

[Figure: Kissinger Ingest Graph]

Project update: Digital collections search interface

Central ITS will conduct the first of three load tests on the enhanced interface for digital collections on Friday, July 17, between 1:30pm and 5pm. They will use a service called LoadRunner, which determines the breaking point of an application by emulating real use by a number of concurrent users. The remaining two tests will take place between July 27 and July 30. I will follow up once these dates and times are confirmed.

These tests on the enhanced interface for digital collections are not expected to impact the current digital collections interface. Library IT will be monitoring the current digital collections interface on 7/17 for service disruptions.

I write to you regarding some testing on the enhanced interface for digital collections that may impact our current digital collections discovery service (http://findit.library.yale.edu). The enhanced interface is a version of this discovery service, with features, functionality and security developed for use with more restricted digital materials. Like our unified discovery service, Quicksearch, both the digital collections interface and the enhanced version are powered by Blacklight.

Curious about what’s in the Yale University Library digital collections search? Here are some clocks made by Paul Revere. We also have fire insurance maps of Seymour, CT, and much more! You can learn more about the Library’s discovery services (Articles+, Quicksearch and digital collections search) at the Rediscover Discovery forum in August (Tues 18th and Thurs 20th). More information on that is coming soon.

If you have questions about this work, or notice any issues with http://findit.library.yale.edu, please let me know.

 

Mike Friscia

On behalf of the FindIT Project Implementation team:

Osman Din

Eric James

Tracy MacMath

Anju Meenattoor

Bob Rice

Lakeisha Robinson

Steelsen Smith

Kalee Sprague

Yale Joins the IIIF Consortium

On Tuesday [June 16], eleven world-leading institutions agreed to form the IIIF Consortium, a member organization dedicated to sustaining and advancing the International Image Interoperability Framework (IIIF). The consortium will support the work of IIIF by pooling and allocating funding from members for exposing content via IIIF; doing outreach, training, and advocacy to grow the community; maintaining and elaborating on the IIIF technical specifications; providing catalytic support for IIIF-compatible software development; helping coordinate the IIIF community; and more.

From the Hydra News blog: HydraDAM 2 update

posted Fri, 22 May 2015 by Michael Friscia

Indiana University and WGBH recently presented their plans for the grant-funded HydraDAM 2 project. Some interesting bullets from their presentation:

  • HydraDAM 1 came from a need for WGBH to migrate off the vendor product Artesia, which was heading in a new direction
  • Indiana University’s use case is to ingest 10 Terabytes per day for 4 years for a total of 6.6 Petabytes of master and use copy video files, along with associated files for preservation, into HydraDAM 2
  • HydraDAM 1 is too slow for ingest, so ingest is handled externally
  • HydraDAM 2 will use two different storage system models, with Fedora 4 managing both online/nearline and offline tape copies
  • Out-of-region copies are out of scope given the size of the collection going in; however, IU is a DPN member and plans to use that for high-risk items. Currently they are in the process of setting policies and preservation levels associated with the content.
  • Preservation services to be offered in HydraDAM 2 include:
      • Storage and retrieval of files
      • Scheduled fixity checks and file characterization on demand
      • Auditing based on Fedora 4
      • Reporting
      • Media migration (from one storage solution to another storage solution)
      • Format migration for risk of obsolescence
  • There is a working version of Avalon using Fedora 4

This was a preliminary presentation. IU and WGBH will be giving a detailed presentation at the upcoming Open Repositories conference in June.

DPLA joins the Hydra Partners

We are delighted to announce that the Digital Public Library of America (DPLA) has become the latest formal Hydra Partner. In their Letter of Intent, Mark Matienzo, DPLA’s Director of Technology, writes of their “upcoming major Hydra project, generously funded by the IMLS, and in partnership with Stanford University and Duraspace, [which] focuses on developing an improved set of tools for content management, publishing, and aggregation for the network of DPLA Hubs. This, and other projects, will allow us to make contributions to other core components of the Hydra stack, including but not limited to Blacklight, ActiveTriples, and support for protocols like IIIF and ResourceSync. We are also interested in continuing to contribute our metadata expertise to the Hydra community to ensure interoperability across our communities.”

IMLS funds collaborative development of “Hydra-in-a-Box”

Digital Public Library of America – Boston, MA

Grant Program: National Leadership Grants

Category: National Digital Platform

Award Amount: $1,999,897; Matching Amount: $2,000,686

 

The Digital Public Library of America (DPLA), Stanford University, and DuraSpace will foster a greatly expanded network of open-access, content-hosting “hubs” that will enable discovery and interoperability, as well as the reuse of digital resources by people from this country and around the world. At the core of this transformative network are advanced digital repositories that not only empower local institutions with new asset management capabilities, but also connect their data and collections. Currently, DPLA’s hubs, and libraries, archives, and museums more broadly, use aging legacy software that was never intended or designed for use in an interconnected way, or for contemporary web needs. The three partners will engage in a major development of the community-driven open source Hydra project to provide these hubs with a new all-in-one solution, which will also allow countless other institutions to easily join the national digital platform.

 

http://www.imls.gov/news/2015_lb21_nlg_march_announcement.aspx

 

Royal Library Presents Chronos

The Royal Library (National Library of Denmark and Copenhagen University Library) recently released a new Hydra application named Chronos.

Details of the project can be found here: http://www.kb.dk/en/nb/afdelinger/db/index.html

The project spanned a little over two years, during which the bulk of the work was establishing policies for long-term digital preservation and then setting a strategic plan based on those policies.

The work then segued into cost models to support the newly developed policies and strategic plan. Using a shared set of principles and guidelines from Collaboration to Clarify the Cost of Curation (http://www.4cproject.eu/), they developed a sustainable cost model for long-term preservation of their digital assets.

With policies, strategies and costs established, they moved to a more detailed level and focused on the metadata requirements for preservation: event data to be stored in PREMIS and structural data to be stored in METS. This led to much more detailed discussions about the metadata needed for public discovery of the digital assets, as well as the metadata required for internal reports that support digital preservation tasks.

Once this work was complete, they moved on to writing specifications for the system. They selected Hydra as the best approach for digitally preserving millions of documents. The planning process started in June 2014 and continued through the end of October 2014. Work on the new system began this past December, and during the week of March 16, 2015 they will release videos and additional information to the Hydra community.

Indiana University and Northwestern University Libraries Receive Mellon Grant for Avalon

Hydra community:

I’m pleased to be able to announce that the Indiana University Libraries, in partnership with Northwestern University Library, have received a $750,000 grant from the Andrew W. Mellon Foundation to support work on the Avalon Media System project through January 2017.

This funding will help to support the following activities: 1) developing additional features and functionality for Avalon to better meet needs of collection managers and users; 2) conducting studies of use of audio and video collections by researchers in humanities disciplines to help ensure future support for scholarly use; 3) integrating the Spotlight exhibit tool with Avalon to allow librarians, archivists, and scholars to showcase and provide additional context for media items and collections; 4) developing and implementing a community-funded business and governance model to sustain ongoing support and development for Avalon; and 5) deploying Avalon in a hosted software-as-a-service model for use by institutions that need the functionality of Avalon but would prefer to utilize a cloud-based software-as-a-service option rather than support a locally hosted instance.

I’d like to offer thanks to the Hydra community for building and maintaining a solid technical foundation that enables systems such as Avalon to be built and to members of the Hydra community who have assisted with Avalon’s development by providing feedback on requirements and implementation experiences.

More information is available in a press release from Indiana University at http://news.indiana.edu/releases/iu/2015/03/mellon-grants-digital-preservation.shtml

Best,
Jon

Jon Dunn

Interim Assistant Dean for Library Technologies

Indiana University Bloomington Libraries

Indiana University and Northwestern University Release Avalon 3.3

Indiana University and Northwestern University are pleased to announce Avalon Media System 3.3. Release 3.3 adds the following capabilities:

  • MARC metadata import
  • Ingestion of pre-transcoded derivatives with multiple quality levels
  • Script for recovering disk space taken up by temporary Matterhorn files
  • UI improvements and bug fixes

Users of Avalon 3.2 can take advantage of these new features by upgrading from Avalon 3.2 to Avalon 3.3.

For a more comprehensive list of changes, see the 3.3 release notes.

For more details on each of these new features, visit the What’s New in Avalon 3.3 wiki page: https://wiki.dlib.indiana.edu/display/VarVideo/What%27s+New+in+Avalon+3.3

Please feel free to try out Avalon 3.3 on our public test server (http://pawpaw.dlib.indiana.edu) before installation. Installation options include virtual machine image, manual installation, and source code installation. More information on all available options can be found on the Avalon web site’s Download page: http://www.avalonmediasystem.org/download

We welcome your feedback on Avalon 3.3 via the avalon-discuss-l discussion list. Join the discussion list at http://www.avalonmediasystem.org/connect

Best regards,

Jon

Jon Dunn
Interim Assistant Dean for Library Technologies

Indiana University Bloomington Libraries

Princeton Hydra Release: Digital Archive of Latin American and Caribbean Ephemera

Recent Announcement from Princeton:

We’ve finally soft launched our first public application that is serving content from Hydra: http://lae.princeton.edu/. There is still some tweaking to do to Solr and the CSS, but we’re getting close. Staff have been adding data to this application at a rate of 150-300 items per month for about 6 months now, and we expect to be at a consistent rate of 300/month or more by the summer.

This public interface has some features that may not be obvious to the average end-user:
* All of the images are served via IIIF (e.g. http://libimages.princeton.edu/loris2/puls%2F0%2Fj%2F0%2Fg%2F1.jp2/full/75,/0/default.jpg)
* The individual catalog pages are displaying data drawn from IIIF Presentation manifests: http://lae.princeton.edu/catalog/0d7hm.jsonld (and, as you can see, also available as IIIF Manifests)
* All of the data is also available as RDF, e.g.: http://lae.princeton.edu/catalog/0d7hm.ttl
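The image URL in the first bullet follows the IIIF Image API pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, with the identifier percent-encoded. Here is a minimal sketch of building and taking apart such a URL. The function names are illustrative and do not come from Princeton's code or any IIIF library, and the unencoded identifier is inferred by decoding the example URL.

```python
from urllib.parse import quote, unquote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "full", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API URL; "full" as a size is the 2.x spelling."""
    # safe="" forces "/" inside the identifier to be encoded as %2F.
    encoded = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded}/{region}/{size}/"
            f"{rotation}/{quality}.{fmt}")


def parse_iiif_image_url(url: str, base: str) -> dict:
    """Split an IIIF Image API URL under a known base into its parameters."""
    prefix = base.rstrip("/") + "/"
    if not url.startswith(prefix):
        raise ValueError("URL does not start with the given base")
    identifier, region, size, rotation, last = url[len(prefix):].split("/")
    quality, fmt = last.rsplit(".", 1)
    return {
        "identifier": unquote(identifier),
        "region": region,
        "size": size,
        "rotation": rotation,
        "quality": quality,
        "format": fmt,
    }


# Rebuild the example from the list above: full region, 75 pixels wide.
url = iiif_image_url("http://libimages.princeton.edu/loris2",
                     "puls/0/j/0/g/1.jp2", size="75,")
print(url)
```

Changing only the size argument (for example "150,") requests a different width of the same image, which is what lets a single stored JP2 serve thumbnails and larger views alike.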