New Hydra Adopter: Chemical Heritage Foundation (CHF)

Recent post to the Hydra Community:

Hello!

We wanted to let the Hydra community know that the Chemical Heritage Foundation (CHF) in Philadelphia has decided to adopt Hydra as our repository solution. CHF is a library, museum and center for scholars, and we’re interested in building a central repository for our diverse digital assets (photographs, books, archival collections, fine art, oral histories, and museum objects). We’re a small cultural heritage institution with a digital collections team of three (Michelle=Curator, Anna=Developer and Cat=Metadata).

Our plan is to begin with Sufia running Fedora 4 to create a basic image collection for our photographs and 2D book scans. We’ll then be exploring more complicated project phases, which will include replacing or integrating with our museum’s CMS, integrating archival objects and EAD finding aids that currently live in ArchivesSpace, and ingesting complex objects with unique issues, like our oral histories.  We’re also really interested in exploring Spotlight as an exhibition tool and in the possibility of future integration with Archivematica (or something similar) to develop preservation functionality.

We wanted to thank Data Curation Experts and Temple University for talking with us during our decision-making phase! We’re very excited to get involved in the Hydra community!

With thanks,

Michelle DiMeo

Curator of Digital Collections

Chemical Heritage Foundation

New Hydra Partner: University of Alberta

We are delighted to announce that the University of Alberta has become the latest formal Hydra Partner.  The University of Alberta has well over a decade of experience in large-scale digitization and repository projects, and has a strong team of librarians, developers, data curators and other experts migrating their existing systems to what they are calling “Hydra North.”

In their Letter of Intent, the University of Alberta says that they are committed to using their local needs as pathways to contribute to the Hydra community. Their primary areas of focus in this will be research data management, digital archives, and highly scalable object storage.

http://projecthydra.org/

code4lib 2015


450 people from around the world gathered in Portland, Oregon last week for the 10th annual code4lib library technology conference.

On Monday, approximately 18 pre-conferences were held in half- and full-day sessions, mostly comprising demos, tutorials and discussion groups. I attended a morning session on linked data led by Tom Johnson of DPLA and Karen Estlund of the University of Oregon. As a developer, I found the demonstration of the Ruby gem ActiveTriples particularly interesting for its ability to quickly model content into RDF classes and properties that can seamlessly connect to Fedora 4 persistence or any extensible back end.

In the afternoon I attended a GeoBlacklight demo led by Jack Reed and Darren Hardy of Stanford. The Stanford GeoBlacklight is a leading map collection interface that allows for spatial search, presentation, and discovery based on the development of metadata schemas, conversion workflows, and interface presentation components. The workshop focused on using the VirtualBox virtual machine and Vagrant setup environment to bring up an instance of GeoBlacklight in minutes.

On Tuesday, the conference proper started with a keynote by Selena Deckelmann. Her talk focused on the importance of leading the coding community based on principles of inclusion of beginners and marginal groups. The presentations on Tuesday expanded on that theme with talks focused on users, teams, developers and experiences in dealing with library technology challenges.

The presentations on Wednesday were more technically focused. On Thursday morning, a closing keynote was given by Andromeda Yelton, who encouraged building systems with tools designed to best satisfy the “wanderlust” behind users’ and patrons’ drive to discovery. In between the 20-minute presentations were two hour-long lightning talk sessions, each made up of 5-minute talks by 12 people. I thought the keynotes nicely framed the conference, and the lightning talks were a great way to digest and get a pulse on what people were working on. As a developer, I was particularly interested in the presentations of tools providing facility, such as Kevin Clarke’s presentation of Packer, a DevOps tool for deploying to virtual machines; Stanford’s OEmbed service for offering embeddable links to their digital collections; and a presentation by Stanford’s Rob Sanderson and Naomi Dushay describing their experience attempting to integrate their ILS, digital collections, and discovery indexes.

On Thursday afternoon and Friday, I attended working groups focused on Fedora 4, Hydra’s support of Fedora 4, content modeling, and the Linked Data Platform. The discussions were vigorous, and it was a beneficial mental exercise to spin out the various content model concepts of collection/work/file, the distinction between the “aggregates” and “members” predicates, and how to use the LDP Direct and Indirect Containers to deal with assets, rights, and ordering proxies, although I’m afraid not much was resolved. But DPLA (Digital Public Library of America) appears very interested in developing these concepts into usable models, which may prove to be a great step forward for metadata discovery and interoperability.

All in all, a worthwhile conference. I’m keeping an eye on next year’s, venue TBD.

Henry Kissinger Project – Ingest Statistics

This is just a brief update to offer some ingest statistics related to the Henry Kissinger project. The digitized project will contain approximately 1,700,000 digital objects from approximately 12,800 folders.

The process of ingest includes both manual and automated processes. The Digital Library Programming group is responsible for the automated steps, which basically include the creation of a Ladybird object and then publishing that object to Hydra. At this time, all objects are being ingested in a manner that prevents them from being exposed in the public Hydra interface (FindIT.library.yale.edu). The plan is to “turn on” the collection all at once, which is a better approach when a collection is very large and very complex. Otherwise, researchers might have a difficult time using the collection if materials were made available a little at a time, in what would sometimes seem like a random order.

As of Feb 16:

  • 339,041 – the number of objects ingested into Hydra
  • 4,266 – the number of folders ingested out of approximately 12,800
  • 7 – the number of digital files that make up an object in Hydra
  • 2,377,553 – the actual number of files ingested into Hydra
  • 792,655 – total objects ingested into Hydra
  • 5,548,585 – total number files currently in Hydra
  • 10.856 seconds – the average time it takes an object to ingest into Hydra

One thing to consider about the last statistic, which is actually the one we focus on the most: at the current rate, the time to ingest the entire collection is approximately 213 days. For each tenth of a second that this rate fluctuates, the completion time increases or decreases by roughly 47 hours (1,700,000 objects × 0.1 seconds). If ingest were to suddenly start taking 11.8 seconds, it would push the approximate completion time to 232 days.
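For anyone who wants to rerun these estimates as the average changes, here is a minimal sketch of the arithmetic in Python. It assumes the full collection of roughly 1,700,000 objects is ingested one object at a time at a constant per-object rate; the function and constant names are made up for illustration and are not part of Ladybird or Hydra.

```python
# Back-of-the-envelope ingest time estimate.
# Assumptions (not from any real tool): serial ingest, constant rate,
# and the approximate object count quoted for the whole collection.

SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400

TOTAL_OBJECTS = 1_700_000   # approximate size of the digitized collection
CURRENT_RATE = 10.856       # average seconds per object, as of Feb 16


def ingest_days(total_objects, seconds_per_object):
    """Days needed to ingest every object at a fixed per-object rate."""
    return total_objects * seconds_per_object / SECONDS_PER_DAY


def hours_per_tenth_second(total_objects):
    """Hours gained or lost when the rate shifts by 0.1 seconds per object."""
    return total_objects * 0.1 / SECONDS_PER_HOUR


print(f"At {CURRENT_RATE} s/object: {ingest_days(TOTAL_OBJECTS, CURRENT_RATE):.1f} days")
print(f"At 11.8 s/object: {ingest_days(TOTAL_OBJECTS, 11.8):.1f} days")
print(f"Each 0.1 s change: {hours_per_tenth_second(TOTAL_OBJECTS):.1f} hours")
```

Under those assumptions the estimate comes to about 213.6 days at the current rate, about 232.2 days at 11.8 seconds per object, and roughly 47 hours for each tenth of a second of drift.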