Category: Project Hydra

Royal Library Presents Chronos

The Royal Library (National Library of Denmark and Copenhagen University Library) recently released a new Hydra application named Chronos.

Details of the project can be found here: http://www.kb.dk/en/nb/afdelinger/db/index.html

The project spanned a little over two years, during which the bulk of the work was establishing policies for long-term digital preservation and then setting a strategic plan based on those policies.

The work then segued into the cost models to support the newly developed policies and strategic plan. Using a shared set of principles and guidelines from Collaboration to Clarify the Cost of Curation (http://www.4cproject.eu/), they developed a sustainable cost model for long-term preservation of their digital assets.

Once they had policies, strategies and costs established, they moved to a more detailed level and focused on the metadata requirements for preservation. The focus was on event data to be stored in PREMIS and structure data to be stored in METS. This led to much more detailed discussions about public discovery of the digital assets, as well as the metadata required for internal reports that support digital preservation tasks.

Once this work was complete, they moved on to writing specifications for the system. They selected Hydra as the best approach for digitally preserving millions of documents. The planning process started in June 2014 and continued through the end of October 2014. Work on the new system began this past December, and during the week of March 16, 2015, they will release videos and additional information to the Hydra community.

Indiana University and Northwestern University Libraries Receive Mellon Grant for Avalon

Hydra community:

I’m pleased to be able to announce that the Indiana University Libraries, in partnership with Northwestern University Library, have received a $750,000 grant from the Andrew W. Mellon Foundation to support work on the Avalon Media System project through January 2017.

This funding will help to support the following activities: 1) developing additional features and functionality for Avalon to better meet needs of collection managers and users; 2) conducting studies of use of audio and video collections by researchers in humanities disciplines to help ensure future support for scholarly use; 3) integrating the Spotlight exhibit tool with Avalon to allow librarians, archivists, and scholars to showcase and provide additional context for media items and collections; 4) developing and implementing a community-funded business and governance model to sustain ongoing support and development for Avalon; and 5) deploying Avalon in a hosted software-as-a-service model for use by institutions that need the functionality of Avalon but would prefer to utilize a cloud-based software-as-a-service option rather than support a locally hosted instance.

I’d like to offer thanks to the Hydra community for building and maintaining a solid technical foundation that enables systems such as Avalon to be built and to members of the Hydra community who have assisted with Avalon’s development by providing feedback on requirements and implementation experiences.

More information is available in a press release from Indiana University at http://news.indiana.edu/releases/iu/2015/03/mellon-grants-digital-preservation.shtml

Best,
Jon

Jon Dunn

Interim Assistant Dean for Library Technologies

Indiana University Bloomington Libraries

Princeton Hydra Release: Digital Archive of Latin American and Caribbean Ephemera

Recent Announcement from Princeton:

We’ve finally soft-launched our first public application serving content from Hydra: http://lae.princeton.edu/. There is still some tweaking to do to Solr and the CSS, but we’re getting close. Staff have been adding data to this application at a rate of 150-300 items per month for about 6 months now, and we expect to be at a consistent rate of 300/month or more by the summer.

This public interface has some features that may not be obvious to the average end-user:
* All of the images are served via IIIF (e.g. http://libimages.princeton.edu/loris2/puls%2F0%2Fj%2F0%2Fg%2F1.jp2/full/75,/0/default.jpg)
* The individual catalog pages are displaying data drawn from IIIF Presentation manifests: http://lae.princeton.edu/catalog/0d7hm.jsonld (and, as you can see, also available as IIIF Manifests)
* All of the data is also available as RDF, e.g.: http://lae.princeton.edu/catalog/0d7hm.ttl
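
The image URL in the first bullet follows the IIIF Image API pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, with the identifier percent-encoded. A minimal Python sketch of how such a URL can be assembled is shown below. The function name `iiif_image_url` is hypothetical, and the base URL and identifier are simply taken from the example above (the identifier is the percent-decoded form of the one in that URL); this is an illustration, not Princeton's code.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble a IIIF Image API request URL (hypothetical helper).

    The identifier is percent-encoded with no safe characters, so any
    slashes inside it become %2F and cannot be mistaken for path separators.
    """
    encoded = quote(identifier, safe="")
    return f"{base}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Base URL and identifier are taken from the example URL in the list above.
url = iiif_image_url(
    "http://libimages.princeton.edu/loris2",
    "puls/0/j/0/g/1.jp2",
    size="75,",  # scale to a width of 75 pixels, height chosen to keep the aspect ratio
)
print(url)
```

Changing only the `size` or `region` argument requests a different derivative of the same source image, which is what lets one image server back thumbnails, page views and zooming viewers alike.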

Hydra update: Spotlight Latest Version with Screen Casts and Updated Screen Shots

Spotlight is a Hydra head we are currently investigating as part of an Academic Repository project with Central ITS. Here’s the one-sentence pitch that defines Spotlight:

“Enable librarians, curators, and others who are responsible for digital collections to create attractive, feature-rich websites that highlight these collections.” – taken from GitHub

 

Recent Communication to the Hydra Community:

While we are long overdue for a community update on Spotlight, the team at Stanford (Jessie Keck, Chris Beer and Gary Geisler) has been heads down working hard for the past several months. As we near the end of our current development cycle, we wanted to report on current status and goals, and share a few visuals.

The current round of development is focussed on three broad goals:
1. Building out an end-to-end, self-service workflow for creating a new Spotlight exhibit using items and collections in the Stanford Digital Repository. Because we have built Spotlight to be repository agnostic, the technical work to accomplish this goal is somewhat specific to Stanford’s digital library architecture. The code developed for this does not ship as part of Spotlight.  However, we hope that this can serve as an example and model for others to implement a repository-based self-service workflow for creating new exhibits.  We intend to document the workflow we’ve implemented for reference.
A demo of this workflow is now available on YouTube at:  http://youtu.be/ZyJ2wzzzunc
2. Enable the addition of items not stored in a formal repository system to a new or existing exhibit. We refer to this feature set as “support for non-repository items”, although we likely want to re-label it. This set of features is intended to make Spotlight useful for those institutions that don’t have a fully baked repository backend with which to integrate Spotlight, or that for many good reasons may want to build exhibits from content not stored in a repository. It also includes the ability to augment any exhibit with non-repository items, for example a faculty member’s or curator’s local collection of images. We have implemented two approaches:
  • Single image upload: Using a form, an exhibit creator can upload a single image file from their local system and add a few simple metadata fields. If exhibit-specific fields have been created for the exhibit, these fields will also be available in the form. Upon submission, the single image and associated metadata are added to the items available for building feature pages, and are indexed and available in search results and browse categories.
  • Bulk-add via CSV: A CSV template is provided to the exhibit creator to populate with a list of image URLs and associated metadata. Upon submission of the CSV, the images are fetched over the web and copied, and indexed records are created for all items. The bulk feature pulls images in via the web, so exhibit creators can upload images to popular cloud services (box, dropbox, google drive, google images, flickr, etc.) or add any URI to a publicly available image.
This feature is nearly complete, and we’ll send a video demo out in the next few days.
3. Enhance the visual design and user experience to better support image-heavy exhibits.
The goal here has been to enhance the visual design to provide a more “museum-like” or visually oriented look and feel.  Our design team has developed a proposal for a variety of new elements and widgets to produce a more visual, immersive and interactive experience. The developers are just starting to implement these now, and certainly your feedback is welcome.  The initial design proposal can be seen here:
Of course there are a variety of tickets and features that we have added and will be adding that fall somewhat outside the scope of these high-level goals.  For example we have just recently added simple analytics for exhibits using the Google Analytics API – https://github.com/sul-dlss/spotlight/pull/942 .
We anticipate 2-3 weeks more of development on Spotlight and the next release will also include improved documentation and a project site (at something like spotlight.github.io – not claimed or built yet).
We’ll be back in touch soon with more frequent updates as we wind down this phase of development.
-Stu Snydman
****************************************
Stuart Snydman
Associate Director for Digital Strategy
Stanford University Libraries

Indiana University Receives NEH Grant for Digital Preservation using Hydra

The National Endowment for the Humanities recently awarded the Indiana University Libraries and WGBH Boston a grant to support the development of HydraDAM2. This preservation-oriented digital asset management system for time-based media will improve upon WGBH’s existing HydraDAM system and work seamlessly with the Avalon Media System for user access, among other features.

Both HydraDAM and the Avalon Media System grew from the Hydra community. Hydra is an open source technology framework that supports the creation of preservation and access applications for digital assets based on the Fedora repository system. A community of institutions known as the Hydra Partners works together to maintain the framework and create applications for local or shared use by libraries, archives, and cultural institutions. Both Indiana University and WGBH Boston are among the 25 Hydra Partner institutions. Indiana University is collaborating with Northwestern University on the development of the Avalon Media System and WGBH developed the original HydraDAM system with help from the Data Curation Experts group.

[complete article]

HydraDAM is based on the popular Hydra application Sufia. You can view some interesting examples of institutions using Sufia for digital preservation here:

Penn State: ScholarSphere

Notre Dame: CurateND

Case Western: Digital Case

 

Hydra Project

 

ProjectHydra.org
Avalon Media Systems

 

Hydra & DuraSpace: agreement for project services

Recent news from the Hydra community, sent by Tom Cramer, Chief Technology Strategist, Stanford University Library:

HydraNauts,

I am pleased to share the news that Hydra has officially entered an agreement for DuraSpace to provide banking, financial and marketing/communication services for 2015, to help support and advance the Hydra Project.

As we have seen some remarkable growth in the last two years, it has become increasingly apparent that a service provider could help meet the project’s growing administrative needs, and help position it for even further expansion. After canvassing the landscape for potential host organizations, DuraSpace appeared as a natural fit for the project, given its overlapping membership, stewardship of other vibrant projects (in particular Fedora), and a shared vision about the future potential of Hydra.

After discussion over several months on the Hydra Partners list and within Hydra Steering, a unanimous vote among the Partners ratified this direction last month. This marks a significant step forward for the Hydra Project in terms of maturity, and positions us well for growth in the coming years. One of the first tasks for us will be to launch a small, volunteer fundraising campaign for 2015; more on that anon!

On behalf of the Hydra Partners and Steering Group,

– Tom

New Hydra Adopter: Chemical Heritage Foundation (CHF)

Recent post to the Hydra Community:

Hello!

We wanted to let the Hydra community know that the Chemical Heritage Foundation (CHF) in Philadelphia has decided to adopt Hydra as our repository solution. CHF is a library, museum and center for scholars, and we’re interested in building a central repository for our diverse digital assets (photographs, books, archival collections, fine art, oral histories, and museum objects). We’re a small cultural heritage institution with a digital collections team of three (Michelle=Curator, Anna=Developer and Cat=Metadata).

Our plan is to begin with Sufia running Fedora 4 to create a basic image collection for our photographs and 2D book scans. We’ll then be exploring more complicated project phases, which will include replacing or integrating with our museum’s CMS, integrating archival objects and EAD finding aids that currently live in ArchivesSpace, and ingesting complex objects with unique issues, like our oral histories.  We’re also really interested in exploring Spotlight as an exhibition tool and in the possibility of future integration with Archivematica (or something similar) to develop preservation functionality.

We wanted to thank Data Curation Experts and Temple University for talking with us during our decision-making phase! We’re very excited to get involved in the Hydra community!

With thanks,

Michelle DiMeo

Curator of Digital Collections

Chemical Heritage Foundation

New Hydra Partner: University of Alberta

We are delighted to announce that the University of Alberta has become the latest formal Hydra Partner.  The University of Alberta has well over a decade of experience in large-scale digitization and repository projects, and has a strong team of librarians, developers, data curators and other experts migrating their existing systems to what they are calling “Hydra North.”

In their Letter of Intent, the University of Alberta says that they are committed to using their local needs as pathways to contribute to the Hydra community. Their primary areas of focus in this will be research data management, digital archives, and highly scalable object storage.

http://projecthydra.org/

code4lib 2015

450 people from around the world gathered in Portland, Oregon, last week for the 10th annual code4lib library technology conference.

On Monday, approximately 18 pre-conferences were held in half- and full-day sessions, mostly made up of demos, tutorials and discussion groups. I attended a morning session on linked data led by Tom Johnson of DPLA and Karen Estlund of the University of Oregon. As a developer, I found the demonstration of the Ruby gem ActiveTriples particularly interesting for its ability to quickly model content into RDF classes and properties that can seamlessly connect to Fedora 4 persistence or any extensible back end.

In the afternoon I attended a GeoBlacklight demo led by Jack Reed and Darren Hardy of Stanford. The Stanford GeoBlacklight is a leading map collection interface that allows for spatial search, presentation, and discovery based on the development of metadata schemas, conversion workflows, and interface presentation components. The workshop focused on using the VirtualBox virtual machine and Vagrant setup environment to bring up an instance of GeoBlacklight in minutes.

On Tuesday the conference proper started with a keynote by Selena Deckelmann. Her talk focused on the importance of leading the coding community based on principles of inclusion of beginners and marginalized groups. The presentations on Tuesday expanded on that theme with talks focused on users, teams, developers and experiences in dealing with library technology challenges.

The presentations on Wednesday were more technically focused. On Thursday morning a closing keynote was given by Andromeda Yelton, who encouraged building systems with tools designed to best satisfy the “wanderlust” behind users’ and patrons’ drive to discovery. In between the 20-minute presentations were two hour-long lightning talk sessions, each made up of five-minute talks by 12 people. I thought the keynotes nicely framed the conference, and the lightning talks were a great way to digest and get a pulse on what people were working on. As a developer I was particularly interested in the presentations of tools providing facility, such as Kevin Clarke’s presentation of Packer, a DevOps tool for deploying to virtual machines; Stanford’s OEmbed service for offering embeddable links to their digital collections; and a presentation by Stanford’s Rob Sanderson and Naomi Dushay describing their experience of attempting to integrate their ILS, digital collections, and discovery indexes.

On Thursday afternoon and Friday, I attended working groups focused on Fedora 4, Hydra’s support of Fedora 4, content modeling, and the Linked Data Platform. The discussions were vigorous, and it was a beneficial mental exercise to spin out the various content model concepts of collection/work/file, the distinction between the “aggregates” and “members” predicates, and how to use the LDP Direct and Indirect Containers to deal with assets, rights, and ordering proxies, although I’m afraid not much was resolved. But DPLA (Digital Public Library of America) appears very interested in developing these concepts into usable models that promise to be a great step forward for metadata discovery and interoperability.

All in all, it was worthwhile, and I am keeping an eye on next year’s conference (venue TBD).

Henry Kissinger Project – Ingest Statistics

This is just a brief update to offer some ingest statistics related to the Henry Kissinger project. The digitized project will contain approximately 1,700,000 digital objects from approximately 12,800 folders.

The ingest workflow includes both manual and automated steps. The Digital Library Programming group is responsible for the automated steps, which essentially involve creating a Ladybird object and then publishing that object to Hydra. At this time, all objects are being ingested in a manner that prevents them from being exposed in the public Hydra interface (FindIT.library.yale.edu). The plan is to “turn on” the collection all at once, which is a better approach when a collection is very large and very complex. Otherwise, researchers might have a difficult time using the collection if materials were made available a little at a time, in what would sometimes seem like a random order.

As of Feb 16:

  • 339,041 – the number of objects ingested into Hydra
  • 4,266 – the number of folders ingested out of the approximate 12,800
  • 7 – the number of digital files that make up an object in Hydra
  • 2,377,553 – the actual number of files ingested into Hydra
  • 792,655 – total objects ingested into Hydra
  • 5,548,585 – total number files currently in Hydra
  • 10.856 seconds – the average time it takes an object to ingest into Hydra

One thing to consider about the last statistic, which is actually the one we focus on the most: at the current rate, the time to ingest the entire collection is approximately 213 days. For each 1/10th of a second that this rate fluctuates, the completion time increases or decreases by roughly 47 hours (1,700,000 objects × 0.1 seconds). If ingest were to suddenly start taking 11.8 seconds, it would push the approximate completion time to 232 days.
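
The completion estimates can be reproduced with simple arithmetic. The sketch below assumes that the estimate is the total object count (approximately 1,700,000) multiplied by the average seconds per object, and that objects are ingested one at a time, around the clock. Neither assumption is stated explicitly in the figures above, and the function name `ingest_days` is hypothetical. Under those assumptions, a change of 0.1 seconds per object shifts the total by about 47 hours.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600

# Approximate size of the digitized collection, from the post.
TOTAL_OBJECTS = 1_700_000


def ingest_days(total_objects, seconds_per_object):
    """Days needed to ingest everything serially at a constant average rate."""
    return total_objects * seconds_per_object / SECONDS_PER_DAY


current = ingest_days(TOTAL_OBJECTS, 10.856)  # current average ingest time
slower = ingest_days(TOTAL_OBJECTS, 11.8)     # the slower scenario from the post

# Effect of a 0.1-second change in the per-object average, in hours.
hours_per_tenth = TOTAL_OBJECTS * 0.1 / SECONDS_PER_HOUR

print(f"current rate: {current:.1f} days")
print(f"at 11.8 s/object: {slower:.1f} days")
print(f"per 0.1 s change: {hours_per_tenth:.1f} hours")
```

Because the total scales linearly with the per-object time, small changes in the average ingest time translate directly into days of schedule, which is why this is the statistic to watch.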