Category: Project News

Indiana University Receives NEH Grant for Digital Preservation using Hydra

The National Endowment for the Humanities recently awarded the Indiana University Libraries and WGBH Boston a grant to support the development of HydraDAM2. This preservation-oriented digital asset management system for time-based media will improve upon WGBH’s existing HydraDAM system and work seamlessly with the Avalon Media System for user access, among other features.

Both HydraDAM and the Avalon Media System grew from the Hydra community. Hydra is an open source technology framework that supports the creation of preservation and access applications for digital assets based on the Fedora repository system. A community of institutions known as the Hydra Partners works together to maintain the framework and create applications for local or shared use by libraries, archives, and cultural institutions. Both Indiana University and WGBH Boston are among the 25 Hydra Partner institutions. Indiana University is collaborating with Northwestern University on the development of the Avalon Media System and WGBH developed the original HydraDAM system with help from the Data Curation Experts group.

[complete article]

HydraDAM is based on the popular Hydra application Sufia. You can view some interesting examples of institutions using Sufia for digital preservation here:

Penn State: ScholarSphere

Notre Dame: CurateND

Case Western: Digital Case

Hydra Project: ProjectHydra.org

Avalon Media System

New Hydra Adopter: Chemical Heritage Foundation (CHF)

Recent post to the Hydra Community:

Hello!

We wanted to let the Hydra community know that the Chemical Heritage Foundation (CHF) in Philadelphia has decided to adopt Hydra as our repository solution. CHF is a library, museum and center for scholars, and we’re interested in building a central repository for our diverse digital assets (photographs, books, archival collections, fine art, oral histories, and museum objects). We’re a small cultural heritage institution with a digital collections team of three (Michelle=Curator, Anna=Developer and Cat=Metadata).

Our plan is to begin with Sufia running on Fedora 4 to create a basic image collection for our photographs and 2D book scans. We’ll then be exploring more complicated project phases, which will include replacing or integrating with our museum’s CMS, integrating archival objects and EAD finding aids that currently live in ArchivesSpace, and ingesting complex objects with unique issues, like our oral histories. We’re also really interested in exploring Spotlight as an exhibition tool and in the possibility of future integration with Archivematica (or something similar) to develop preservation functionality.

We wanted to thank Data Curation Experts and Temple University for talking with us during our decision-making phase! We’re very excited to get involved in the Hydra community!

With thanks,

Michelle DiMeo

Curator of Digital Collections

Chemical Heritage Foundation

Henry Kissinger Project – Ingest Statistics

This is just a brief update to offer some ingest statistics related to the Henry Kissinger project. The digitized collection will contain approximately 1,700,000 digital objects from approximately 12,800 folders.

The ingest process includes both manual and automated steps. The Digital Library Programming group is responsible for the automated steps, which include creating a Ladybird object and then publishing that object to Hydra. At this time, all objects are being ingested in a manner that prevents them from being exposed in the public Hydra interface (FindIT.library.yale.edu). The plan is to “turn on” the collection all at once, which is a better approach when a collection is very large and very complex; otherwise, researchers might have a difficult time using the collection if materials were made available a little at a time, in what could seem like a random order.
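As a sketch of this publish-later pattern, consider the following. The `Record` class and its fields are hypothetical illustrations, not the actual Ladybird or Hydra code: objects are ingested with visibility off, and the whole collection is flipped on in one pass.

```python
# Illustrative sketch of the "turn on all at once" pattern described above.
# The Record class and its fields are hypothetical, not the real Ladybird/Hydra code.

from dataclasses import dataclass

@dataclass
class Record:
    identifier: str
    public: bool = False  # ingested objects start hidden from the public interface

def ingest(identifier: str, collection: list) -> Record:
    """Create a record and add it to the collection, hidden by default."""
    record = Record(identifier)
    collection.append(record)
    return record

def publish_collection(collection: list) -> None:
    """Expose every record in a single pass once the collection is complete."""
    for record in collection:
        record.public = True

kissinger = []
ingest("folder-0001/item-0001", kissinger)
ingest("folder-0001/item-0002", kissinger)
publish_collection(kissinger)
print(all(r.public for r in kissinger))  # True
```

The design choice is simply that visibility is a property flipped in bulk, so researchers never see a partially loaded, seemingly random collection.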

As of Feb 16:

  • 339,041 – the number of objects ingested into Hydra
  • 4,266 – the number of folders ingested out of the approximate 12,800
  • 7 – the number of digital files that make up an object in Hydra
  • 2,377,553 – the actual number of files ingested into Hydra
  • 792,655 – total objects ingested into Hydra
  • 5,548,585 – total number files currently in Hydra
  • 10.856 seconds – the average time it takes an object to ingest into Hydra

Something to consider with the last statistic, which is the one we focus on the most: at the current rate, the time to ingest the entire collection is approximately 213 days. For each tenth of a second that this rate fluctuates, the completion time increases or decreases by roughly 47 hours. If ingest were to suddenly start taking 11.8 seconds per object, it would push the approximate completion time to 232 days.
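The sensitivity of that estimate is easy to check with a few lines of Python, assuming a uniform per-object rate across all ~1,700,000 objects (an approximation):

```python
# Rough projection of total ingest time from the average per-object rate.
# Assumes a uniform rate across all ~1,700,000 objects (an approximation).

TOTAL_OBJECTS = 1_700_000
SECONDS_PER_DAY = 86_400

def days_to_ingest(seconds_per_object: float, total_objects: int = TOTAL_OBJECTS) -> float:
    """Return the projected number of days to ingest the whole collection."""
    return total_objects * seconds_per_object / SECONDS_PER_DAY

print(round(days_to_ingest(10.856), 1))      # ~213.6 days at the current rate
print(round(days_to_ingest(11.8), 1))        # ~232.2 days if the rate slips to 11.8 s
print(round(days_to_ingest(0.1) * 24, 1))    # ~47.2 hours of change per 0.1 s shift
```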

Avalon Meeting Notes for Nov 24

We had a very productive meeting with three guests: Jon Dunn, Project Director, Indiana University; Mark Notess, Product Owner, Indiana University; Julie Rudder, Product Owner, Northwestern University.

The day started with an introduction to the Avalon Media System in the Library Lecture Hall, which included demonstrations of the work of Indiana and Northwestern, who very recently released their first Avalon collections to the public. The PowerPoint from the presentation is attached here: Yale Avalon Conference. The lecture was videotaped but the sound is lacking; please contact michael.friscia@yale.edu for access.

Later in the day a smaller group convened to have a technical discussion about the future roadmap of Avalon. A recent poster that gives a very high level view can be seen here: RudderAvalon2. I am hoping to acquire a copy of the powerpoint presentation which has bullet lists of all the planned work that will go into version 3.2 through 4.0.

While we are still in the discussion stages of a project to bring Avalon up at Yale, two of the most important features for us are integration with Fedora 4 and possible integration of the backend transcoding processes into Sufia (Penn State runs a version of Sufia called ScholarSphere). Our goal would be to integrate the two Hydra applications so that audio and video files loaded into the self-archiving product, Sufia, would take advantage of all the features of the Avalon Media System.

In addition, we discussed many other topics, including scaling Avalon so that it could transcode more than one file at a time, using RDF to describe complex relationships between multiple files/tracks, and digital preservation.


Tufts University Becomes 25th Hydra Partner

Tufts University officially became the 25th Hydra partner on November 18, 2014. The full list of partners and Hydra adopters wishing to be publicly recognized may be viewed here: Hydra Partners.

In addition, I wanted to provide a link to the November Partner Phone call notes. Some highlights below:

  • The Hydra Steering group elected two new members: Jon Dunn, Indiana University, and Michael Giarlo, Penn State University.
  • The Steering group will undertake a process to draft a set of bylaws for governance of the project given its rapid growth.
  • The Steering group is also beginning to look at contract services for banking and other legal activities such as trademark and intellectual property.
  • Significant expansion of Hydra adoption in Europe was reported as well as a successful Hydra UK event at the University of Hull.
  • Hydra had excellent presence and reception at DLF in Atlanta this past October.
  • DuraSpace conducted a Fedora 4 webinar which is now available online.
  • A proposal for formalizing the creation of Hydra special interest groups is near ratification.
  • The Hydra Archivists Working Group (HAWG) has reformed under the new leadership of Ben Goldman, Penn State University.
  • Stanford announced ArcLight as a formal project that is kicking off and seeking partnerships for planning and development. In short, ArcLight is an effort to build a Blacklight-based environment to support discovery and digital delivery of information in archives, including integration with ArchivesSpace and EAD support.
  • Hydra expansion into South America is underway and several universities (wishing to remain anonymous at this time) are beginning Hydra development. Another major milestone is the work they are taking on to translate all of the Hydra documentation written in English into Spanish.

Fedora 4 development notes for November 2014

Fedora is used locally in a number of applications, including our Hydra instance, the Finding Aids Database, AMEEL, and the Joel Sumner Smith collection. In our local instances we have been using versions of Fedora 3 since 2006. Fedora 3.8, to be released in December, will be the final release of the 3.x line, with energy at that point shifting primarily to Fedora 4.

Fedora 4 is a complete refactoring of the Fedora 3 code, now built on top of ModeShape and the JCR repository APIs, with improvements in ease of installation, scaling, and RDF support. Below is a full list of features.

Yale contributes financially to the product as a bronze member and we also contribute to the programming efforts. Osman Din and Eric James, both in Digital Library Programming Services, actively participate in development. In addition, I sit on the Fedora Leadership Committee that handles the gathering of use cases, prioritization of features being programmed as well as budget planning.

Fedora 4.0 is now undergoing a cycle of beta releases to allow institutions to begin adopting it. On November 9th, Fedora 4.0 Beta 4 was released, again with an eye towards simple installation and support for performance and repository size. Fedora 4.1, to begin development in 2015, will focus on supporting the upgrade/migration process from Fedora 3.x. Some peers, including Penn State, have already begun to replace some of their Fedora 3 repositories with Fedora 4. We are also starting to think about how our migration strategies can dovetail with Fedora 4.1 for our own adoption, starting around the summer of 2015.

With that little bit of background, I thought I would share the recent development notes. If you have questions about Fedora, do not hesitate to contact me (michael.friscia@yale.edu), Eric James (eric.james@yale.edu) or Osman Din (osman.din@yale.edu).

(note: Eric provided valuable editorial feedback for the above post)

======================================================

Release date: 9 November, 2014

 

We are proud to announce the fourth Beta release of Fedora 4. In the continuing effort to complete the Fedora 4 feature set, this Beta release is one of several leading up to the Fedora 4 release. Full release notes and downloads are available on the wiki: https://wiki.duraspace.org/display/FF/Fedora+4.0+Beta+4+Release+Notes.

 

==============

Release Manager

==============

Andrew Woods (DuraSpace)

 

==========

Contributors

==========

—————————-

1) Sprint Developers

 

Adam Soroka (University of Virginia)

Benjamin Armintor (Columbia University)

Chris Beer (Stanford University)

Esme Cowles (University of California, San Diego)

Giulia Hill (University of California, Berkeley)

Jared Whiklo (University of Manitoba)

Jon Roby (University of Manitoba)

Kevin S. Clarke (University of California, Los Angeles)

Longshou Situ (University of California, San Diego)

Michael Durbin (University of Virginia)

Mohamed Mohideen Abdul Rasheed (University of Maryland)

Osman Din (Yale University)

 

————————————

2) Community Developers

 

Aaron Coburn (Amherst College)

Frank Asseg (FIZ Karlsruhe)

Nikhil Trivedi (Art Institute of Chicago)

 

=======

Features

=======

—————————–

1) Removed features

In the interest of producing a stable, well-tested release, the development team identified and removed a number of under-developed features that had not been sufficiently tested and documented. These features were not identified as high priorities by the community, but they may be re-introduced in later versions of Fedora 4 based on community feedback.

 

– Namespace [1] creation/deletion endpoint

– Locks endpoint

– Workspaces other than the ‘default’

– Admin internal search endpoints

– Policy-driven storage

– Batch operations in single request

– Auto-versioning configuration option

– Sitemaps endpoint

– Writable nodetypes endpoint

 

——————

2) REST API

The REST API is one of the core Fedora 4 components, and this release brings it more in line with the emerging W3C Linked Data Platform 1.0 [2] specification. An example of this is the new tombstone functionality [3]; URIs are not supposed to be reused, so deleting a resource leaves a tombstone in its place that serves as a notification that the resource has been deleted. Child nodes of deleted resources also leave tombstones. Other examples of LDP-related REST API changes include:

 

– Support for hashed URIs [4] as subjects and objects in triples.

– Binary and binary description model changed:

– From: binary description at /resource, and binary at /resource/fcr:content,

– To: binary description at /resource/fcr:metadata, and binary at /resource

– Labels are required when creating new versions of resources [5].

– Content-Disposition, Content-Length, Content-Type are now available on HEAD requests [6].
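The change to the binary/description model can be summarized with a small helper. The base URL below is only an example, not a real repository:

```python
# Maps a Fedora 4 binary resource path to its description (metadata) path,
# per the Beta 4 model: binary at /resource, description at /resource/fcr:metadata.
# The old model placed the binary at /resource/fcr:content instead.

def description_uri(resource_uri: str) -> str:
    """Return the URI of the RDF description for a binary resource."""
    return resource_uri.rstrip("/") + "/fcr:metadata"

def old_binary_uri(resource_uri: str) -> str:
    """Return the pre-Beta-4 location of the binary content (for comparison)."""
    return resource_uri.rstrip("/") + "/fcr:content"

base = "http://localhost:8080/rest/objects/image1"  # example URL only
print(description_uri(base))  # http://localhost:8080/rest/objects/image1/fcr:metadata
```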

 

—————-

3) Ontology

The Fedora 4 ontology [7] was previously broken out into several different namespaces, but these have now been collapsed into the repository [8] namespace. Additionally, the oai-pmh [9] namespace has been added to the ontology.

 

———-

4) LDP

Fedora 4 provides native linked data functionality, primarily by conforming with the W3C Linked Data Platform 1.0 [10] specification. The LDP 1.0 test suite [11] is executed against the Fedora 4 codebase as a part of the standard build process, and changes are made as necessary to pass the tests. Additionally, integration tests for real-world RDF assertions [12] have also been added to the codebase.

 

Recent changes to support LDP include:

 

– When serializing to RDF, child resources are included in responses [13], versus having to traverse auto-generated intermediate nodes.

– All RDF types on properties are now supported [14].

– Prefer/Preference-Applied headers have been updated [15] to match the latest requirements [16].

– RDF language types are now supported [17].

– The full range of LDP containers [18] is now supported.

– Changed terminology from:

– object -> container

– datastream -> non-rdf-source-description

– Replaced relationships from:

– hasContent/isContentOf, to:

– describes/isDescribedBy

 

—————————

5) External modules

In addition to the core Fedora 4 codebase, there are a number of supported external modules that offer useful extensions to the repository. Two such modules are being introduced in the Fedora 4.0 Beta 4 release: Fedora 4 OAI Provider [19] and Fcrepo Camel [20].

 

The Fedora 4 OAI Provider implements the Open Archives Initiative Protocol for Metadata Harvesting, Version 2.0 [21], using Fedora 4 as the backend. It exposes an endpoint at /oai which accepts OAI-conforming HTTP requests. A Fedora resource containing set information can be created and then exposed at the module’s endpoint, which accepts HTTP POST requests containing serialized set information adhering to the OAI schema.
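An OAI-PMH request against such an endpoint is an ordinary HTTP GET with a `verb` parameter. The base URL below is an assumption for illustration; the verbs and arguments come from the OAI-PMH 2.0 specification:

```python
# Builds OAI-PMH request URLs for the provider's /oai endpoint.
# The base URL is an assumption for illustration; verbs and parameters
# come from the OAI-PMH 2.0 specification.

from urllib.parse import urlencode

OAI_ENDPOINT = "http://localhost:8080/rest/oai"  # assumed location

def oai_request(verb: str, **params: str) -> str:
    """Return a full OAI-PMH request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{OAI_ENDPOINT}?{query}"

print(oai_request("Identify"))
print(oai_request("ListRecords", metadataPrefix="oai_dc"))
```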

 

Fcrepo Camel provides access to an external Fedora 4 Containers API [22] for use with Apache Camel [23]. Camel is middleware for writing message-based integrations, so this component can be used to connect Fedora 4 to an extensive number of external systems [24], including Solr and Fuseki. This functionality is similar to that of the Fcrepo Message Consumer [25], except it is based on a well-maintained Apache project rather than custom Fedora 4 code. Therefore, this component is likely to replace the Message Consumer in the future, though the Message Consumer will still be part of the Fedora 4.0 release.

 

————————

6) Admin Console

The administrative console provides a simple HTML user interface for viewing the contents of the repository and accessing functionality provided by the REST API. This release introduces support for custom Velocity templates [26] based on the hierarchy of mixin types. Now, if you create a new mixin type, the templates to be used in the admin console will include the resource’s primary type, mixin types, and the parent types thereof.

 

—————–

7) Projection

The projection [27] (also known as federation) feature allows Fedora 4 to connect to external storage media via a pluggable connector framework. A read-only filesystem connector is included with this release.

 

Additionally, Fedora 4 now has standardized support for externally-referenced content [28].

 

—————————

8) Java client library

The Java Client Library [29] is an example of a module that was conceived by Fedora community members who recognized a common need and rallied to design [30] and implement the functionality. This release includes an improvement to list the children of a resource [31] in the client library.

 

———–

9) Build

A key component under the covers of Fedora 4 is ModeShape [32], one that the Fedora 4 project tracks closely. Fedora 4.0 Beta 4 includes an upgrade to the production version of ModeShape 4.0.0 [33].

 

Fedora 4 comes with built-in profiling machinery that keeps track of how many times specific services have been requested, how long each request takes to be serviced, and so on. These metrics can be visualized using Graphite [34]. Because Graphite can be difficult to set up and configure [35], this release includes a Packer.io build [36] that completely automates the process of standing up a Graphite server.
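The idea behind that profiling machinery, counting requests and timing them, can be illustrated with a small decorator. This is only a sketch of the concept; Fedora's actual metrics code is Java-based and the names below are hypothetical:

```python
# Minimal illustration of request-count/timing metrics like those Fedora 4
# collects internally (Fedora's real machinery is Java; this is only a sketch).

import time
from collections import defaultdict
from functools import wraps

metrics = defaultdict(lambda: {"count": 0, "total_seconds": 0.0})

def timed(name):
    """Decorator that records call counts and cumulative wall-clock time."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                metrics[name]["count"] += 1
                metrics[name]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator

@timed("get_resource")
def get_resource(path):
    return f"contents of {path}"

get_resource("/rest/obj1")
get_resource("/rest/obj2")
print(metrics["get_resource"]["count"])  # 2
```

Counters like these are exactly the kind of time-series data a tool like Graphite is designed to graph.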

 

Additionally, the pluggable role-based [37] and XACML [38] authorization modules have been pre-packaged into fcrepo-webapp-plus [39]. This project builds custom-configured fcrepo4 webapp war files that include extra dependencies and configuration options.

 

————————-

10) Test Coverage

Unit and Integration test coverage [40] is a vital factor in maintaining a healthy code base. The following are the code coverage statistics for this release.

 

– Unit tests: 66.2%

– Integration tests: 69.4%

– Overall coverage: 82.5%

 

=========

References

=========

[1]  https://wiki.duraspace.org/display/FF/Glossary#Glossary-namespaceNamespace

[2]  http://www.w3.org/TR/ldp/

[3]  https://wiki.duraspace.org/display/FF/RESTful+HTTP+API#RESTfulHTTPAPI-RedDELETEDeletearesource

[4]  https://github.com/fcrepo4/fcrepo4/commit/5c30c743bb05ef627acc90f4b037b118c7d9de9c

[5]  https://wiki.duraspace.org/display/FF/Versioning#RESTfulHTTPAPI-Versioning-BluePOSTCreateanewversionofanobject

[6]  https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers

[7]  https://github.com/fcrepo4/ontology

[8]  http://fedora.info/definitions/v4/repository

[9]  https://github.com/fcrepo4/ontology/blob/master/oai-pmh.rdf

[10] http://www.w3.org/TR/ldp/

[11] http://w3c.github.io/ldp-testsuite/

[12] https://github.com/fcrepo4/fcrepo4/pull/579

[13] https://github.com/fcrepo4/fcrepo4/pull/542

[14] https://github.com/fcrepo4/fcrepo4/pull/587

[15] https://github.com/fcrepo4/fcrepo4/pull/451

[16] http://tools.ietf.org/html/rfc7240#page-7

[17] https://github.com/fcrepo4/fcrepo4/pull/586

[18] https://github.com/fcrepo4/fcrepo4/pull/594

[19] https://github.com/fcrepo4-labs/fcrepo4-oaiprovider

[20] https://github.com/fcrepo4-labs/fcrepo-camel

[21] http://www.openarchives.org/OAI/openarchivesprotocol.html

[22] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers

[23] https://camel.apache.org

[24] https://camel.apache.org/components.html

[25] https://github.com/fcrepo4/fcrepo-message-consumer

[26] https://velocity.apache.org/engine/releases/velocity-1.5/user-guide.html#velocity_template_language_vtl:_an_introduction

[27] https://wiki.duraspace.org/display/FF/Federation

[28] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers#RESTfulHTTPAPI-Containers-external-content

[29] https://github.com/fcrepo4-labs/fcrepo4-client

[30] https://wiki.duraspace.org/display/FF/Design+-+Java+Client+Library

[31] https://github.com/fcrepo4-labs/fcrepo4-client/pull/12

[32] http://modeshape.jboss.org

[33] http://modeshape.jboss.org/downloads/downloads4-0-0-final.html

[34] http://graphite.wikidot.com

[35] https://wiki.duraspace.org/display/FF/Setup+a+Graphite+instance

[36] https://github.com/fcrepo4-labs/fcrepo4-packer-graphite

[37] https://wiki.duraspace.org/display/FF/Basic+Role-based+Authorization+Delegate

[38] https://wiki.duraspace.org/display/FF/XACML+Authorization+Delegate

[39] https://github.com/fcrepo4-labs/fcrepo-webapp-plus

[40] http://sonar.fcrepo.org/dashboard/index/1

FindIt, QuickSearch Security Design Review Completed

"data.path Ryoji.Ikeda - 4" by r2hox is licensed under CC BY-SA 2.0

Yale University Library FindIt and QuickSearch services have completed a Security Design Review (SDR) by the Information Security Office of Yale ITS.  These systems use the Hydra repository solution as the underlying technology stack.  The SDR process is used to provide recommendations for building, improving, or reengineering services to meet University policies, industry best practices, laws, and regulation requirements.  Thanks to Bob Rice for evaluating and implementing the recommendations and Tom Castiello and Marcus Aden from the Information Security Office for their insight and participation.

Avalon Media System

Some preliminary work has been going on in the Digital Library Programming group to investigate Avalon for use in delivering audio and video content from our Hydra repository. Avalon is an open source software package developed by two Hydra partners, Indiana University and Northwestern University. We are considering adopting it for some of our audio/video needs as we prepare to ingest audio and video into our Hydra repository.

In the video you will see an instance of Avalon that I have running on a virtual server on my computer. To access it, I am using a web browser pointed at the virtual server; the URL only works from my computer. The only customization made to the software was to include a small Yale Library logo in the upper left; otherwise the software is “out of the box” and is bundled with an open source media streaming server. The content in this test instance is delivered as part of the “trial version” of the software so that you can see how it works without investing a lot of time.

The video is just under two minutes and demonstrates browsing to a video and playing it back. I demonstrate some of the basic playback controls, including full screen. I then show how the video can be embedded on other web pages to share the content. Lastly, I demonstrate the most basic type of content restriction, where I set the requirement that you must be logged into Avalon in order to view the video. I then reload the web page that I embedded the video on to demonstrate that I am now required to log in. After logging in, the video begins to play.

The video below is best viewed in full screen. It does not require sound, but if you have speakers or headphones, you will hear music while the video is playing. (YouTube link if the video below does not work)

AvalonSample1

Hydra Project Milestone: Automation

The Digital Library & Programming group is pleased to announce that we’ve hit a major milestone in the development of the Hydra digital repository system at YUL. Communication and syncing between repository system components became fully automated at the end of September 2014. This automation applies not just to work on the infrastructure built for the Kissinger papers, but to all Ladybird/Hydra interaction.

Automation like this allows metadata and objects to travel within the Hydra system without intervention, which in turn allows Library IT to focus more intensely on structural and workflow development. As a Project Hydra partner, Yale is now in the position to share this work with the Project Hydra community, and empower those members to scale up their own repository ingest services.

Development Notes for 10/13 – 10/17

Just a brief update on the work of our group for the past week.

We continue our efforts in contributing to the Fedora 4 project. We use Fedora as one of the core products in our Hydra implementation. Currently we have several installations of version 3. Version 4 has been in development for a little over a year with an expected release date of June 2015. While Yale has been a financial contributor to the Fedora Commons project for many years now, we only started contributing code to the project in 2013.

The Quicksearch project is also moving along swiftly. This week the major milestone of handling CAS login was completed. This is used for some features in the Blacklight software like the bookbag and search history. CAS is generally simple to integrate with most software products as long as the link between a NetID and the local user database can be made. In the case of Blacklight, making this link became complicated because of the use of several different code libraries in the specific version of Blacklight being used for Quicksearch, which is different from the version we use for our Hydra interface, FindIt.

Almost all efforts this week were related to ingest operations for the Kissinger project. There was also some vacation time taken so the output this week was limited.

Yesterday we met to discuss the development for full text search for objects ingested into Hydra. The work is broken up into the following steps:
1- alter the SOLR index to accept 2 new fields that will store full text
2- alter the Hydra ingest application to store the content of the TXT files into the new SOLR fields
3- setup the Blacklight controllers for handling if/when each of the FT fields are used in user searches
4- develop the Blacklight user interface to allow the FT search option

At this point we are only focused on the first two steps; steps 3 and 4 require us to have data in place. We will be moving steps 1 and 2 to the test environment the week of Oct 20 and then roll these changes into production the week of Oct 27. We will be doing all our full text testing with the Day Missions collection, which uses a Hydra ingest package very similar to Kissinger’s.
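Step 2 above amounts to folding the contents of an object's TXT files into the two new index fields at ingest time. A sketch of what that might look like (the field names here are hypothetical, not our actual SOLR schema):

```python
# Sketch of step 2: folding the contents of an object's TXT files into the
# two new full-text SOLR fields at ingest time. Field names are hypothetical.

def build_solr_doc(object_id: str, metadata: dict, txt_pages: list,
                   restricted: bool) -> dict:
    """Return a SOLR document with full text routed to the proper field."""
    full_text = "\n".join(txt_pages)
    doc = {"id": object_id, **metadata}
    if restricted:
        doc["full_text_restricted"] = full_text  # hypothetical field name
    else:
        doc["full_text_open"] = full_text        # hypothetical field name
    return doc

doc = build_solr_doc("oid1", {"title_s": "Memo"}, ["page one", "page two"],
                     restricted=False)
print("full_text_open" in doc)  # True
```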

This is a repost from another location that has more information on our full text search plan, so I will give a brief overview of what that plan looks like along with the use cases used to draft this approach.

There are two types of full text search for objects we ingest into Hydra.

The first is the simplest, OCR text from a scanned image like a page in a book or a manuscript. This type of text is treated as an extension of the metadata making it simple to combine into search results since the text is considered open access.

The second is significantly more complex: the contents of the full text require special permission to search, so instead of treating the text as an extension of the metadata, we treat it the same way we treat files that carry special access conditions. This permission would have been granted ahead of time, so when you execute your full text search it will include results from the restricted items. This use case is currently specific to the Kissinger papers project but is being programmed to scale out as needed.

The approach we are taking is fairly simple: we place the open access full text into one SOLR field and the restricted access full text into a field specifically designed for restricted content. When the search is executed, the open access text is searched directly, while the restricted field is filtered so that your search is applied only to the restricted content you have been granted access to view.
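At query time, that two-field approach might look like the following sketch, where the restricted field is searched only within items the user is cleared to view. All field names and the access-group scheme are hypothetical, not our actual schema:

```python
# Sketch of the two-field full-text query: open-access text searched freely,
# restricted text searched only within items the user is cleared to view.
# Field names and the access-group scheme are hypothetical.

def build_fulltext_query(terms: str, allowed_groups: list) -> dict:
    """Return SOLR-style parameters for the combined full-text search."""
    clauses = [f"full_text_open:({terms})"]
    if allowed_groups:
        groups = " OR ".join(allowed_groups)
        # Restricted text only matches within items the user may view.
        clauses.append(f"(full_text_restricted:({terms}) "
                       f"AND access_group_ss:({groups}))")
    return {"q": " OR ".join(clauses)}

print(build_fulltext_query("detente", [])["q"])
print(build_fulltext_query("detente", ["kissinger_readers"])["q"])
```

The design choice is that access control lives in the query itself: a user with no clearances simply never touches the restricted field.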