Royal Library Presents Chronos

The Royal Library (National Library of Denmark and Copenhagen University Library) recently released a new Hydra application named Chronos.

Details of the project can be found here: http://www.kb.dk/en/nb/afdelinger/db/index.html

The project spanned a little over two years, during which the bulk of the work went into establishing policies for long-term digital preservation and then setting a strategic plan based on those policies.

The work then segued into cost models to support the newly developed policies and strategic plan. Using a shared set of principles and guidelines from the Collaboration to Clarify the Cost of Curation (http://www.4cproject.eu/), they developed a sustainable cost model for long-term preservation of their digital assets.

Once policies, strategies, and costs were established, they moved to a more detailed level and focused on the metadata requirements for preservation: event data to be stored in PREMIS and structural data to be stored in METS. This led to much more detailed discussions about public discovery of the digital assets, as well as the metadata required for internal reports that support digital preservation tasks.

Once this work was complete, they moved on to writing specifications for the system. They selected Hydra as the best approach for digitally preserving millions of documents. The planning process ran from June 2014 through the end of October 2014. This past December work began on the new system, and during the week of March 16, 2015 they will release videos and additional information to the Hydra community.

Indiana University Receives NEH Grant for Digital Preservation using Hydra

The National Endowment for the Humanities recently awarded the Indiana University Libraries and WGBH Boston a grant to support the development of HydraDAM2. This preservation-oriented digital asset management system for time-based media will improve upon WGBH’s existing HydraDAM system and work seamlessly with the Avalon Media System for user access, among other features.

Both HydraDAM and the Avalon Media System grew from the Hydra community. Hydra is an open source technology framework that supports the creation of preservation and access applications for digital assets based on the Fedora repository system. A community of institutions known as the Hydra Partners works together to maintain the framework and create applications for local or shared use by libraries, archives, and cultural institutions. Both Indiana University and WGBH Boston are among the 25 Hydra Partner institutions. Indiana University is collaborating with Northwestern University on the development of the Avalon Media System and WGBH developed the original HydraDAM system with help from the Data Curation Experts group.

[complete article]

HydraDAM is based on the popular Hydra application Sufia. You can view some interesting examples of institutions using Sufia for digital preservation here:

Penn State: ScholarSphere

Notre Dame: CurateND

Case Western: Digital Case

 

Hydra Project: ProjectHydra.org

Avalon Media System

Fedora 4 development notes for November 2014

Fedora is used locally in a number of applications, including our Hydra instance, the Finding Aids Database, AMEEL, and the Joel Sumner Smith collection. In our local instances we have been using versions of Fedora 3 since 2006. Fedora 3.8, to be released in December, will be the final release of the 3.x line, after which development energy will shift primarily to Fedora 4.

Fedora 4 is a complete refactoring of the Fedora 3 code, now built on top of ModeShape and the JCR repository API, with improvements in ease of installation, scaling, and RDF support. A full list of features appears in the release notes below.

Yale contributes financially to the project as a bronze member, and we also contribute to the programming effort. Osman Din and Eric James, both in Digital Library Programming Services, actively participate in development. In addition, I sit on the Fedora Leadership Committee, which handles the gathering of use cases, the prioritization of features to be programmed, and budget planning.

Fedora 4.0 is now undergoing a cycle of beta releases to allow institutions to begin adopting it. On November 9th, Fedora 4.0 Beta 4 was released, again with an eye toward simple installation and support for performance and repository size. Fedora 4.1, whose development begins in 2015, will focus on supporting the upgrade/migration process from Fedora 3.x. Some peers, including Penn State, have already begun to replace some of their Fedora 3 repositories with Fedora 4. We are also starting to think about how our migration strategies can dovetail with Fedora 4.1 for our own adoption, starting around the summer of 2015.

With that little bit of background, I thought I would share the recent development notes. If you have questions about Fedora, do not hesitate to contact me (michael.friscia@yale.edu), Eric James (eric.james@yale.edu) or Osman Din (osman.din@yale.edu).

(note: Eric provided valuable editorial feedback for the above post)

======================================================

Release date: 9 November, 2014

 

We are proud to announce the fourth Beta release of Fedora 4. In the continuing effort to complete the Fedora 4 feature set, this Beta release is one of several leading up to the Fedora 4 release. Full release notes and downloads are available on the wiki: https://wiki.duraspace.org/display/FF/Fedora+4.0+Beta+4+Release+Notes.

 

==============

Release Manager

==============

Andrew Woods (DuraSpace)

 

==========

Contributors

==========

—————————-

1) Sprint Developers

 

Adam Soroka (University of Virginia)

Benjamin Armintor (Columbia University)

Chris Beer (Stanford University)

Esme Cowles (University of California, San Diego)

Giulia Hill (University of California, Berkeley)

Jared Whiklo (University of Manitoba)

Jon Roby (University of Manitoba)

Kevin S. Clarke (University of California, Los Angeles)

Longshou Situ (University of California, San Diego)

Michael Durbin (University of Virginia)

Mohamed Mohideen Abdul Rasheed (University of Maryland)

Osman Din (Yale University)

 

————————————

2) Community Developers

 

Aaron Coburn (Amherst College)

Frank Asseg (FIZ Karlsruhe)

Nikhil Trivedi (Art Institute of Chicago)

 

=======

Features

=======

—————————–

1) Removed features

In the interest of producing a stable, well-tested release, the development team identified and removed a number of under-developed features that had not been sufficiently tested and documented. These features were not identified as high priorities by the community, but they may be re-introduced in later versions of Fedora 4 based on community feedback.

 

– Namespace [1] creation/deletion endpoint

– Locks endpoint

– Workspaces other than the ‘default’

– Admin internal search endpoints

– Policy-driven storage

– Batch operations in single request

– Auto-versioning configuration option

– Sitemaps endpoint

– Writable nodetypes endpoint

 

——————

2) REST API

The REST API is one of the core Fedora 4 components, and this release brings it more in line with the emerging W3C Linked Data Platform 1.0 [2] specification. An example of this is the new tombstone functionality [3]; URIs are not supposed to be reused, so deleting a resource leaves a tombstone in its place that serves as a notification that the resource has been deleted. Child nodes of deleted resources also leave tombstones. Other examples of LDP-related REST API changes include:

 

– Support for hashed URIs [4] as subjects and objects in triples.

– Binary and binary description model changed: previously the binary description lived at /resource and the binary at /resource/fcr:content; the binary now lives at /resource and its description at /resource/fcr:metadata.

– Labels are required when creating new versions of resources [5].

– Content-Disposition, Content-Length, Content-Type are now available on HEAD requests [6].
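
To make the tombstone behavior and the new binary layout concrete, here is a minimal sketch (not project code) in Python with the requests library, assuming a hypothetical local repository at http://localhost:8080/rest with an existing container named demo; the status codes in the comments are what we would expect from the Beta 4 REST API.

import requests

FEDORA = "http://localhost:8080/rest"

# Create a binary at /demo/file1; its RDF description now lives at .../fcr:metadata
r = requests.put(FEDORA + "/demo/file1", data=b"hello",
                 headers={"Content-Type": "text/plain"})
print(r.status_code)  # expect 201 Created
print(requests.get(FEDORA + "/demo/file1/fcr:metadata").status_code)  # expect 200

# Delete the binary; a tombstone is left in its place rather than freeing the URI
requests.delete(FEDORA + "/demo/file1")
print(requests.get(FEDORA + "/demo/file1").status_code)  # expect 410 Gone

# Deleting the tombstone itself (path per the wiki documentation) releases the URI
requests.delete(FEDORA + "/demo/file1/fcr:tombstone")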

 

—————-

3) Ontology

The Fedora 4 ontology [7] was previously broken out into several different namespaces, but these have now been collapsed into the repository [8] namespace. Additionally, the oai-pmh [9] namespace has been added to the ontology.

 

———-

4) LDP

Fedora 4 provides native linked data functionality, primarily by conforming with the W3C Linked Data Platform 1.0 [10] specification. The LDP 1.0 test suite [11] is executed against the Fedora 4 codebase as a part of the standard build process, and changes are made as necessary to pass the tests. Additionally, integrations tests for real-world RDF assertions [12] have also been added to the codebase.

 

Recent changes to support LDP include:

 

– When serializing to RDF, child resources are included in responses [13], versus having to traverse auto-generated intermediate nodes.

– All RDF types on properties are now supported [14].

– Prefer/Preference-Applied headers have been updated [15] to match the latest requirements [16].

– RDF language types are now supported [17].

– The full range of LDP containers [18] is now supported.

– Terminology changed: object -> container, datastream -> non-rdf-source-description.

– Relationships replaced: hasContent/isContentOf -> describes/isDescribedBy.
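
As a small illustration of the Prefer/Preference-Applied behavior above, the following hedged sketch (Python with requests, against an assumed local repository and a hypothetical container named demo) asks the repository to omit containment triples using the preference URI defined by LDP 1.0, then checks the response header.

import requests

FEDORA = "http://localhost:8080/rest"
prefer = 'return=representation; omit="http://www.w3.org/ns/ldp#PreferContainment"'

# Ask for the container's RDF without its containment (child) triples
r = requests.get(FEDORA + "/demo",
                 headers={"Prefer": prefer, "Accept": "text/turtle"})

# Per RFC 7240 [16], a server that honors the preference echoes it back;
# the Turtle body should then omit the container's children.
print(r.headers.get("Preference-Applied"))
print(r.text[:300])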

 

—————————

5) External modules

In addition to the core Fedora 4 codebase, there are a number of supported external modules that offer useful extensions to the repository. Two such modules are being introduced in the Fedora 4.0 Beta 4 release: Fedora 4 OAI Provider [19] and Fcrepo Camel [20].

 

The Fedora 4 OAI Provider implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Version 2.0 [21] using Fedora 4 as the backend. It exposes an endpoint at /oai which accepts OAI-conforming HTTP requests. A Fedora resource containing set information can be created and then exposed at the module's endpoint, which accepts HTTP POST requests containing serialized set information adhering to the OAI schema.
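
As a quick sketch of how a harvester might exercise that endpoint, assuming the provider is deployed next to a Fedora 4 instance at http://localhost:8080 (the verbs and the oai_dc prefix come from the OAI-PMH specification, not from this module's documentation):

import requests
import xml.etree.ElementTree as ET

OAI = "http://localhost:8080/oai"

# Standard OAI-PMH requests: identify the repository, then list Dublin Core records
identify = requests.get(OAI, params={"verb": "Identify"})
records = requests.get(OAI, params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})

# Both responses are OAI-PMH XML documents
print(ET.fromstring(identify.text).tag)
print(records.status_code)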

 

Fcrepo Camel provides access to an external Fedora 4 Containers API [22] for use with Apache Camel [23]. Camel is middleware for writing message-based integrations, so this component can be used to connect Fedora 4 to an extensive number of external systems [24], including Solr and Fuseki. This functionality is similar to that of the Fcrepo Message Consumer [25], except that it is based on a well-maintained Apache project rather than on custom Fedora 4 code. This component is therefore likely to replace the Message Consumer in the future, though the Message Consumer will still be part of the Fedora 4.0 release.

 

————————

6) Admin Console

The administrative console provides a simple HTML user interface for viewing the contents of the repository and accessing functionality provided by the REST API. This release introduces support for custom Velocity templates [26] based on the hierarchy of mixin types. Now, if you create a new mixin type, the templates used in the admin console will take into account the resource's primary type, its mixin types, and the parent types thereof.

 

—————–

7) Projection

The projection [27] (also known as federation) feature allows Fedora 4 to connect to external storage media via a pluggable connector framework. A read-only filesystem connector is included with this release.

 

Additionally, Fedora 4 now has standardized support for externally-referenced content [28].
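
A hedged sketch of how externally-referenced content might be registered, assuming the mechanism described in [28], where the Content-Type header carries a message/external-body value whose URL parameter points at the remote file (the paths and URLs below are hypothetical):

import requests

FEDORA = "http://localhost:8080/rest"
external = 'message/external-body; access-type=URL; URL="http://example.org/images/page1.tiff"'

# The repository records a pointer to the remote content instead of storing the bytes
r = requests.put(FEDORA + "/demo/page1", headers={"Content-Type": external})
print(r.status_code)  # expect 201 Created if the request is accepted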

 

—————————

8) Java client library

The Java Client Library [29] is an example of a module that was conceived by Fedora community members who recognized a common need and rallied to design [30] and implement the functionality. This release includes an improvement to list the children of a resource [31] in the client library.
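
The listing improvement above is specific to the Java client, but for context, the same information is available directly from the REST API, where a container's children typically appear as ldp:contains triples. A rough, hedged Python sketch against a hypothetical container named demo:

import requests

FEDORA = "http://localhost:8080/rest"

# Fetch the container's RDF and print the containment triples for its children
r = requests.get(FEDORA + "/demo", headers={"Accept": "text/turtle"})
for line in r.text.splitlines():
    if "contains" in line:  # crude filter for ldp:contains statements
        print(line.strip())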

 

———–

9) Build

A key component under the covers of Fedora 4 is ModeShape [32], one that the Fedora 4 project tracks closely. Fedora 4.0 Beta 4 includes an upgrade to the production version of ModeShape 4.0.0 [33].

 

Fedora 4 comes with built-in profiling machinery that keeps track of how many times specific services have been requested, how long each request takes to be serviced, and so on. These metrics can be visualized using Graphite [34]. Because Graphite can be difficult to set up and configure [35], this release includes a Packer.io build [36] which completely automates the process of standing up a Graphite server.

 

Additionally, the pluggable role-based [37] and XACML [38] authorization modules have been pre-packaged into fcrepo-webapp-plus [39]. This project builds custom-configured fcrepo4 webapp war files that include extra dependencies and configuration options.

 

————————-

10) Test Coverage

Unit and Integration test coverage [40] is a vital factor in maintaining a healthy code base. The following are the code coverage statistics for this release.

 

– Unit tests: 66.2%

– Integration tests: 69.4%

– Overall coverage: 82.5%

 

=========

References

=========

[1]  https://wiki.duraspace.org/display/FF/Glossary#Glossary-namespaceNamespace

[2]  http://www.w3.org/TR/ldp/

[3]  https://wiki.duraspace.org/display/FF/RESTful+HTTP+API#RESTfulHTTPAPI-RedDELETEDeletearesource

[4]  https://github.com/fcrepo4/fcrepo4/commit/5c30c743bb05ef627acc90f4b037b118c7d9de9c

[5]  https://wiki.duraspace.org/display/FF/Versioning#RESTfulHTTPAPI-Versioning-BluePOSTCreateanewversionofanobject

[6]  https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers

[7]  https://github.com/fcrepo4/ontology

[8]  http://fedora.info/definitions/v4/repository

[9]  https://github.com/fcrepo4/ontology/blob/master/oai-pmh.rdf

[10] http://www.w3.org/TR/ldp/

[11] http://w3c.github.io/ldp-testsuite/

[12] https://github.com/fcrepo4/fcrepo4/pull/579

[13] https://github.com/fcrepo4/fcrepo4/pull/542

[14] https://github.com/fcrepo4/fcrepo4/pull/587

[15] https://github.com/fcrepo4/fcrepo4/pull/451

[16] http://tools.ietf.org/html/rfc7240#page-7

[17] https://github.com/fcrepo4/fcrepo4/pull/586

[18] https://github.com/fcrepo4/fcrepo4/pull/594

[19] https://github.com/fcrepo4-labs/fcrepo4-oaiprovider

[20] https://github.com/fcrepo4-labs/fcrepo-camel

[21] http://www.openarchives.org/OAI/openarchivesprotocol.html

[22] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers

[23] https://camel.apache.org

[24] https://camel.apache.org/components.html

[25] https://github.com/fcrepo4/fcrepo-message-consumer

[26] https://velocity.apache.org/engine/releases/velocity-1.5/user-guide.html#velocity_template_language_vtl:_an_introduction

[27] https://wiki.duraspace.org/display/FF/Federation

[28] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers#RESTfulHTTPAPI-Containers-external-content

[29] https://github.com/fcrepo4-labs/fcrepo4-client

[30] https://wiki.duraspace.org/display/FF/Design+-+Java+Client+Library

[31] https://github.com/fcrepo4-labs/fcrepo4-client/pull/12

[32] http://modeshape.jboss.org

[33] http://modeshape.jboss.org/downloads/downloads4-0-0-final.html

[34] http://graphite.wikidot.com

[35] https://wiki.duraspace.org/display/FF/Setup+a+Graphite+instance

[36] https://github.com/fcrepo4-labs/fcrepo4-packer-graphite

[37] https://wiki.duraspace.org/display/FF/Basic+Role-based+Authorization+Delegate

[38] https://wiki.duraspace.org/display/FF/XACML+Authorization+Delegate

[39] https://github.com/fcrepo4-labs/fcrepo-webapp-plus

[40] http://sonar.fcrepo.org/dashboard/index/1

Yale University participating in Digital Preservation Network (DPN) Member Content Pilot

The Digital Preservation Network (DPN) is a federation of more than 50 academic institutional members who are collaboratively developing the means to preserve the complete scholarly record for future generations. DPN has launched a Member Content Pilot program as a step toward establishing an operational, long-term preservation system shared across the academy.

The pilot is testing real-world interactions between DPN members through DPN “nodes” that ingest data from members of the Digital Preservation Network and package it for preservation storage. Three DPN nodes (Chronopolis/DuraCloud, the Texas Preservation Node, and the Stanford Digital Repository) will function as First Nodes. All five DPN nodes (the three named above along with APTrust and HathiTrust) will provide replication services for the pilot data.

The higher education community has created many digital repositories to provide long-term preservation and access. DPN replicates multiple dark copies of these collections across diverse nodes to protect against the risk of catastrophic loss due to technological, organizational, or natural disasters.

Participants in the DPN Member Content Pilot include Chronopolis, the University of California San Diego, Dartmouth College, the DuraSpace organization, Stanford University, the Texas Preservation Node, and Yale University.

The pilot provides:

• A functioning preservation network, with services sufficient to allow First Nodes to accept Member pilot content and replicate it to Replicating Nodes using the developing DPN network.

• An opportunity for all participating Members and First Nodes to play out a realistic content deposit scenario and to discuss and capture the requirements and questions raised.

• A preliminary report to the DPN membership regarding results.

via Digital Preservation Network (DPN) Launches Member Content Pilot | DuraSpace.