Tagged: Fedora

Fedora Quarterly Report available online

The quarterly report for Fedora for January – March 2015 is now available:

http://duraspace.org/node/2625

The report details much of what happened in the first three months of this year but also covers a number of general topics from January to June.

Highlights include:

  • Fedora Development
  • Fundraising
  • Fedora Committers
  • Upcoming training and events

Penn State’s ScholarSphere now running on Fedora 4

From the Hydra Community:

The PSU team successfully deployed ScholarSphere on Fedora 4 into production this past Saturday, April 12th.

It took approximately 20 hours to migrate our code and our data from Fedora 3 (Sufia 5) to Fedora 4 (Sufia 6) on our production servers.

The transition went smoothly and our upgraded system has been running well for the past two days.

Thank you to my team (Adam Wead, Hector Correra, and Michael Tribone), Penn State ITS Services and Solutions, Penn State Libraries, and the Hydra community for all the hard work and funding that made this transition possible!

Fedora 4 development notes for November 2014

Fedora is used locally in a number of applications including our Hydra instance, Finding Aids Database, AMEEL and the Joel Sumner Smith collection. In our local instances we have been using versions of Fedora 3 since 2006. Fedora 3.8, to be released in December, will be the final release of the 3.x line, with energy at that point shifting primarily to Fedora 4.

Fedora 4 consists of a complete refactoring of the Fedora 3 code, now built on top of the ModeShape and JCR repository APIs, with improvements in ease of installation, scaling, and RDF support. Below is a full list of features.

Yale contributes financially to the product as a bronze member and we also contribute to the programming efforts. Osman Din and Eric James, both in Digital Library Programming Services, actively participate in development. In addition, I sit on the Fedora Leadership Committee that handles the gathering of use cases, prioritization of features being programmed as well as budget planning.

Fedora 4.0 is now undergoing a cycle of beta releases to allow institutions to begin adopting it. On November 9th, Fedora 4.0 Beta 4 was released, again with an eye towards simple installation and support for performance and repository size. Fedora 4.1, to begin development in 2015, will focus on supporting the upgrade/migration process from Fedora 3.x. Some peers, including Penn State, have already begun to replace some of their Fedora 3 repositories with Fedora 4. We are also starting to think about how our migration strategies can dovetail with Fedora 4.1 for our own adoption starting around the summer of 2015.

With that little bit of background, I thought I would share the recent development notes. If you have questions about Fedora, do not hesitate to contact me (michael.friscia@yale.edu), Eric James (eric.james@yale.edu) or Osman Din (osman.din@yale.edu).

(note: Eric provided valuable editorial feedback for the above post)

======================================================

Release date: 9 November, 2014

 

We are proud to announce the fourth Beta release of Fedora 4. In the continuing effort to complete the Fedora 4 feature set, this Beta release is one of several leading up to the Fedora 4 release. Full release notes and downloads are available on the wiki: https://wiki.duraspace.org/display/FF/Fedora+4.0+Beta+4+Release+Notes.

 

==============

Release Manager

==============

Andrew Woods (DuraSpace)

 

==========

Contributors

==========

—————————-

1) Sprint Developers

 

Adam Soroka (University of Virginia)

Benjamin Armintor (Columbia University)

Chris Beer (Stanford University)

Esme Cowles (University of California, San Diego)

Giulia Hill (University of California, Berkeley)

Jared Whiklo (University of Manitoba)

Jon Roby (University of Manitoba)

Kevin S. Clarke (University of California, Los Angeles)

Longshou Situ (University of California, San Diego)

Michael Durbin (University of Virginia)

Mohamed Mohideen Abdul Rasheed (University of Maryland)

Osman Din (Yale University)

 

————————————

2) Community Developers

 

Aaron Coburn (Amherst College)

Frank Asseg (FIZ Karlsruhe)

Nikhil Trivedi (Art Institute of Chicago)

 

=======

Features

=======

—————————–

1) Removed features

In the interest of producing a stable, well-tested release, the development team identified and removed a number of under-developed features that had not been sufficiently tested and documented. These features were not identified as high priorities by the community, but they may be re-introduced in later versions of Fedora 4 based on community feedback.

 

– Namespace [1] creation/deletion endpoint

– Locks endpoint

– Workspaces other than the ‘default’

– Admin internal search endpoints

– Policy-driven storage

– Batch operations in single request

– Auto-versioning configuration option

– Sitemaps endpoint

– Writable nodetypes endpoint

 

——————

2) REST API

The REST API is one of the core Fedora 4 components, and this release brings it more in line with the emerging W3C Linked Data Platform 1.0 [2] specification. An example of this is the new tombstone functionality [3]; URIs are not supposed to be reused, so deleting a resource leaves a tombstone in its place that serves as a notification that the resource has been deleted. Child nodes of deleted resources also leave tombstones. Other examples of LDP-related REST API changes include:

 

– Support for hashed URIs [4] as subjects and objects in triples.

– Binary and binary description model changed:

  – From: binary description at /resource, and binary at /resource/fcr:content

  – To: binary description at /resource/fcr:metadata, and binary at /resource

– Labels are required when creating new versions of resources [5].

– Content-Disposition, Content-Length, Content-Type are now available on HEAD requests [6].
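
The binary layout change and the HEAD support listed above can be illustrated with a minimal Python sketch. The base URL, the resource path, and the helper names are assumptions made for illustration only; the sketch builds a request without sending it.

```python
from urllib.request import Request

# Hypothetical base URL of a local test repository (an assumption).
BASE = "http://localhost:8080/rest"


def binary_url(resource_path, legacy=False):
    """URL of the binary itself, under the old (legacy) or new layout."""
    url = f"{BASE}/{resource_path.strip('/')}"
    return f"{url}/fcr:content" if legacy else url


def description_url(resource_path, legacy=False):
    """URL of the binary's description, under the old or new layout."""
    url = f"{BASE}/{resource_path.strip('/')}"
    return url if legacy else f"{url}/fcr:metadata"


def head_request(resource_path):
    """Build (but do not send) a HEAD request for a binary.

    Per the notes above, the response to such a request should carry
    Content-Disposition, Content-Length, and Content-Type headers.
    """
    return Request(binary_url(resource_path), method="HEAD")
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because it depends on a running repository.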

 

—————-

3) Ontology

The Fedora 4 ontology [7] was previously broken out into several different namespaces, but these have now been collapsed into the repository [8] namespace. Additionally, the oai-pmh [9] namespace has been added to the ontology.

 

———-

4) LDP

Fedora 4 provides native linked data functionality, primarily by conforming with the W3C Linked Data Platform 1.0 [10] specification. The LDP 1.0 test suite [11] is executed against the Fedora 4 codebase as a part of the standard build process, and changes are made as necessary to pass the tests. Additionally, integration tests for real-world RDF assertions [12] have been added to the codebase.

 

Recent changes to support LDP include:

 

– When serializing to RDF, child resources are included in responses [13], versus having to traverse auto-generated intermediate nodes.

– All RDF types on properties are now supported [14].

– Prefer/Preference-Applied headers have been updated [15] to match the latest requirements [16].

– RDF language types are now supported [17].

– The full range of LDP containers [18] is now supported.

– Changed terminology:

  – object -> container

  – datastream -> non-rdf-source-description

– Replaced relationships:

  – hasContent/isContentOf -> describes/isDescribedBy
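
As an illustration of the Prefer header mentioned in the list above, here is a small Python sketch that composes a header value in the form described by RFC 7240 and the LDP 1.0 specification. The function name is hypothetical, and the preference names are the ones defined in the LDP vocabulary; which of them a given Fedora 4 build honors is not stated in these notes.

```python
# Namespace of the LDP vocabulary, as published by the W3C.
LDP = "http://www.w3.org/ns/ldp#"


def prefer_header(include=(), omit=()):
    """Compose a Prefer header value requesting a representation.

    `include` and `omit` are sequences of LDP preference local names,
    for example "PreferMinimalContainer" or "PreferContainment".
    """
    parts = ["return=representation"]
    if include:
        uris = " ".join(LDP + name for name in include)
        parts.append(f'include="{uris}"')
    if omit:
        uris = " ".join(LDP + name for name in omit)
        parts.append(f'omit="{uris}"')
    return "; ".join(parts)


if __name__ == "__main__":
    # Ask the server to leave containment triples out of the response.
    print("Prefer:", prefer_header(omit=["PreferContainment"]))
```

A server that applies the preference is expected to say so in a Preference-Applied response header.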

 

—————————

5) External modules

In addition to the core Fedora 4 codebase, there are a number of supported external modules that offer useful extensions to the repository. Two such modules are being introduced in the Fedora 4.0 Beta 4 release: Fedora 4 OAI Provider [19] and Fcrepo Camel [20].

 

The Fedora 4 OAI Provider implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Version 2.0 [21] using Fedora 4 as the backend. It exposes an endpoint at /oai which accepts OAI-conforming HTTP requests. A Fedora resource containing set information can be created and then exposed at the module's endpoint, which accepts HTTP POST requests containing serialized Set information adhering to the OAI schema.
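
To make the shape of an OAI-conforming request concrete, the following Python sketch builds the URL for a harvest request. The verbs and the `metadataPrefix` argument come from the OAI-PMH 2.0 specification; the base URL and the assumption that /oai sits directly beneath it are illustrative only.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL against the /oai endpoint.

    `verb` is one of the six OAI-PMH verbs (Identify, ListSets,
    ListMetadataFormats, ListIdentifiers, ListRecords, GetRecord);
    any further keyword arguments are passed through as query parameters.
    """
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url.rstrip('/')}/oai?{query}"


if __name__ == "__main__":
    # Hypothetical repository location.
    print(oai_request_url("http://localhost:8080/rest", "ListRecords",
                          metadataPrefix="oai_dc"))
```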

 

Fcrepo Camel provides access to an external Fedora 4 Containers API [22] for use with Apache Camel [23]. Camel is middleware for writing message-based integrations, so this component can be used to connect Fedora 4 to an extensive number of external systems [24], including Solr and Fuseki. This functionality is similar to that of the Fcrepo Message Consumer [25], except it is based on a well-maintained Apache project rather than being custom Fedora 4 code. Therefore, this component is likely to replace the Message Consumer in the future, though the Message Consumer will still be part of the Fedora 4.0 release.

 

————————

6) Admin Console

The administrative console provides a simple HTML user interface for viewing the contents of the repository and accessing functionality provided by the REST API. This release introduces support for custom Velocity templates [26] based on the hierarchy of mixin types. Now, if you create a new mixin type, the templates to be used in the admin console will include the resource's primary type, mixin types, and parent types thereof.
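
One way to picture the set of types considered for templates is the Python sketch below. It is not the admin console's actual lookup code: the function name, the ordering of the result, and the example type names are all assumptions chosen for illustration.

```python
def template_candidates(primary_type, mixin_types, supertypes):
    """List the type names whose templates could apply to a resource.

    Starts from the primary type and the mixin types, then walks up
    through their parent types. `supertypes` maps a type name to the
    names of its direct parents. The breadth-first order used here is
    an assumption, not documented behavior.
    """
    seen = set()
    ordered = []
    queue = [primary_type, *mixin_types]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.add(name)
        ordered.append(name)
        queue.extend(supertypes.get(name, []))
    return ordered


if __name__ == "__main__":
    # Hypothetical type hierarchy for a resource with one custom mixin.
    parents = {"nt:folder": ["nt:hierarchyNode"], "ex:photo": ["ex:image"]}
    print(template_candidates("nt:folder", ["ex:photo"], parents))
```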

 

—————–

7) Projection

The projection [27] (also known as federation) feature allows Fedora 4 to connect to external storage media via a pluggable connector framework. A read-only filesystem connector is included with this release.

 

Additionally, Fedora 4 now has standardized support for externally-referenced content [28].

 

—————————

8) Java client library

The Java Client Library [29] is an example of a module that was conceived by Fedora community members who recognized a common need and rallied to design [30] and implement the functionality. This release includes an improvement to list the children of a resource [31] in the client library.

 

———–

9) Build

A key component under the covers of Fedora 4 is ModeShape [32], one that the Fedora 4 project tracks closely. Fedora 4.0 Beta 4 includes an upgrade to the production version of ModeShape 4.0.0 [33].

 

Fedora 4 comes with built-in profiling machinery that keeps track of how many times specific services have been requested, how long each request takes to be serviced, etc. These metrics can be visualized using Graphite [34]. Because Graphite can be difficult to set up and configure [35], this release includes a Packer.io build [36] which completely automates the process of standing up a Graphite server.

 

Additionally, the pluggable role-based [37] and XACML [38] authorization modules have been pre-packaged into fcrepo-webapp-plus [39]. This project builds custom-configured fcrepo4 webapp war files that include extra dependencies and configuration options.

 

————————-

10) Test Coverage

Unit and Integration test coverage [40] is a vital factor in maintaining a healthy code base. The following are the code coverage statistics for this release.

 

– Unit tests: 66.2%

– Integration tests: 69.4%

– Overall coverage: 82.5%

 

=========

References

=========

[1]  https://wiki.duraspace.org/display/FF/Glossary#Glossary-namespaceNamespace

[2]  http://www.w3.org/TR/ldp/

[3]  https://wiki.duraspace.org/display/FF/RESTful+HTTP+API#RESTfulHTTPAPI-RedDELETEDeletearesource

[4]  https://github.com/fcrepo4/fcrepo4/commit/5c30c743bb05ef627acc90f4b037b118c7d9de9c

[5]  https://wiki.duraspace.org/display/FF/Versioning#RESTfulHTTPAPI-Versioning-BluePOSTCreateanewversionofanobject

[6]  https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers

[7]  https://github.com/fcrepo4/ontology

[8]  http://fedora.info/definitions/v4/repository

[9]  https://github.com/fcrepo4/ontology/blob/master/oai-pmh.rdf

[10] http://www.w3.org/TR/ldp/

[11] http://w3c.github.io/ldp-testsuite/

[12] https://github.com/fcrepo4/fcrepo4/pull/579

[13] https://github.com/fcrepo4/fcrepo4/pull/542

[14] https://github.com/fcrepo4/fcrepo4/pull/587

[15] https://github.com/fcrepo4/fcrepo4/pull/451

[16] http://tools.ietf.org/html/rfc7240#page-7

[17] https://github.com/fcrepo4/fcrepo4/pull/586

[18] https://github.com/fcrepo4/fcrepo4/pull/594

[19] https://github.com/fcrepo4-labs/fcrepo4-oaiprovider

[20] https://github.com/fcrepo4-labs/fcrepo-camel

[21] http://www.openarchives.org/OAI/openarchivesprotocol.html

[22] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers

[23] https://camel.apache.org

[24] https://camel.apache.org/components.html

[25] https://github.com/fcrepo4/fcrepo-message-consumer

[26] https://velocity.apache.org/engine/releases/velocity-1.5/user-guide.html#velocity_template_language_vtl:_an_introduction

[27] https://wiki.duraspace.org/display/FF/Federation

[28] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Containers#RESTfulHTTPAPI-Containers-external-content

[29] https://github.com/fcrepo4-labs/fcrepo4-client

[30] https://wiki.duraspace.org/display/FF/Design+-+Java+Client+Library

[31] https://github.com/fcrepo4-labs/fcrepo4-client/pull/12

[32] http://modeshape.jboss.org

[33] http://modeshape.jboss.org/downloads/downloads4-0-0-final.html

[34] http://graphite.wikidot.com

[35] https://wiki.duraspace.org/display/FF/Setup+a+Graphite+instance

[36] https://github.com/fcrepo4-labs/fcrepo4-packer-graphite

[37] https://wiki.duraspace.org/display/FF/Basic+Role-based+Authorization+Delegate

[38] https://wiki.duraspace.org/display/FF/XACML+Authorization+Delegate

[39] https://github.com/fcrepo4-labs/fcrepo-webapp-plus

[40] http://sonar.fcrepo.org/dashboard/index/1

Development Notes for 10/13 – 10/17

Just a brief update on the work of our group for the past week.

We continue our efforts in contributing to the Fedora 4 project. We use Fedora as one of the core products in our Hydra implementation. Currently we have several installations of version 3. Version 4 has been in development for a little over a year with an expected release date of June 2015. While Yale has been a financial contributor to the Fedora Commons project for many years now, we only started contributing code to the project in 2013.

The Quicksearch project is also moving along swiftly. This week the major milestone of handling CAS login was completed. This is used for some features in the Blacklight software like the bookbag and search history. CAS is generally simple to integrate with most software products as long as the link between a NetID and the local user database can be made. In the case of Blacklight, making this link became complicated because of the use of several different code libraries in the specific version of Blacklight being used for Quicksearch, which is different from the version we use for our Hydra interface, FindIt.

Almost all efforts this week were related to ingest operations for the Kissinger project. There was also some vacation time taken so the output this week was limited.

Yesterday we met to discuss the development for full text search for objects ingested into Hydra. The work is broken up into the following steps:
1- alter the SOLR index to accept 2 new fields that will store full text
2- alter the Hydra ingest application to store the content of the TXT files into the new SOLR fields
3- set up the Blacklight controllers for handling if/when each of the FT fields is used in user searches
4- develop the Blacklight user interface to allow the FT search option
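
Steps 1 and 2 above could look roughly like the following Python sketch, which places the text of a TXT file into one of two full text fields of a Solr document. The field names, the `id` key, and the function name are hypothetical; the actual schema and ingest application are not shown in this post.

```python
# Hypothetical names for the two new SOLR fields from step 1.
OPEN_FIELD = "fulltext_open_t"
RESTRICTED_FIELD = "fulltext_restricted_t"


def build_solr_doc(object_id, text, restricted):
    """Return a Solr document that stores `text` in the matching field.

    Open access text and restricted text are kept in separate fields
    so that they can be searched under different conditions later.
    """
    field = RESTRICTED_FIELD if restricted else OPEN_FIELD
    return {"id": object_id, field: text}


if __name__ == "__main__":
    print(build_solr_doc("example:1", "text read from a TXT file", False))
```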

At this point we are only focused on the first two steps. Steps 3 and 4 require us to have data in place first. We will be moving steps 1 and 2 to the test environment the week of Oct 20 and then rolling these changes into production the week of Oct 27. We will be doing all our FT testing with the Day Missions collection, which uses a Hydra ingest package very similar to Kissinger.

This is a repost from another location that has more information on our full text search plan, so I will give only a brief overview of the plan and the use cases used to draft this approach.

There are two types of full text search for objects we ingest into Hydra.

The first is the simplest: OCR text from a scanned image like a page in a book or a manuscript. This type of text is treated as an extension of the metadata, making it simple to combine into search results since the text is considered open access.

The second is significantly more complex: the contents of the full text require special permission to search, so instead of being treated as an extension of the metadata, the text is treated the same way we treat files that carry special access conditions. This permission would have been granted ahead of time, so when you execute your full text search it will include results from the restricted items. This use case is currently specific to the Kissinger papers project but is being programmed to scale out as needed.

The approach we are taking is fairly simple: we place the open access full text into one SOLR field and the restricted access full text into a field specifically designed for restricted content. When a search is executed, the open access text is searched in full and the restricted text is filtered so that your search is applied only to the restricted contents which you have been granted access to view.
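
The search-time behavior described above can be sketched in Python as a query builder. Everything specific here is an assumption for illustration: the two full text field names, the `collection_s` field used to express what a user may see, and the way permissions are passed in. The Blacklight controllers that will do this work are not yet written.

```python
def fulltext_query(term, granted_collections=()):
    """Build Solr query parameters for a full text search.

    Open access text is always searched. Restricted text is searched
    only within the collections the user has been granted access to.
    """
    # Escape backslashes and quotes so the term is a safe quoted phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'fulltext_open_t:"{escaped}"']
    if granted_collections:
        allowed = " OR ".join(f'"{name}"' for name in granted_collections)
        clauses.append(
            f'(fulltext_restricted_t:"{escaped}" AND collection_s:({allowed}))'
        )
    return {"q": " OR ".join(clauses)}


if __name__ == "__main__":
    print(fulltext_query("treaty"))
    print(fulltext_query("treaty", ["kissinger"]))
```

A user with no grants only ever queries the open field, so restricted text cannot leak into their results through the query itself.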