LibraryIT acquires New Relic performance management service

LibraryIT recently purchased a license for the performance management and monitoring service New Relic. We will be using New Relic APM (Application Performance Management) to monitor and improve performance of the new Hydra/Blacklight complex (aka Findit and Quicksearch beta). New Relic is a cloud-based SaaS service for monitoring applications and the infrastructure they run on.

New Relic does do some usage monitoring, much in the vein of Google Analytics, but the way we are installing and configuring the service will allow the Information Architecture Group in LibraryIT and others to target specific performance issues such as page load times and search result response times. New Relic will be a great help in assessing the health and responsiveness of the critical servers and applications that run the Library’s key services.
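
As a very rough illustration of the kind of timing data we expect to collect, here is a minimal sketch using New Relic’s Python agent. It is purely illustrative: our applications are instrumented through the agent that matches their own stack, and the function and metric names below are invented for this example.

    import time

    import newrelic.agent

    # Load agent settings from a config file generated with newrelic-admin.
    newrelic.agent.initialize("newrelic.ini")

    @newrelic.agent.background_task(name="timed-search")
    def timed_search(run_search, query):
        """Run a search and report its duration to New Relic as a custom metric."""
        start = time.time()
        results = run_search(query)
        elapsed = time.time() - start
        # "Custom/SearchResponseTime" is an invented metric name for this sketch.
        newrelic.agent.record_custom_metric("Custom/SearchResponseTime", elapsed)
        return results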

Hydra Project Milestone: Automation

The Digital Library & Programming group is pleased to announce that we’ve hit a major milestone in the development of the Hydra digital repository system at YUL. Communication and syncing between repository system components became fully automated at the end of September 2014. This automation applies not just to work on the infrastructure built for the Kissinger papers, but to all Ladybird/Hydra interaction.

Automation like this allows metadata and objects to travel within the Hydra system without intervention, which in turn allows Library IT to focus more intensely on structural and workflow development. As a Project Hydra partner, Yale is now in the position to share this work with the Project Hydra community, and empower those members to scale up their own repository ingest services.
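
To give a sense of what this automation replaces, here is a purely hypothetical Python sketch of a polling sync job. The directory layout, file format, and ingest call are invented for illustration; they are not our actual Ladybird/Hydra code.

    import time
    from pathlib import Path

    EXPORT_DIR = Path("/data/ladybird/exports")       # hypothetical export drop location
    PROCESSED_DIR = Path("/data/ladybird/processed")  # hypothetical archive location

    def ingest_into_hydra(package):
        """Placeholder for the call that hands a metadata/object package to Hydra."""
        print(f"ingesting {package.name}")

    def sync_forever(poll_seconds=300):
        """Watch the export directory and ingest new packages without human intervention."""
        while True:
            for package in sorted(EXPORT_DIR.glob("*.zip")):
                ingest_into_hydra(package)
                package.rename(PROCESSED_DIR / package.name)
            time.sleep(poll_seconds)

    if __name__ == "__main__":
        sync_forever()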

Yale ITS to hold Tech Summit this month, Library IT to present

Join us for the inaugural Yale Technology Summit, a day-long program of conversations with Yale faculty, students, and staff working with innovative and cutting-edge technologies. The event, coordinated by Yale Information Technology Services, is free and open to all members of the Yale community.

Library and Library IT presentations at this event include:

  • Library Development for Digital Repositories: What is this Hydra Fedora stuff?
    In response to a fragmented digital collections environment developed over many years using many systems, the Yale Library has launched a project to unify digital collections within a single open source software framework using Hydra/Fedora. Michael Dula, the Library CTO, will talk about the decision to go open source with Hydra and Fedora as the underlying technologies. Topics will include Yale’s contributions to the open source Hydra community, a demonstration of initial projects, and future development plans and possibilities. 
  • Quicksearch: Universal Search at the University Library
    The Library offers several search interfaces: Orbis and MORRIS search the Library and Law Library catalogs, Articles+ for articles, journals and newspapers, and several digitized collection searches. The many search interfaces present a challenge to our patrons, who have to select the correct search depending on the material they need. The Library will combine several of these search interfaces into one unified ‘Quicksearch’, which over time will become a comprehensive search interface for the majority of Library resources. The Quicksearch poster session will highlight progress on the project so far. We will also provide laptops so Summit Participants can try the new search for themselves.

  • Humanities Data Mining in the Library
    In response to increased scholarly demand, Yale University Library is helping humanists make sense of large amounts of digital data. In this presentation, we will highlight recent projects based on Yale-digitized data, data from large commercial vendors, and data from the Library of Congress. We’ll address 1) working with digitized collections that are subject to license & copyright, 2) thinking about both explicit metadata and latent structure in large digital collections, and 3) moving beyond text to consider machine vision and computational image analysis.

  • Preservation and Access Challenges of Born-Digital Materials
    We will provide an introduction to the scope of born-digital materials at Sterling Memorial Library and the Beinecke Rare Book & Manuscript Library, and in particular will discuss the innovative ways staff at the Yale libraries are collaborating with colleagues on different initiatives, including a digital forensics lab devoted to the capture of born-digital materials, an emulation service that can provide online access to vintage computing environments via a web browser, and a vision for digital preservation to ensure that collection materials we capture today will remain usable in the future.

Follow the conversation on Twitter at #YaleTechSummit2014!

via 2014 Tech Summit | Yale ITS

Development Notes for 10/13 – 10/17

Just a brief update on the work of our group for the past week.

We are continuing our efforts to contribute to the Fedora 4 project. We use Fedora as one of the core products in our Hydra implementation; currently we have several installations of version 3. Version 4 has been in development for a little over a year, with an expected release date of June 2015. While Yale has been a financial contributor to the Fedora Commons project for many years, we only started contributing code to the project in 2013.

The Quicksearch project is also moving along swiftly. This week we completed the major milestone of handling CAS login, which is needed for Blacklight features like the bookbag and search history. CAS is generally simple to integrate with most software products, as long as the link between a NetID and the local user database can be made. In the case of Blacklight, making this link was complicated by the several different code libraries used in the specific version of Blacklight behind Quicksearch, which differs from the version we use for our Hydra interface, FindIt.
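
For readers unfamiliar with CAS, the basic link works roughly as follows. This is a simplified Python sketch of the CAS 2.0 service-validation step plus a NetID-to-local-user lookup; the URLs and the in-memory user store are placeholders, and our real integration lives inside the Ruby/Blacklight stack rather than in Python.

    import requests
    import xml.etree.ElementTree as ET

    CAS_BASE = "https://cas.example.yale.edu/cas"               # placeholder CAS server
    SERVICE_URL = "https://quicksearch.example.yale.edu/login"  # placeholder service URL

    def netid_from_ticket(ticket):
        """Validate a CAS service ticket and return the authenticated NetID, or None."""
        resp = requests.get(
            f"{CAS_BASE}/serviceValidate",
            params={"ticket": ticket, "service": SERVICE_URL},
            timeout=10,
        )
        ns = {"cas": "http://www.yale.edu/tp/cas"}
        user = ET.fromstring(resp.text).find(".//cas:user", ns)
        return user.text if user is not None else None

    def local_user_for(netid, user_db):
        """Link the NetID to a local user record, creating one on first login."""
        return user_db.setdefault(netid, {"netid": netid, "bookbag": [], "searches": []})

Once that link exists, features like the bookbag and search history can be stored against the local user record rather than against the CAS session.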

Almost all efforts this week were related to ingest operations for the Kissinger project. There was also some vacation time taken, so the output this week was limited.

Yesterday we met to discuss the development of full text search for objects ingested into Hydra. The work is broken up into the following steps:
1. Alter the SOLR index to accept two new fields that will store full text.
2. Alter the Hydra ingest application to store the content of the TXT files in the new SOLR fields.
3. Set up the Blacklight controllers to handle if/when each of the full text fields is used in user searches.
4. Develop the Blacklight user interface to offer the full text search option.

At this point we are focused only on the first two steps; steps 3 and 4 require us to have data in place. We will be moving steps 1 and 2 to the test environment the week of Oct 20 and then rolling these changes into production the week of Oct 27. We will be doing all our full text testing with the Day Missions collection, which uses a Hydra ingest package very similar to the one for Kissinger. A rough sketch of what the first two steps involve follows below.
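
The sketch assumes a SOLR instance with the managed Schema API enabled and uses Python’s requests library; the field names, core URL, and file layout are placeholders rather than our production configuration.

    import json
    from pathlib import Path

    import requests

    SOLR = "http://localhost:8983/solr/hydra"   # placeholder core URL

    # Step 1: add two new full text fields to the SOLR schema.
    new_fields = [
        {"name": "full_text_open", "type": "text_general", "indexed": True, "stored": True},
        {"name": "full_text_restricted", "type": "text_general", "indexed": True, "stored": True},
    ]
    for field in new_fields:
        requests.post(f"{SOLR}/schema", json={"add-field": field}, timeout=30)

    # Step 2: during ingest, read each object's TXT file and push its content
    # into the matching SOLR document with an atomic update.
    def index_full_text(object_id, txt_path, restricted=False):
        field = "full_text_restricted" if restricted else "full_text_open"
        doc = {"id": object_id, field: {"set": Path(txt_path).read_text(encoding="utf-8")}}
        requests.post(
            f"{SOLR}/update?commit=true",
            data=json.dumps([doc]),
            headers={"Content-Type": "application/json"},
            timeout=30,
        )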

This is a repost from another location, where there is more information on our full text search plan, so here I will give a brief overview of what that plan looks like, along with the use cases used to draft this approach.

There are two types of full text search for objects we ingest into Hydra.

The first is the simplest: OCR text from a scanned image, such as a page in a book or a manuscript. This type of text is treated as an extension of the metadata, making it simple to combine into search results, since the text is considered open access.

The second is significantly more complex: the contents of the full text require special permission to search. Instead of being treated as an extension of the metadata, this text is treated the same way we treat files that carry special access conditions. The permission would have been granted ahead of time, so when you execute your full text search it will include results from the restricted items you are entitled to see. This use case is currently specific to the Kissinger papers project but is being programmed to scale out as needed.

The approach we are taking is fairly simple: we place the open access full text in one SOLR field and the restricted access full text in a field specifically designed for restricted content. When a search is executed, the open access text is searched in full, while the restricted field is filtered so that the search is applied only to the restricted content you have been granted access to view.
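
In query terms, the idea looks roughly like the sketch below. This is illustrative Python: the field names, the access-group field, and the way a user’s permissions are resolved are placeholders for whatever Hydra/Blacklight actually stores, not the exact production query.

    import requests

    SOLR = "http://localhost:8983/solr/hydra"   # placeholder core URL

    def full_text_search(term, user_groups):
        """Search open full text for everyone; search restricted full text only
        within documents the user's groups are allowed to read."""
        groups = " OR ".join(f'"{g}"' for g in user_groups) or '"__none__"'
        query = (
            f"full_text_open:({term})"
            f" OR (full_text_restricted:({term}) AND read_access_group:({groups}))"
        )
        resp = requests.get(
            f"{SOLR}/select",
            params={"q": query, "wt": "json", "rows": 20},
            timeout=30,
        )
        return resp.json()["response"]["docs"]

So a user in a hypothetical "kissinger_readers" group would see matches from both fields, while a user with no special permissions would only ever hit the open access field.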