Tagged: Solr

Development Notes for 10/13 – 10/17

Just a brief update on the work of our group for the past week.

We continue our efforts in contributing to the Fedora 4 project. We use Fedora as one of the core products in our Hydra implementation. Currently we have several installations of version 3. Version 4 has been in development for a little over a year with an expected release date of June 2015. While Yale has been a financial contributor to the Fedora Commons project for many years now, we only started contributing code to the project in 2013.

The Quicksearch project is also moving along swiftly. This week the major milestone of handling CAS login was completed. This is used for some features in the Blacklight software like the bookbag and search history. CAS is generally simple to integrate with most software products as long as the link between a NetID and the local user database can be made. In the case of Blacklight, making this link became complicated because of the use of several different code libraries in the specific version of Blacklight being used for Quicksearch, which is different from the version we use for our Hydra interface, FindIt.

Almost all efforts this week were related to ingest operations for the Kissinger project. There was also some vacation time taken so the output this week was limited.

Yesterday we met to discuss the development for full text search for objects ingested into Hydra. The work is broken up into the following steps:
1- alter the SOLR index to accept 2 new fields that will store full text
2- alter the Hydra ingest application to store the content of the TXT files into the new SOLR fields
3- setup the Blacklight controllers for handling if/when each of the FT fields are used in user searches
4- develop the Blacklight user interface to allow the FT search option

At this point we are only focused on the first two steps. 3&4 require us to have data in place. We will be moving steps 1 & 2 to the test environment the week of Oct 20 and then roll these changes into production the week of Oct 27. We will be doing all our FT testing with the Day Missions collection which uses a Hydra ingest package very similar to Kissinger.

This is a repost from another location that has more information on our full text search plan. So I will give a brief overview of what that plan looks like with the use cases used to draft this approach.

There are two types of full text search for objects we ingest into Hydra.

The first is the simplest, OCR text from a scanned image like a page in a book or a manuscript. This type of text is treated as an extension of the metadata making it simple to combine into search results since the text is considered open access.

The second is significantly more complex, it is where the contents of the full text require special permission to search so instead of the text being treated as an extension to the metadata, it is treated the same as we treat files that carry special access conditions. This permission would have been granted ahead of time so at the time you execute your full text search it will include results from the restricted items. This use case is currently specific to the Kissinger papers project but is being programmed to scale out as needed.

So the approach we are taking is kind of simple, we place the open access full text into one SOLR field and then the restricted access full text into a field specifically designed for restricted content. At the point when the search is executed, the open access text is searched and the restricted is filtered so that your search is only applied to the restricted contents which you have been granted access to view.