Stanford Hosts Geo4LibCamp 2017

During the week of January 29, Stanford hosted Geo4LibCamp, where 48 people of similar but varying persuasions convened with the common goal of building repository services for geospatial data. Introductions included naming three personal interests; “discovery” and “metadata” were among the most frequently cited. The format was an “unconference” Monday through Wednesday, with additional sessions Thursday and Friday. There were 6 planned presentations, a round of lightning talks, a morning spent at the Rumsey Map Center, and a planning session that determined the 10 unconference sessions, chosen by popular demand. Additional sessions included an introduction to and tutorial on GeoBlacklight, Hydra plugin development, and selling the importance of a geodata repository to administrators. For more details of the week see:

GeoBlacklight is an open source GIS discovery platform for geospatial holdings, built on the Blacklight discovery application and a Solr index. At Yale, the Library Executive Committee has made creating a GeoBlacklight instance a high priority, and it was reassuring to see that the community is moving with concerted effort in that direction. Highlighted throughout the week were the common challenges: standing up the software stack, metadata best practices, sharing and interoperability, and specific issues with scanned maps, indexed maps, and hierarchical data. One key takeaway was the compelling argument to adopt GeoConcerns: it leverages the existing Hydra/Sufia/Hyrax model, there is a critical mass of buy-in and support, the data model is robust, and the infrastructure and architecture are well defined. Through contributions to the community effort and custom development at Yale with the Ladybird collection management tool and existing metadata, a GeoBlacklight/GeoConcerns solution holds much promise as a leading application to offer Yale patrons in the geospatial realm.

DevOps for Rails Training

During the week of January 9-12, Rob Kaufman of Notch8 led a class on DevOps attended by developers and sys-admins from the digital library programming group, library IT, and central ITS. The focus was on deployment strategies for Rails-based applications. The class began with an overview of the basic components behind applications – web servers (Apache and nginx), Rails modules such as Passenger, standalone Rails servers (Puma, Unicorn, Thin, and WEBrick), database components (MySQL, Postgres, Oracle), the digital repository (Fedora), and index applications (Solr). The overview was also framed in terms of the 12 factors – codebase, dependencies, config, backing services, build, processes, binding, concurrency, disposability, dev/prod parity, logs, and admin processes. The architecture of the application components was also key to understanding the parts and their various connections.
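
As a small illustration of the “config” factor, here is a minimal sketch in Python (not the Rails code used in class) that reads connection settings for the backing services from environment variables. The variable names and default URLs are hypothetical, chosen only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    """Connection settings for the backing services named in the overview."""
    database_url: str
    solr_url: str
    fedora_url: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read settings from the environment, falling back to development defaults.

    The variable names and default URLs are hypothetical examples.
    """
    return AppConfig(
        database_url=env.get("DATABASE_URL", "postgres://localhost:5432/app_dev"),
        solr_url=env.get("SOLR_URL", "http://localhost:8983/solr/dev"),
        fedora_url=env.get("FEDORA_URL", "http://localhost:8080/rest"),
    )


if __name__ == "__main__":
    # The same code runs unchanged in development, test, staging, and
    # production; only the environment differs.
    print(load_config())
```

Keeping these values out of the codebase is what lets one build move between environments without edits.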

Deployment has traditionally been a primarily manual process, using ssh and the command line for the codebase, configuration, and server startup. Here we discussed more effective and efficient strategies in depth, namely Capistrano, Ansible, and Docker. Capistrano is a tool for defining servers and deployment tasks so that application code deployment is encapsulated in a few basic commands. Ansible takes a role-based approach that leverages playbooks for building server components. Docker runs on top of the operating system kernel; “containers” consisting of the application and its server environment are built, shipped, and installed using a basic domain language. In this context we explored continuous integration, whereby these processes are integrated across the various environments (development, test, staging, and production), from which a living and breathing application is deployed using the code repositories and server configuration.

For the remainder of the course the class took a deep dive into creating deployment strategies for the main Rails applications in the library: quicksearch and findit. As an exercise we broke up into groups and began working on Docker implementations. Here we took existing scripts currently in use in Vagrant virtual-machine environments and translated them into Docker containers. The challenges were many. First, as Docker is layer based and images are built upon existing Docker layers, we learned how to search Docker Hub for a starting point for building the container, informed by the choice of operating system, language, application, and their respective versions. Then, through native package managers such as yum and apt-get, we exercised the process of getting all of the native libraries installed. Then the Rails application code, including its dependencies in the form of Ruby “gems”, was bundled to create a working Docker image. Finally this image, images for external components (such as Fedora, Solr, MySQL, and Postgres), and environment variables were orchestrated together using the docker-compose tool and the stack-car convenience API to create a working application. By the end of the week a basic proof of concept for deploying containers using Ansible and Docker was generated, a pretty significant achievement! It is anticipated that the technology put in place through this week of training will be refined and expanded, and that what was once a laborious process will be optimized to the benefit of all parties involved: developers, sys-admins, and most importantly the end user.

Hydra Connect 2016

Hydra Project
This week, from October 3rd to October 6th, Boston Public Library hosted the Hydra Connect 2016 conference. Project Hydra is a repository solution managing components involved in storing and providing access to digital content. Project Hydra can be described in broad terms as the confluence of community and collaboration made manifest in the development of open source software, and the conference brought together close to 200 people from institutions across the globe to connect. Seven people attended from Yale Library: Mike Friscia, Anju Meenattoor, Lakeisha Robinson, George Ouellette, Youn Noh, Osman Din, and Eric James.

The conference was organized as workshops on Monday, a plenary session Tuesday morning, a poster session Tuesday afternoon, multi-tracked presentations/panels/lightning talks Wednesday, and breakout sessions Thursday. Topics were varied but commonly themed. There was discussion of service management and project management, taking into consideration issues such as adoption, migration, and upgrade paths. There was a focus on learning, sharing, and best practices for the technology itself – the software stack, infrastructure, deployment, and monitoring. Much of the presentation centered on the community efforts driving base applications such as the Sufia institutional repository, the Avalon AV system, and the Fedora repository. Content-specific challenges were addressed both from an abstract modeling perspective and in terms of the unique considerations of GIS assets, newspapers, images, AV materials, and research data, through frameworks such as PCDM/Hydra Works and the IIIF specifications.

The enthusiasm was palpable and the Project Hydra motto “if you want to go fast, go alone, if you want to go far, go together” was evident, but in many ways what prevailed was a constant tension between customization and consolidation – the need for diverse institutions to implement a variety of special features while simultaneously developing towards an easily maintainable common core. In any case the takeaways from the conference will influence the direction of services provided by the Yale Library long term, from the digital collections interface FindIT and the Yale instance of the Avalon AV platform to the unified search interface Quicksearch.



Spotlight on Spotlight


Do you have content in Blacklight? Do you have content in other silos? Would you like to create dynamic exhibits and/or collections? Would you like to manage content, display, search, and facets in a highly configurable online interface? If you answered yes to any of these, welcome to Spotlight!

“Spotlight is open source software that enables librarians, curators, and other content experts to easily build feature-rich websites that showcase collections and objects from a digital repository, uploaded items, or a combination of the two. Spotlight is a plug-in for Blacklight, an open source, Ruby on Rails Engine that provides a basic discovery interface for searching an Apache Solr index.”

Exhibit page content can be directly tweaked from the browser.

On August 9th and 10th the Yale Center for British Art (YCBA) and Yale Library hosted the event “Spotlight on Spotlight”. We were pleased to have members of the Spotlight team here to give a full demonstration, Q&A, and developer unconference. Stu Snydman, Gary Geisler, and Chris Beer from Stanford, and Trey Pendragon from Princeton led the sessions. The main demonstration Tuesday morning included a brief history, a review of the initial use cases, context surrounding the platform, and walkthroughs of the application and its features. In the afternoon the Q&A session provided a chance to answer questions collected from the morning presentation and to hold a live conversation. On Wednesday developers stood up individual instances of the application, exercised its extensibility using the DPLA API to import content, and held further technical discussion. After attending the event, Steve Weida, Yale Library Webmaster, commented, “Spotlight is exciting technology and has matured at a very impressive pace. Along with our commitment to Omeka, Spotlight could play a key role in the future of the Library’s web presence.”

A full recording of the demonstration is available here:

Project website with codebase and further links:

Event wiki:


Geo4LibCamp 2016

Hats off to Stanford for hosting Geo4LibCamp 2016. This event brought together approximately 40 attendees from institutions across the United States, including Miriam Olivares and Eric James from Yale. Activities centered on the web application GeoBlacklight, the OpenGeoportal project, Esri software, open source geo-tools, and the challenges of using these systems as a GIS platform. Key topics included 1) development of the data schema as an index and source of linked data, 2) data and metadata workflow, 3) issues of provenance, authorship, enrichment, sharing, and rights, and 4) digital infrastructure. The community is enthusiastic and development is expected to continue at and between the represented institutions.



For more information:

A Metadata Schema for Geospatial Resource Discovery Use Cases

HydraConnect 2015

The HydraConnect 2015 conference took place September 21-25 in Minneapolis[1], drawing 200 people from 60 institutions, including 2 representatives from Yale, Kalee Sprague and Eric James. The conference was structured with Monday workshops, a Tuesday morning plenary, a Tuesday afternoon poster session, and sessions, lightning talks, and breakout groups on Thursday and Friday. The project “Hydra” has come to represent an aggregation of components serving the needs of the digital community. Core applications include Blacklight[2] – a discovery index and interface, Sufia[3] – an institutional repository supporting self-upload, Avalon[4] – an application for audio/video materials, Spotlight[5] – an exhibit creation tool, Stanford Earthworks[6] – supporting spatial discovery, and Hydra in a Box[7] – a new project to create a turnkey Hydra application. The main themes of the conference were linked data and interoperability, along with the approach of defining a content model by fleshing out metadata concerns driven by end user requirements. To this end several initiatives are currently under development centered on the Portland Common Data Model (PCDM)[8], a construct built on the resource/description/containment spec of the Linked Data Platform[9], providing a generic framework for resource properties and association. A key component of this is championing the use of dereferenceable URIs[10] in metadata description and tackling the challenges this entails, such as enriching current literal description, resolving URIs to their constituent properties, caching fragments of this linked data, and achieving all of this in a platform-agnostic way. Complementing this work are several interest and working groups addressing specific areas such as descriptive/rights/structural metadata, service management, UX design, and archival interests.
HydraConnect 2015 is the third conference of its kind and has grown considerably each year with expectations of much development to continue.
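
To make the URI-resolution and caching challenge mentioned above a little more concrete, here is a minimal sketch in Python. It is not code from any Hydra component: the `LabelCache` class, the `example.org` URIs, and the `fake_fetch` stand-in for an HTTP lookup are all hypothetical, and a real implementation would dereference the URI over the network and parse RDF.

```python
from typing import Callable, Dict


class LabelCache:
    """Resolve URIs to display labels, fetching each URI at most once."""

    def __init__(self, fetch_label: Callable[[str], str]) -> None:
        self._fetch_label = fetch_label
        self._labels: Dict[str, str] = {}
        self.fetch_count = 0  # how many times the resolver was actually called

    def label_for(self, uri: str) -> str:
        if uri not in self._labels:
            self.fetch_count += 1
            self._labels[uri] = self._fetch_label(uri)
        return self._labels[uri]


# Stand-in for a real HTTP lookup: a fixed table of made-up URIs.
FAKE_AUTHORITY = {
    "http://example.org/subjects/maps": "Maps",
    "http://example.org/places/minneapolis": "Minneapolis (Minn.)",
}


def fake_fetch(uri: str) -> str:
    # Fall back to the URI itself when no label is known.
    return FAKE_AUTHORITY.get(uri, uri)


def enrich(record: Dict[str, str], cache: LabelCache) -> Dict[str, str]:
    """Replace each URI value in a record with its cached label."""
    return {field: cache.label_for(uri) for field, uri in record.items()}
```

Because the resolver is passed in as a function, the same cache could sit in front of any authority source, which is one way to keep the approach platform agnostic.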


Grouper Representatives Meet with ITS


Representatives for Grouper met with ITS July 15th to 17th to introduce their product. From the website[1]:

“Grouper is an enterprise access management system designed for the highly distributed management environment and heterogeneous information technology environment common to Universities. Operating a central access management system that supports both central and distributed IT reduces risk.”

Three members of library IT, Steelsen Smith, Lakeisha Robinson, and Eric James, were in attendance for the Wednesday session. The morning started with an overview of the product, in essence a Java stack with a programming and web service API. It was demonstrated how it can pull together subject information from external identity management providers and provide a means for creating and managing groups of users, with attributes, privileges, permissions, and roles that can be made available to applications requiring group-level authentication and authorization. Group membership information can then be combined and reused in exclusion and inclusion logic, allowing for an extendable set of permissions at the application level. The session delved into the use cases of 3 subsets of university IT – person registries, learning management systems, and library systems. So, for example, a course managed in Canvas or Sakai could use groups of shoppers, instructors, TAs, guests, and enrolled students to grant dynamic privileges to course materials. VPN usage campus wide could be administered with fine control and help manage provisioning workflows. Restricted library collections such as the Henry Kissinger Papers could efficiently manage sets of permissions by including patrons in pragmatically defined authorization groups. Common to each of the use cases is the challenge of integrating the different identity providers feeding the Grouper application with interoperable and unique subject information. Stay tuned for further developments, including a potential rollout in the December timeframe.
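
The inclusion and exclusion logic described above can be sketched with plain set operations. This is an illustration in Python only, not Grouper’s own API (Grouper is a Java stack), and the group names and members are made up for the example.

```python
from typing import Iterable, Set


def effective_members(include: Iterable[Set[str]], exclude: Iterable[Set[str]]) -> Set[str]:
    """Union of all included groups, minus anyone in an excluded group."""
    included = set().union(*include)
    excluded = set().union(*exclude)
    return included - excluded


# Hypothetical course groups, as they might arrive from different identity providers.
enrolled = {"alice", "bob"}
instructors = {"carol"}
tas = {"dave"}
withdrawn = {"bob"}

# Who may see the course materials: everyone attached to the course,
# except students who have withdrawn.
can_view_materials = effective_members([enrolled, instructors, tas], [withdrawn])
```

The approach only works if every provider identifies the same person with the same subject id, which is the integration challenge noted above.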


LDCX 2015


Approximately 70 people convened at Lathrop Library on the Stanford University campus to collaborate on the converging goals of the library, archive, and museum community at the 6th annual LDCX conference. While the schedule was ad hoc, composed of lightning talks, plenary sessions, topic groups, and informal breakouts, the issues were well rooted in the themes of linked data models, discovery applications, and digital asset management. One of the long-standing goals of the community has been bringing together individual and institutional efforts, and this was very much manifest at the conference. There was a fruitful balance of sharing past achievements, making ongoing progress, and planning for challenges to come. The Hydra stack has made its presence felt in almost every arena. Development is at a stage where best practices and design abstractions are emerging. Implementation of the Linked Data Platform (LDP) and the Portland Common Data Model (PCDM) holds much promise as a foundation for the future. Surprisingly there was very little coverage of digital preservation, but perhaps this is a potential vacuum to be filled later. While it is difficult to give adequate attention to everything covered, for more please check out:

Arclight and next-gen archives
Linked Data Platform
Portland Common Data Model
IIIF Image and Presentation Specification
Fedora 4

code4lib 2015


450 people from around the world gathered in Portland, Oregon last week for the 10th annual code4lib library technology conference.

On Monday, approximately 18 pre-conferences were held in half-day and full-day sessions, mostly comprised of demos, tutorials, and discussion groups. I attended a morning session on linked data led by Tom Johnson of DPLA and Karen Estlund of the University of Oregon. As a developer, I found the demonstration of the Ruby gem ActiveTriples particularly interesting for its ability to quickly model content into RDF classes and properties that can seamlessly connect to Fedora 4 persistence or any extensible back end.

In the afternoon I attended a GeoBlacklight demo led by Jack Reed and Darren Hardy of Stanford. The Stanford GeoBlacklight is a leading map collection interface that allows for spatial search, presentation, and discovery based on the development of metadata schemas, conversion workflows, and interface presentation components. The workshop focused on using the VirtualBox virtual machine and the Vagrant setup environment to bring up an instance of GeoBlacklight in minutes.

On Tuesday the conference proper started with a keynote by Selena Deckelmann. Her talk focused on the importance of leading the coding community based on principles of inclusion of beginners and marginal groups. The presentations on Tuesday expanded on that theme with talks focused on users, teams, developers, and experiences in dealing with library technology challenges.

The presentations on Wednesday were more technically focused. Thursday morning a closing keynote was given by Andromeda Yelton, who encouraged building systems with tools designed to best satisfy the “wanderlust” behind users’ and patrons’ drive to discovery. In between the 20-minute presentations were two hour-long lightning talk sessions, each comprised of 5-minute talks by 12 people. I thought the keynotes nicely framed the conference, and the lightning talks were a great way to digest and get a pulse on what people were working on. As a developer I was particularly interested in the presentations of practical tools, such as Kevin Clarke’s presentation of Packer, a DevOps tool for deploying to virtual machines; Stanford’s OEmbed service for offering embeddable links to their digital collections; and a presentation by Stanford’s Rob Sanderson and Naomi Dushay describing their experience attempting to integrate their ILS, digital collections, and discovery indexes.

On Thursday afternoon and Friday, I attended working groups focused on Fedora 4, Hydra’s support of Fedora 4, content modeling, and the Linked Data Platform. The discussions were vigorous, and it was a beneficial mental exercise to spin out the various content model concepts of collection/work/file, the distinction between the “aggregates” and “members” predicates, and how to use the LDP Direct and Indirect Containers to deal with assets, rights, and ordering proxies, although I’m afraid not much was resolved. But DPLA (Digital Public Library of America) appears very interested in developing these concepts into usable models that promise to be a great step forward for metadata discovery and interoperability.
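
For readers new to the collection/work/file idea, here is a minimal sketch in Python of how such a content model nests. It is only an illustration of the concepts discussed, not an implementation of Fedora 4, Hydra, or LDP containers: the class and field names are my own, and ordering is simplified to plain list order rather than proxies.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class File:
    name: str


@dataclass
class Work:
    title: str
    files: List[File] = field(default_factory=list)
    members: List["Work"] = field(default_factory=list)  # child works, in order


@dataclass
class Collection:
    title: str
    members: List[Union["Collection", Work]] = field(default_factory=list)


def all_files(node: Union[Collection, Work]) -> List[str]:
    """Walk membership links depth-first and list file names in order."""
    names: List[str] = []
    if isinstance(node, Work):
        names.extend(f.name for f in node.files)
    for member in node.members:
        names.extend(all_files(member))
    return names


# A hypothetical atlas with two page-level works, held in one collection.
page1 = Work("Page 1", files=[File("p1.tif")])
page2 = Work("Page 2", files=[File("p2.tif")])
atlas = Work("Atlas", files=[File("atlas.pdf")], members=[page1, page2])
maps = Collection("Maps", members=[atlas])
```

Calling `all_files(maps)` walks from the collection down through the atlas to its pages, which is the kind of traversal a discovery interface needs when it displays a work together with its parts.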

All in all, a worthwhile week. I am keeping an eye on next year’s conference, venue TBD.