Course e-Reserves and Copyright – Creating Content and Charting Fair Use

In a packed forum last Tuesday, TwTT set out to explore not only the user-friendly process of creating e-reserves, but also the uncharted waters of the relevant copyright law. With expert panel members Brad Warren, head of Sterling and Bass Library access services; Craig Kirkland, Bass reserves manager; and Christina Mulligan of the Yale Law School’s Information Society Project, the advantages and limits of e-reserves were mapped out.

Brad opened the discussion with an introduction to the concept of e-reserves. Based on a premise similar to a print reserve, an e-reserve allows instructors to select portions of books and articles to be scanned and hosted for student access. Access is limited to those enrolled in the relevant course and also expires at the end of the semester. The convenience of the system has made it very popular among students and faculty, and e-reserves have been adopted not only by the Yale College and Graduate School, but also by all of the professional schools.

Craig explained the front end of the Bass e-reserve system, where a professor simply contacts reserve staff requesting that course texts be made available to students and submits a syllabus, starting the process. The library will then locate the texts, scan them if necessary, and create PDF files which are uploaded to a secure server. Staff then embed hyperlinks to the securely stored material into the syllabus and return it to the professor to be posted on the Classes*V2 course management system. Students enrolled in the course can click on the links in the syllabus to be taken to the server, where a netID login is used to validate identity prior to downloading the material. In the current system, text recognition, or OCR, is not automatically performed on scans, meaning that they are neither searchable nor machine readable, but if a professor requests this functionality it can be easily added. At the moment there is no system available to convert scanned work to formats accessible via electronic readers like the Kindle or Nook, but the possibility is being explored for future iterations of the e-reserve system.
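To picture the secure-link step, here is a minimal sketch in Python. It assumes nothing about Yale’s actual implementation (the key, URL, and function names are invented) and only illustrates the general idea: each link encodes a course, a reading, and an expiry date, and is signed so it cannot be forged or reused after the semester ends.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"library-signing-key"  # hypothetical server-side secret


def make_reserve_link(netid: str, course: str, pdf_name: str, expires: int) -> str:
    """Build a signed, expiring URL for one scanned reading (illustrative only)."""
    payload = f"{netid}|{course}|{pdf_name}|{expires}"
    signature = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return (f"https://ereserves.example.edu/{course}/{pdf_name}"
            f"?netid={netid}&expires={expires}&sig={signature}")


def link_is_valid(netid: str, course: str, pdf_name: str, expires: int, sig: str) -> bool:
    """Reject links that were tampered with or that have passed the expiry date."""
    payload = f"{netid}|{course}|{pdf_name}|{expires}"
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig) and time.time() < expires
```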

In fact, if there is one constant in the Yale e-reserve service, it is change. Brad mentioned that since its start in the Yale Medical School five years ago there have been three major changes in the service, as well as expansion to the rest of Yale’s schools and the college. The copyright law on which the service rests is also constantly shifting. In the latest development, a group of publishers is suing four administrators of Georgia State University to change the way that campus implements its e-reserve policy. Fair use is a doctrine that exempts parties from acquiring permission from (and paying royalties to) copyright holders in certain limited contexts, including “research,” “scholarship,” and “teaching and multiple copies for classroom use.” The publishers argue that Georgia State’s system of producing and distributing e-reserve materials exceeds the fair use exemption, while the university argues that the program is protected. Since Georgia State is a public university protected by state sovereign immunity, there is no financial incentive for the suit; rather, the publishers appear to be seeking to establish a favorable precedent – something conspicuously absent from present jurisprudence.

Given the dearth of litigation in the world of e-reserves and electronic course media, a significant amount of misinformation and confusion surrounds the myriad unusual situations that can emerge. When a case is made for a more conservative interpretation of fair use guidelines, it is frequently legislative history that is invoked. While this may suggest how a law could be interpreted, Christina reminded us that this history is not law itself, and courts have often completely ignored this material when making decisions. The Digital Millennium Copyright Act, or DMCA, is also frequently mentioned, but it does not actually apply to most situations, since there is no copy protection system (or DRM) built into most of the electronic text being distributed. Instead, what applies is a tricky system of balancing publishers’ rights with the academic mission of educators and universities, and the ultimate test will often be how rigorously fair use can be demonstrated.

One situation where fair use is ambiguous is the re-posting of documents made available elsewhere on the internet. While liability can be avoided completely by linking directly to the third party, in practice an educator may not wish to leave a reading’s availability to the vagaries of the internet and may prefer to make a copy available locally. Although there is an implicit safety in using a document that is already freely available, the truth is that such liability claims have not yet been vetted in court, and the poster would be taking a slight risk, although a negative outcome is very unlikely. If the educator were to comment on or modify the material in a way that makes its use for teaching clearer, the risk becomes negligible. Annotating cannot be considered a fail-safe way to demonstrate fair use, however, particularly if the copyright holder is in the process of releasing its own annotated edition. Nonetheless, the overall theme is that the closer a work is to being replicated for “scholarship,” the safer it is to release it to a closed group.

Without clear legal guidelines it is difficult to determine who is and who is not liable to be sued for copyright infringement – in fact, there are so many ambiguities that almost anyone who posts copyrighted material electronically is exposed to some extent. This should not, however, discourage academics from using e-reserves or posting material for course use. Most potential violations are so minor that publishers would be unwilling to take the risk of annoying a judge and establishing a more generous fair use precedent than they would like. Similarly, current interpretations hinge on every transfer of an electronic file being considered a “copy” – even streamed versions that reside in RAM for a short period of time. A revision of this practice, through either changes in technology or changes in interpretation, could have major effects on copyright law compliance in academia.

One of the most important, and most frequently overlooked, solutions to the e-reserve copyright quagmire is for scholars to be more proactive about retaining the copyright for materials they produce – something often signed away unwittingly during the publication process. This would allow academics to keep more control over their work, and would make permission seeking easier, reducing the need to seek fair use exemptions.

Until the copyright question is resolved more completely, there is a minor risk associated with the electronic distribution of copyrighted materials – small enough for Yale’s general counsel to approve the e-reserves process. There is also work among university libraries to establish general guidelines for e-reserves, projected for release in 2012. This is in addition to the other guidelines available in the public sphere, including the ALA’s, found here. Ultimately, the law is written to ensure that materials are used, and Christina summarized the current state well in her closing statement: “Hard cases are when something is easily available to license […] and you’re choosing not to, but the biggest thing I would urge is that when there’s something you don’t know how to clear, and you can’t figure out how to get it, or there’s some way you want to present it that you can’t get clearance for, that’s the […] real fair use stuff that you shouldn’t hesitate [about], because if it inhibits your ability to teach, or students’ ability to learn, then those are the best facts for fair use cases. The one kind of take-away is don’t let the law stop you from doing something that is important to the teaching experience because you’re very unlikely to get sued, especially when you’re in this teaching context.”

Going the Distance: Planning and Implementing a Synchronous Distance Language Course

Last term at Cornell University, a group of five second-semester Dutch students attended a language class five times a week that included games, group work, writing on a whiteboard, and conversing one on one with their professor. The hitch? Their professor, Chrissy Hosea, was in a classroom in New Haven, and a Yale student was participating in all group activities. This experiment in synchronous distance learning was run through Cornell’s Language Resource Center (LRC) and Yale’s Center for Language Study (CLS), and Chrissy Hosea, lector in Dutch, and John Graves, academic technologist, came to TwTT last Tuesday to explain how the project worked and to discuss some of the challenges and surprises of setting up a synchronous distance learning program.

Distance learning is growing in popularity not only because of its convenience, but also because it makes financial sense for universities to pool resources in the instruction of less popular subjects. When Cornell found that it could no longer finance a language program in Dutch, it turned to distance learning to avoid abandoning students who had already started the program. The challenge, however, was not to compromise the learning experience. Students at Yale and Cornell would both be receiving institutional credit for the course, and in order to serve as a replacement for regular classroom instruction, the distance program had to match or exceed the regular classroom experience. Despite challenges in planning and pedagogy, the course earned positive reviews from students on both campuses.

The planning and logistical challenges of distance learning are manifold, ranging from problems with internet connectivity to schedule conflicts. Since the course must be approved by both universities, there are twice as many academic committees to deal with as for a normal class, and seemingly innocuous differences in exam periods and vacations can disrupt instruction unless they are anticipated. The instructor must also make an effort to keep up with news on both campuses – not being aware of a blizzard or a riot can create a disconnect between student and professor.

John explains that in terms of technology and space usage, the goal is to make the distance part of distance learning as transparent as possible. The main software solution, Adobe Connect, was chosen for its many features (including virtual break-out groups, live screen sharing and collaboration, and reusable workspace templates), and a Tandberg videoconferencing solution was implemented to capitalize on the strong Tandberg infrastructure already deployed at Yale. The Sympodium electronic tablet is used by the instructor to annotate directly on visuals, and students can share in the workspace using Wacom tablets. Since Cornell had been in an experimental distance program with Syracuse University, their room was already set up permanently and included two remote-control cameras, an omnidirectional permanent microphone, and a 50″ plasma display. One of the advantages of this layout is that the camerawork is automatic – the microphone detects where voices are coming from and the cameras adjust to that location, eliminating the need for a camera operator. Yale is in the process of setting up an eight-student workspace that balances the need for camera coverage with the impression of being in the same room, but for now uses a completely mobile setup that requires marginally more intervention.

Although technology issues did crop up periodically, usually in the form of frozen video due to an overtaxed network, the system was reliable enough to allow Chrissy to concentrate on teaching and to figure out how to overcome some of the limitations inherent in distance learning. She points out that one of her primary concerns was having good enough video quality to “read” the students – explaining that an instructor can tell at a glance when students are getting lost, and being able to do this over video was important. Cooperative games and group work translated well to Adobe Connect – an idea generated at 2 am, to play Pictionary, for example, could be implemented in time for the morning class. An unexpected advantage of having microphones and cameras was that students always needed to speak loudly and clearly to be heard over the network, which exposed problems with pronunciation. In fact, students became confident enough speaking over the internet that it was possible to have them give an oral presentation in a Cornell museum (using an iPad as the camera) while the instructor watched.

By the end of the term all of the students agreed that a synchronous distance program is a viable way to learn a language, and a few commented on how unexpectedly similar it is to a normal language class. While Chrissy suggests that it may not be a good approach for more than 12 total students, the compromises were very limited when compared to a normal class, and a mandatory 10-minute weekly office hour session over Skype helped her get closer to the students – something that would be crucial if distance learning were to be used for a first-year course. Although the permanent distance learning space at Yale’s CLS is still under construction, the success of the Dutch experiment suggests that distance learning is on track not only to stay, but to grow.

For full coverage of this session, please click the video below
(note a slight delay upon initial playback):


Charting Course Capture: One Lecture at a Time

Listening only to a lecture’s audio denies the audience the visual cues and presentation elements that make classroom learning unique. Even when video is used, it is a challenge to monitor both the presenter and the presentation content at the same time. In order to maintain the integrity of a lecture, all of its parts must be captured and replayed, including audio, video, screen content, and examples. To accomplish this a technological solution must be deployed, and in academia this solution is known as course capture, explored this week at TwTT by Academic Technologists Jeffrey Carlson, Matthew Regan, and Paul Perry.

Course capture has become an increasingly important academic technology, and its manifold applications include distance learning, study and review, and the archiving of classroom content. Instructors report that students who have watched a presentation a second time frequently ask better questions and show improved understanding. Course capture also eliminates the need to spend time reviewing concepts for students who missed a session.

The advantages of course capture have made it a sought-after technology in higher education, with a recent study at Northwestern University finding that 79% of 150 universities surveyed use some type of course capture system. Yale is no exception, and both Yale College and the professional schools have experimented with various capture solutions, from the video-only Open Yale Courses to projects based on Adobe Connect to the Mediasite and Podcast Producer packages discussed here. Despite initial faculty resistance, both the current Mediasite pilot program and the established Podcast Producer system have yielded very favorable responses from both students and faculty.

Before discussing how the systems were received by students and faculty, the technology itself must be described. In general there are two approaches to course capture – hardware and software. Software solutions can have lower initial costs and have the advantage of being usable from any location – a course could theoretically be recorded from the professor’s home. On the other hand, the need to offer support services for every presenting computer, as well as the need to train end users, can make software-based course capture more difficult to deploy. Hardware solutions, while requiring the user to be in the presence of either a fixed or portable recording appliance, can be almost transparent and require almost no user training – scheduling within the capture system can automatically start and stop a recording, making the presenter’s only responsibility remembering to clip on the lapel microphone. While each approach has advantages, and both have been explored and tested by Yale Academic Technologies, the favored method tends to be the simpler, automated system.
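The hands-off scheduling that makes hardware appliances attractive can be sketched in a few lines. The schedule format and function name below are hypothetical; the point is only that the recorder, not the presenter, decides when capture starts and stops.

```python
from datetime import datetime, time

# Hypothetical weekly schedule: weekday -> (start, stop) for one room's capture appliance.
SCHEDULE = {
    0: (time(9, 0), time(10, 15)),   # Monday lecture
    2: (time(9, 0), time(10, 15)),   # Wednesday lecture
}


def should_be_recording(now: datetime) -> bool:
    """Return True if the appliance should currently be capturing."""
    window = SCHEDULE.get(now.weekday())
    return bool(window) and window[0] <= now.time() <= window[1]

# A capture appliance would poll this every few seconds and start or stop
# recording accordingly; the presenter only has to wear the microphone.
```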

Screenshot of a list of captures in a Classesv2 course site.

On central campus, Academic Technologies opted for an off-site server solution as opposed to a homegrown one. The product would also ideally be transparent to end users and require minimal support for instructors. These criteria led to the choice of Mediasite, a hardware-based system that includes hosting and playback services and can be integrated with the Classes*v2 learning management system through a secure link. A separate company, 3Play Media, provides closed captioning services, not only making the captured lectures accessible to hearing-impaired students, but also making them searchable. The output is a dual window featuring the video and audio feed of the presenter in one pane and the contents of the projected screen in the other.
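To see why captions make recordings searchable, note that a caption file pairs every line of spoken text with a timestamp, so finding a term in the transcript also finds the moment in the video. The sketch below parses the common SRT caption format using only the standard library; it is illustrative and not tied to the 3Play Media or Mediasite tooling.

```python
import re


def search_captions(srt_text: str, term: str):
    """Return (timestamp, caption) pairs whose text contains the search term."""
    hits = []
    # SRT blocks: an index line, a "start --> end" line, then one or more text lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start = lines[1].split("-->")[0].strip()
        text = " ".join(lines[2:])
        if term.lower() in text.lower():
            hits.append((start, text))
    return hits

# Example (hypothetical file name and search term):
# search_captions(open("lecture01.srt").read(), "frontier")
```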

The course capture system adopted at the Yale School of Medicine was developed over a longer period of time and relies on software and on-site hosting. Podcast Capture is the recording package; it comes pre-installed on Apple classroom computers running Mac OS X 10.6 Snow Leopard or newer and works in conjunction with the Podcast Producer software included with Mac OS X Server 10.5 or 10.6. Scripts were created on the Podcast Producer server to automate the production process and to apply standardized templates, overlays, and credits for consistency. Since a large majority of students use Apple hardware, this software solution was particularly well suited to the needs of the YSM.
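Podcast Producer’s workflows are Apple-specific, but the general shape of that automation (a standard intro, the captured lecture, standard credits, and consistent naming) can be sketched with a generic tool like ffmpeg. The file names and the use of pre-rendered title and credit clips below are assumptions for illustration, not the YSM’s actual scripts.

```python
import subprocess
from pathlib import Path


def render_with_branding(capture: Path, intro: Path, credits: Path, out: Path) -> None:
    """Concatenate a standard intro, the captured lecture, and standard credits.

    Assumes all three clips share the same codec and parameters, which is what
    lets ffmpeg's concat demuxer join them without re-encoding.
    """
    playlist = out.with_suffix(".txt")
    playlist.write_text(
        "\n".join(f"file '{p.resolve()}'" for p in (intro, capture, credits))
    )
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(playlist),
         "-c", "copy", str(out)],
        check=True,
    )

# Hypothetical usage:
# render_with_branding(Path("anatomy_lecture.mov"), Path("ysm_intro.mov"),
#                      Path("ysm_credits.mov"), Path("anatomy_lecture_final.mov"))
```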

Conveniently, technical support staff for the recorded sessions are the same students enrolled in the courses. The students are trained at the beginning of the academic year to start and monitor the captures during class and keep the system running. If an issue should occur where a student needs additional support, Paul has the option of connecting remotely to any of the classroom computers for troubleshooting.

Captures are made available as soon as the rendering process is complete on the server. Courses from the current and previous year are available to students directly online, and earlier years are archived, making for a remarkably efficient system. Over 600 lectures are stored using only 1.5 TB of space (roughly 2.5 GB per lecture), and only 5 trouble incidents were reported in the past year. What’s more, all recordings are recoverable, with QuickTime files serving as a backup.

Both systems have been welcomed, particularly by students. Paul mentions that whenever there is a problem with a lecture upload he is contacted immediately, demonstrating that students are actively using the resource. Among undergraduates the Mediasite-based system is also heavily used. For Prof. John Faragher, who recorded his American West course lectures last spring and only made the captures available to his 76 students a week before the final exam, metrics show 162 views totaling 75 hours. In another case, student presentations were recorded, making feedback more meaningful and helping students to judge their own stage presence.

Highly transparent capture systems seem to be forgotten by many faculty after a brief acclimation period. Professors who choose to use the system more actively can employ granular controls over what they want shared and when. In some cases, professors will choose to release materials only before an exam in order to discourage absenteeism while preserving the value of the captured lecture as a study tool. Mediasite also allows the production of an HTML document containing an archive of captured courses, which may be useful in preparing future presentations.

Course capture has clear benefits and will certainly continue to be used at Yale. What is less certain is how the technology will be implemented in the future. Open source solutions like Opencast Matterhorn promise to add collaborative elements to a platform approach to course capture. Future implementations will probably also include a mix of hardware and software solutions in order to bring course capture to more diverse class offerings. Remote management is a feature of both systems discussed here and will be important in future implementations. With so many benefits, course capture is here to stay; the only uncertainty now is how it will be used.

A brief portion of Prof. John Faragher’s Mediasite course capture from last spring can be seen below (Microsoft Silverlight player required). Click the play button to begin the presentation. Use the “Enter Full Screen” button in the upper right-hand corner to expand the presentation, as well as the other controls on the right-hand side to customize the view. Closed captioning can be enabled by clicking the “CC” button at the bottom of the player.

For full coverage of this session, please click the video below
(note a slight delay upon initial playback):

Arts Special Collections in the Classroom

In the past year, more students have entered the classroom of the Haas Arts Library than the study room. Special Collections librarians Jae Rossman and Molly Dotson joined us this Tuesday to discuss what this means, what is available through the arts library, and how instructors have taken advantage of the Haas Special Collections.

The first step to integrating special collections into the classroom is realizing that the arts library has been reorganized to make accessing materials as straightforward as possible. A fifteen-seat seminar room with its own projector has been installed on site, and rather than being spread throughout different libraries and exhibition spaces, the arts Special Collections are all housed either in the Haas library or offsite at the Library Shelving Facility (LSF). Although items at the LSF must be requested online, a location filter for “Haas Special Collections” on Orbis will still find them, making searching for relevant materials more intuitive.

The materials available through special collections are diverse. The Haas Arts Library holds materials ranging from the Faber Birren Collection of Books on Color to a leading collection of bookplates and the Arts of the Book Collection on the history of printing and typography. With a focus on accessibility, the classroom and related materials are used not only by Yale instructors, but also by other regional universities and even the local arts magnet high school. Classes that use the space – over 40 in the past year – typically invite an arts librarian to comment on the materials being discussed. Having the classroom inside the library also allows for a degree of student interaction with sensitive materials that would be impossible otherwise – for example, students are allowed to touch and examine parts of the collection relevant to their work.

Jae and Molly decided to use three classes to highlight the value of special collections resources in teaching, two in Yale College and one at the graduate level. Jessica Helfand’s freshman seminar “Studies in Visual Biography” has a session in the Haas library where students are able to interact with relevant collections. Richard Rose’s college seminar “Art of the Printed Word” uses the Arts of the Book collection to give students exposure to historical bookmaking in advance of their final project of making a book.

Anna Craycroft’s graduate course on painting represents a creative use of special collections in which research itself is treated as an artistic process. To accomplish this, the course works with the Yale database of finding aids, engaging each finding aid as a depiction of one possible way to think about the collection it represents. Once again, by holding a session in a place where students have simultaneous access to the database and to the materials themselves, course objectives can be achieved that would otherwise be very difficult.

The Haas Arts Library’s Special Collections can suit a variety of needs, and Jae and Molly pointed out that class programs can be tailored to match the level of the students being taught. Since librarians are almost always involved in the class, students may also explore the collections from a perspective they may not have encountered previously. Even with the growing number of classes using the Special Collections, librarians are still happy to help professors set up a class at Haas. For more information, email Jae Rossman directly, at least a week in advance of the desired classroom date. For frequently asked questions, see the Special Collections’ access policies page here.

For full coverage of this session, please click the video below
(note a slight delay upon initial playback):

Blogging Dante’s Comedy: Beatrice in the Tag Cloud

Carol Chiodo presenting at TwTT

Over the past two weeks TwTT has peered behind the user interface to reveal the technology at work in the production of digital editions and the analysis of text databases. This week, Carol Chiodo, a PhD candidate in Yale’s Italian Language and Literature program, was Virgil for the session, guiding us through the other side of academic technology and showing how a deceptively simple blog can prompt students to truly engage with a text and work together in the production of unique and meaningful criticism.

Why a blog?

Teaching Dante’s work, Carol points out, is a bit of an economics problem. The task itself is daunting – many universities split the minor works and the Divina Commedia into two semesters. Meanwhile, meaningfully “engaging” the text can easily lead to a dissertation. The resources available are also limited – the class must be accessible to students with background in neither the subject nor the original language, and Giuseppe Mazzotta’s Dante in Translation is taught in a single fall term with two 75-minute lectures and a 50-minute discussion section weekly. With such serious time and information constraints, a successful presentation depends not only on the ability to convey information understandably, but also on encouraging students to spend time considering, analyzing, and discussing the text – the essence of meaningful engagement.

The key to a successful class therefore becomes taking advantage of all available teaching resources. Two in particular stand out to Carol: the campus technology infrastructure, and other students. Rather than looking at the Yale WiFi network as the Facebook-bringing bane of the classroom, Carol encourages educators to see it as a communication platform where professors and teaching fellows can bring students together – both with each other and with the text – providing an alternative to the shallow insights of Wikipedia or SparkNotes. Peer interaction is also crucial. By encouraging discussion and analysis beyond the classroom, students begin to think more critically about the works, and a space is created for engagement that is always open, always communal, and always subject to discussion and peer review.

Professor Mazzotta’s class is itself no stranger to technology. In 2004, a CMI2 project helped to bring multimedia and more efficient presentation into the classroom. Then, in 2008, the class was made available through the Open Yale Courses project. Nonetheless, the focus on students that the blog would provide would be new for the class, and even though students clearly made use of the other enhancements, how much they would embrace the blog was an open question.

What happened?

Carol’s section was the only one of the four sections to use a blog, set up with help from the Yale ITG, and her students were some of the most successful in the class.  She admits that initially it was difficult convincing students to make and tag entries, particularly since no additional course credit was being given, but that in a very short time the worth of the blog became evident and students became enthusiastic about writing entries and commenting on the work of peers.  Students who were reluctant to comment verbally in the seminar room would sometimes be prolific posters on the blog, and their ideas were opened to consideration.

In short order, three advantages of the blog became clear, which eventually led to superior final papers. First, students were obligated to constantly formulate, express, and defend ideas in writing. Second, tagging allowed for a visual representation of what students were finding and considering important, allowing themes to be easily traced over time. Finally, students were able to build on each other’s ideas to assemble a body of analysis that by the end of the course was complete enough to serve as the sole secondary text for the term paper.
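The tag cloud’s usefulness came from simple weighting: the more often students applied a tag, the more prominent it became, which is what let themes be traced over the semester. A minimal sketch of that counting step, with invented tags:

```python
from collections import Counter

# Hypothetical tags students might attach to weekly blog posts.
posts = [
    {"week": 3, "tags": ["beatrice", "exile", "pilgrim"]},
    {"week": 4, "tags": ["beatrice", "light", "vision"]},
    {"week": 5, "tags": ["exile", "florence", "beatrice"]},
]

tag_counts = Counter(tag for post in posts for tag in post["tags"])

# A tag cloud scales font size by frequency; here we simply rank the themes.
for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}")
```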

Beyond the aforementioned benefits, Carol points out that the blog actually improved writing quality by bringing peer review and accountability into every week’s discussion and into the term paper. She notes that her own involvement in the blog, after getting it running, was minimal, and she rarely needed to comment or moderate. Blog posts were shown on the classroom projector every section, encouraging students to submit meaningful and timely entries. The web format made commenting and review easy, which led to a care in writing that was reflected in the final papers. One third of her students produced term papers of such caliber that they were submitted to the Dante Society’s undergraduate competition.

What next?

Carol’s presentation reveals how a blog can be used to unite two resources that are frequently underutilized in a lecture class – the internet and peer interaction.  By exploiting this untapped potential, students can be encouraged to not only aggressively engage texts but also generate and defend interpretations in the face of peer review.  The ultimate product, she argues, is a demonstrably better term paper and a superior grasp of the material than would be possible in a more passive environment.

For full coverage of this session, please click the video below (note a slight delay upon initial playback):


EMiC and Making Digital Editions

Dean Irvine talks about Editing Modernism in Canada

Last week’s TwTT session explored some of the impressive analytical tools that can be applied to archives once they become machine readable. This week, Dean Irvine, visiting Yale from Dalhousie University in Nova Scotia, gave an engaging and technical talk on the crafting of the digital edition – the process that takes printed text to hypertext while adding layers of functionality. By the end of the talk, attendees had not only been exposed to leading content management systems and workflows, but also walked through the steps involved in producing a digital edition, warned of the obstacles most frequently encountered, and given tips on where to begin the process and what kind of advantages a digital edition can provide.

Screenshot of the Editing Modernism in Canada website

Irvine’s work is associated with the “Editing Modernism in Canada,” or EMiC, project, a 33-university endeavor with roots in the classroom. EMiC uses new technology to stimulate interest in Canadian modernist studies, and part of the program is an annually updated summer course in creating digital editions using the newest available technology. The yearly changes are unveiled at a workshop – this year at Yale in May 2012 – before the program begins. Dean’s TwTT presentation gave a bit of a sneak peek into what will be revealed at that workshop and into the latest developments in the production of digital editions.

Before delving into the production of digital editions, a little should be said about what a digital edition is and what makes it special. A digital edition is an edited electronic version of a printed work that has been enhanced with annotations and metadata tags to increase the usability and value of the text. Digital editions can exist as text only (content, no form), which has the advantage of being easily searched, read, and annotated, but loses the original image and character of the work – a particular loss when dealing with letters, handwritten manuscripts, and other historical documents where structure and form are at least as significant as content. “Digital page” editions also exist, which are just scanned images (form, no content) that are easy to produce and accumulate but difficult to use. What EMiC found was that students wanted the best of both worlds – an easily searchable and annotatable text that preserves the original form and structure – and this led to the creation of the “image-based edition” (preserving both form and content). Dean points out that the best way to think of the image-based edition is as a multi-layered object. Looking at the screen, the reader sees an image of the document, but behind the screen the computer can read, search, and mark up the text, delivering a level of usability that is simply not possible in a printed work while preserving the aesthetic quality that is frequently lost in internet editions.

Screenshot of the Islandora website

The path from a scanned page to a “digital page” was the next topic of discussion, and Irvine walked the audience through the steps of crafting a digital edition, the first of which is selecting a content management system. A CMS is a tool to manage collections and workflows, and two are frequently used by EMiC students in the production of digital editions: Omeka and Islandora. While Omeka (colloquially called “WordPress for scholars”) is easier to use and may be a good answer if your institution has limited support for the digital humanities, Islandora is a more powerful solution that combines a Drupal front end with a Fedora Commons back end, and is the one used by EMiC, the Smithsonian, and others. While both are used in the presentation of digital editions, it should be noted that they are not limited to electronic books; they are in fact scalable systems that can also accommodate photographs, audio, and other mixed media. The CMS can be thought of as the library or exhibition space that sets the rules and format of your exhibition and also holds the content. An example of the possibilities of an Islandora system can be seen here.

Once a CMS has been selected, Dean pointed out, the process of actually making a digital edition is designed to be deceptively simple. Uncompressed TIFF files are uploaded to the system, the user fills out a few metadata tags and presses an “ingest” button. Hiding behind the ingest button is a flowchart of processing and formatting, handled by the computer, that does not fit onto a single slide. The changes most relevant to the user are the recognition of text characters (OCR), restructuring in XML (a language that holds content) and then XSLT (a language that describes structure). Images are converted to the JPEG format to improve compatibility and decrease file size. Text is encoded to be compliant with the leading standard for humanities text encoding, known as TEI. These steps are complex and introduce some issues that Irvine discussed later, but for the moment, what the end user sees is an output document that has been tagged and is ready for markup.
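As a rough illustration of what an ingest step hides, the sketch below converts an archival TIFF to a web JPEG, runs OCR, and wraps the recognized text in minimal TEI-style markup. It assumes the Pillow and pytesseract packages and is only a cartoon of the real Islandora/EMiC pipeline.

```python
from pathlib import Path
from xml.sax.saxutils import escape

from PIL import Image          # assumed: Pillow for image conversion
import pytesseract             # assumed: Tesseract OCR via pytesseract


def ingest_page(tiff_path: Path, out_dir: Path) -> None:
    """Convert one scanned page to a web JPEG plus a minimal TEI-style XML file."""
    out_dir.mkdir(exist_ok=True)
    page = Image.open(tiff_path)

    # Derivative image: smaller and more compatible than the archival TIFF.
    jpeg_path = out_dir / (tiff_path.stem + ".jpg")
    page.convert("RGB").save(jpeg_path, "JPEG", quality=85)

    # Text layer: OCR the page so it becomes searchable and markable.
    text = pytesseract.image_to_string(page)

    # Wrap in a (very) minimal TEI-flavoured envelope linking text to its facsimile.
    xml = (f'<text facs="{jpeg_path.name}">\n'
           f"  <body><p>{escape(text.strip())}</p></body>\n"
           f"</text>\n")
    (out_dir / (tiff_path.stem + ".xml")).write_text(xml, encoding="utf-8")
```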

At the moment, text and image markup are not integrated in the same software package. For image markup, a tool called the IMT is used, available under an open source license here. For text, a number of editors are available, but Dean teaches the use of an experimental XML editor called CWRC-writer. Although it will eventually be released as an open source application, it is still under development, and people affiliated with Yale or EMiC who are interested in using it should contact Dean directly. Some other interesting solutions for markup exist, including the TILE environment from the University of Maryland, which allows for both image and text markup but relies too heavily on users programming their own plug-ins to be widely accessible.

After markup, the user has a digital edition of a text that is accessible via a content management system – so what comes next? For one thing, the user is free to take advantage of the legion benefits of a digital text. Tagging and metadata make it easier to find related works and to understand the content of a collection. Software tools also exist to make the data come to life. An example is Juxta, which allows users to collate and compare electronic texts, identifying and saving differences across long texts. The applications for this kind of data are manifold, including this example, which exposes changes in Darwin’s The Origin of Species over its various editions to reflect changes in scientific thought. An upcoming project at the University of Toronto will examine the changes made to the complete works of Shakespeare over time. Digital editions are becoming easier to make, and they add a level of depth and open doors to analysis not possible with print editions.
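Juxta is a dedicated collation tool, but its core operation, aligning two witnesses of a text and recording where they diverge, can be illustrated with the Python standard library. The wording of the two witnesses below is invented.

```python
import difflib

# Two hypothetical witnesses of the same sentence (placeholder wording).
witness_a = "natural selection acts solely by the preservation of variations".split()
witness_b = "natural selection acts exclusively by the preservation of profitable variations".split()

# SequenceMatcher aligns the two token streams and reports the differing spans.
matcher = difflib.SequenceMatcher(a=witness_a, b=witness_b)
for op, a0, a1, b0, b1 in matcher.get_opcodes():
    if op != "equal":
        print(f"{op}: {' '.join(witness_a[a0:a1])!r} -> {' '.join(witness_b[b0:b1])!r}")
```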

Although the advantages of the digital edition are plentiful, and there is significant interest in making printed texts available in edited electronic form, many challenges confront scholars trying to make the leap to digital. The first is that while the software packages have advanced and become more usable, there is still a significant amount of technical overhead involved in setting up a collection of digital editions – enough to scare away people who may be uncomfortable in a terminal environment or with a command-based text editor. A second problem is the lack of a unified software solution. As reflected above, many different packages come together to produce a digital edition, some of which run on different operating systems, which can add steps and headaches to the process. Finally, licensing can be an issue. If ABBYY is used as an OCR engine, universities have to pay for a page quota that can quickly be exhausted through beginner error. A switch to completely open source software has been completed at EMiC, but is still not universal. Irvine pointed out that his group is seeking to address these issues, with the ultimate goal being a completely cloud-based solution that is no harder to use than WordPress, though that solution may still be some way off. For this reason he emphasizes the importance of making training and programs available, since many people will be willing to participate in the digitization process if they receive help and guidance.

Dean Irvine’s TWTT talk opened the black box of the virtual collection and digital edition, showing not only the features of enhanced texts but also how users can create digital editions themselves.  Next week’s TwTT talk will expand on adding depth to virtual texts with Carol Chiodo’s presentation “Blogging Dante’s Comedy: Beatrice in the Tag Cloud.”  See you there!

For full coverage of this session, please click the video below (note a slight delay upon initial playback):

How to do Your Own Topic Modeling

In the first Teaching with Technology Tuesday of the fall 2011 semester, David Newman delivered a presentation on topic modeling to a full house in Bass’s L01 classroom. His research concentrates on data mining and machine learning, and he has been working with Yale for the past three years on an IMLS-funded project on the applications of topic modeling in museum and library collections. In Tuesday’s talk, David broke down what topic modeling is and how it can be useful, and introduced a tool he designed to make the process accessible to anyone who can use a computer.

What is Topic Modeling and How is it Useful?

David introduced topic modeling as an “answer to information overload.” In short, it’s a system to have a computer automatically search and categorize large archives, combing them for patterns that can eventually be used to get a better idea of what’s inside.  The process works best when there are thousands to millions of documents involved, and the output can be thought of as a list of subject tags, although that description is not completely accurate.  As the computer sifts through the documents, it identifies words that repeat and words that co-occur.  It then identifies sets of these “tokens” and groups them together.  The result is a list of keyword groups that link to the documents that contain those keywords – a form of AI subject classification.
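For readers who want a concrete picture of these “keyword groups,” here is a minimal sketch using scikit-learn’s LDA implementation rather than the tool David presented. The toy documents are invented, and a real run would need thousands of texts to produce stable topics.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus -- real topic modeling needs thousands of documents.
docs = [
    "the symphony orchestra performed the violin concerto",
    "the senate passed the budget bill after a long debate",
    "the string quartet rehearsed the new composition",
    "voters approved the tax referendum in the election",
]

# Count which words occur, and co-occur, across documents.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Ask the model to explain the corpus with two topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top keywords for each discovered topic.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[::-1][:5]]
    print(f"topic {i}: {', '.join(top)}")
```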

Although the computer can never be quite as creative or accurate as a human reader, it compensates in sheer volume – making topic modeling perfect for large data sets.  As books are scanned and archives digitized, topic modeling provides a fast way to help collections managers figure out what they are holding, and gives researchers better metadata to quickly find what they need.

Applications of topic modeling are diverse.  The NSF uses topic modeling to figure out what subjects are most active in publications, helping to produce “field surveys” that assist in funding decisions and understanding the state of research.  Historians can use topic modeling to try to identify changes in the historical record over time.  Social scientists may wish to identify trending topics on social networks.  Creative humanists can even model long books, although David concedes that the output, even in a long text divided by pages, can vary in quality.  At Yale, topic modeling is being applied to art metadata in the Haas Art and Architecture library in an effort to make collections more accessible to researchers.  With all of the applications of the technology, aspiring topic-modelers will be glad to know that Dr. Newman has helped to produce a piece of open-source software that makes the process accessible to anyone.

DIY Topic Modeling with the Topic Modeling Tool (TMT)

While topic modeling has applications in diverse disciplines, the amount of intensive computer work involved scares away many academics who could potentially benefit from the technique.  For this reason, the tool presented by David focuses on keeping the process simple and automated, allowing the researcher to spend more time analyzing and less time typing.

Accessible here, David’s software (called simply the “topic-modeling-tool”) is a graphical user interface for an existing open source project called MALLET, which is included in the download and does the behind-the-scenes heavy lifting. Written in Java for maximum portability, the TMT allows users to import text files, either as files in a folder or as a single giant text file, set a few options for how they want topics identified, specify how many topic categories they want produced, and, a few minutes later, get out both HTML- and CSV-formatted results listing the topics generated and the documents containing those topics.

Instructions and sample files are given on the website, and the options are intuitive enough to allow users to “learn by playing,” but David gave us some tips on how to approach topic modeling projects with the TMT. Users should expect to increase the number of output topics if they want more precise results. For example, if trying to identify documents that discuss music, 10 topics should be sufficient; if trying to differentiate between types of music, 20 topics may be necessary. Results can also be made more specific through the use of stopwords, which are ignored by the computer as it models the documents. This can be used to cut down on word “polluters” – for example, text that appears frequently in by-lines. Thresholds for tagging can also be set to increase the resolution of results, for example by requiring that a document repeat the key text at least five times in order to be tagged.
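In the scikit-learn sketch above, rough analogues of these knobs live on the vectorizer: a custom stopword list drops the “polluter” terms, and a minimum document-frequency cutoff keeps rare noise out of the model. (These are scikit-learn parameters, not the TMT’s own options, and min_df is not identical to the TMT’s per-document tagging threshold.)

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words=["copyright", "reserved", "press"],  # invented by-line "polluters"
    min_df=5,  # ignore words appearing in fewer than 5 documents
)
```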

In addition to being easy to use, TMT is not limited to English, and can process any language with clearly delimited words, including languages that use Cyrillic or Arabic alphabets.  Unfortunately, some East Asian languages pose a challenge as the computer has difficulty distinguishing between tokens.

What Next?

David’s presentation exposed some of the uses for and tools of topic modeling, and the TMT opens up this powerful system of analysis to almost anyone. As some audience members pointed out, however, the greatest difficulty of topic modeling arguably comes from getting the data one wishes to analyze into a usable form. Yale has a number of resources to help with this challenge, including an upcoming workshop on using the open source package R in conjunction with Google Documents for data mining, and also next week’s TwTT workshop, which will include information on how to work with large archives in the humanities.

For full coverage of this session, please click the video below (note a slight delay upon initial playback):

2011 Digital Humanities Student Poster Session

The Collaborative Learning Center was pleased to host Yale’s first digital humanities student poster session in Bass Library room L01 as the penultimate Teaching with Technology Tuesday of the spring 2011 semester. Robin Ladouceur of ITG gave a brief introduction of our convener, Kristjiana Gong (CLC intern and American Studies major).

After some brief remarks, Kristjiana first introduced Laura Wexler (Professor of American Studies; Professor of Women’s, Gender and Sexuality Studies; and Co-Chair of the Women Faculty Forum at Yale). Wexler noted that she was speaking on behalf of herself and Inderpal Grewal (Professor and Chair of Women’s, Gender and Sexuality Studies) as teachers in the fall 2010 course WGSS 380, “Gender, Sexuality, and Popular Culture”, the source of some of the projects shown. She thanked Yianni Yessios of ITG for his presence in the course as a digital artist and teacher of the humanities lab portion of the course. The projects shown represented the completion of an assignment to create a digital street “somewhere other than here, some time other than now,” with an emphasis on using Yale University Library resources and on primary sources in particular. Wexler highlighted that the students brought an inspiring, impressive, and energizing “force of creativity” to their projects.

Our next panelist was Jessica Pressman (Assistant Professor of English), introduced by Kristjiana. Pressman stated that she was pleased to show the positive results of teaching with technology, and echoed Wexler’s comment about students’ force of creativity and imagination. Her courses often center on, as she describes it on her website, “how technologies affect our understanding of literature, both in terms of aesthetics and reading practices.” Students in Pressman’s fall 2010 ENGL 391, “Digital Literature” course were assigned the challenge of creating a web-based analytical essay. This avant-garde format extended her teaching about form and how form and content are inextricable. Put another way, student projects needed to embody and to discuss how content is presented and the reasons for presentational choices.

Finally, Kristjiana introduced Julie Dorsey (Professor of Computer Science). Dorsey is one of the founders of the Computing and the Arts major at Yale as well as of Creative Consilience of Computing and the Arts. In the Computing and the Arts major, students take all the required courses for the Computer Science major and select a track in the arts (e.g. music or theater) to weave into their computing scholarship. The student projects showcased today came from seniors in the major, demonstrating an interdisciplinary fusion researched, learned, and forged over their tenure at Yale.

Student projects showcased were very impressive. Among those featured:

  • A multimedia walk down a street in the Pontochō district of Kyoto in 1958.
  • A hypertext with Blue Hyacinth as its starting point, composed of two sets of four paragraphs that can be shown independently and with integrity, or remixed on the fly by mousing over it.
  • An interactive map of Jamaica during emancipation (1834/1863), set in Google Earth and drawing heavily on images from Yale’s digital collections. Included a guided tour through the created world.
  • A complex game and game platform, “The Groov Cosmos,” involving elements of strategy gaming, combat gaming, puzzle gaming, and in-play musical adjustments, created in C#.
  • A web-based digital essay analyzing and building on the work of The Jew’s Daughter and Blue Hyacinth to create a destabilized text locating meaning in chunks below the discourse level. A game aspect was added by allowing the user to re-arrange the text into apparently the correct order (or, rather, the original order), but this is a mirage.
  • A close reading of the use of sound in three works of digital literature: Sooth, Nippon, and Project for Tachistoscope. The project also incorporates the tactic of close-writing, borrowed from the aesthetics of Sydney’s Siberia by inserting sound into a piece that was originally silent.
  • “All Roads Lead to Toads,” an interactive fiction that tries to capture the feeling of a branching structured game, taking the emphasis off of the completion of either puzzles or the game and placing it on exploring actions, environments, and characters.

Mapping James Joyce’s Ulysses

Abe Parrish and Sam Alexander standing in front of a projected image of a map of Dublin

Abe Parrish of the Map Department and Sam Alexander, a graduate student in English, joined us this week to discuss their digital interactive map of James Joyce’s Ulysses. The project germinated as a component of an undergraduate seminar taught by Pericles Lewis, who approached Abe with the idea in August 2010. Once it reached beta stage, Sam brought the map to the seminar students for a test drive: part building, part analysis.

For the major work, the project needed just one primary source (a Yale-owned 1900 Dublin map) and one piece of software (ESRI’s ArcGIS Viewer for Adobe Flex). Digitizing the map created a raster image, which then required conversion to a vector file for use in ArcGIS Viewer. The vectorized map was then layered over native ArcGIS topographic and street maps of contemporary Dublin and broken up into queryable blocks, allowing students to locate, mark, and annotate narrative locations in the historical image.
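The seminar used ESRI’s ArcGIS stack, but the underlying data structure, a set of annotated and queryable points tied to the narrative, can be sketched with open-source tools. The coordinates, field names, and choice of geopandas below are assumptions for illustration only, not the project’s actual implementation.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical narrative "events": a label, an episode, and a point in Dublin.
events = gpd.GeoDataFrame(
    {
        "label": ["Bloom buys a kidney", "Funeral procession sets out"],
        "episode": ["Calypso", "Hades"],
        "geometry": [Point(-6.2756, 53.3331), Point(-6.2290, 53.3320)],
    },
    crs="EPSG:4326",  # plain latitude/longitude
)

# Queryable blocks: filter events by episode, then export for a web viewer.
print(events[events["episode"] == "Hades"])
events.to_file("ulysses_events.geojson", driver="GeoJSON")
```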

What sets ArcGIS Viewer apart from other programs like Google Earth is that it is browser-based, with the capacity for multiple users to view and edit geographical data at once and without downloading uncommon software or plug-ins. (Viewing the project does require the use of Adobe Flash, which effectively eliminates Apple mobile devices.) Traditional mapping software requires individuals to create discrete edits to maps and then merge the documents, rather than enabling collaborative editing.

Sam offered his students three choices for working with the map as part of the seminar. The first option – building a stable base layer of data to which future users could add – would serve the long term, but be less interesting to current students. The second option was to focus on the socio-political context of the novel, such as the locations of the famous Phoenix Park Murders. The third option, which Sam recommended against, was to actually map the narrative of Ulysses.

Though his students picked the option he recommended against, Sam believes that the decision to use the map to trace the narrative of Ulysses was ultimately the right one. Mapping the novel forced a new type of engagement, and is unlike any scholarship currently available. To his knowledge, there are no other large-scale, detailed maps of Ulysses.

Reframing Ulysses in terms of its geography, Sam explains, can spatialize the temporal events of the novel. By using the mapping tool, even with the story’s forward-moving plot, readers were able to imagine Dublin all at once as if it were laid out before them – an aim Joyce avowed publicly. The map provided an entry point for students into the complicated character psychology of the novel. By seeing on the map which buildings and streets were influencing a character, students could better understand what might be pulling the character’s thoughts in specific directions. Perhaps seeing a tea house makes a character think of “the East,” or the proximity of disparate ethnic communities makes a character meditate on immigration. Physical context clues offered by mapping the novel provide an often clearer and certainly richer engagement with the text. Even in moments where Joyce’s narrative seems not to make geographical sense (on one occasion, a character crosses a street in the opposite direction of his stated destination), interesting questions emerge. Did Joyce get it wrong? Was Joyce attempting to show how the character changes his mind, first moving in one direction and then another? Such geographical details are occluded from readers without intimate knowledge of the setting and provide new contexts for interpretation.

After incorporating around 80 events of the novel into the map, Sam asked his students to engage in an analysis of the project, offering suggestions for future use. One of the epistemological issues that emerged related to the choice of the word “event” for the points marked on the map. This decision was problematized when, for instance, characters thought of places (do you mark the character’s location or the thought-place location?) or perceived remote occurrences (mark the site of perception or of occurrence?). Some challenges were technical: The search tool might benefit from some tweaking and the number of metadata fields for each event was overly ambitious. Not all of the addresses are extant; this cross-temporal anomaly added layers of interpretation, as students looked for landmarks like Nelson’s Pillar, only to learn that it was destroyed by former IRA volunteers in 1966.

Opportunities remain to enhance the program by tracing routes, adding additional layers, and augmenting the program for growth. Students suggested attaching the map to a wiki, making it compatible with smartphones, and even selling an app to tourists on Bloomsday tours. We’re eager to see the project evolve in years to come, and curious about the archival challenges its growth might present!

For full coverage of this session, please click the video below (note a slight delay upon initial playback):

The CUNY Academic Commons: Building the Social University

Matthew Gold talks about the CUNY Academic Commons

Matthew Gold and Boone Gorges, respectively Project Director and Lead Developer for the CUNY Academic Commons, joined us today to discuss creating and maintaining a scholars’ social network using BuddyPress. (We’re currently working toward implementing a similar system as a way of interconnecting the nearly 500 blogs and thousands of posts currently in WordPress.)

The CUNY Academic Commons was launched in 2009 with three goals: connect people across campuses, create a space in which faculty members and grad students could create content, and encourage exploration and discovery. Because of the diverse population and geography of the CUNY system, there was an administrative need to create an integrated, connected, networked university. The Academic Commons grew out of a committee composed of two representatives from each of the colleges (faculty, staff, administrators, graduate students) that was charged with creating an academic technology commons and determining best practices for teaching with technology.

After much discussion, they determined that they did not want an institutional repository (no social networking!), a perfect taxonomy, a hard sell to potential participants, or a traditional model of tech support (hoping instead for a more do-it-yourself approach). Conversely, they were certain they wanted openness (in ethos, mode of development, and access), an organic system, and decentralization.

Starting with the capacities already available in WordPress — author-focused, published content (blog posts) and commentary dependent on those posts — the Academic Commons grew by adding the BuddyPress plugin, opening up the world of groups, profiles, and a media wiki. Going beyond blogs, forums, and documents, groups link people through profiles (much like Facebook). Groups ranging in focus from academic subjects (such as digital humanities or the use of games in the classroom) to social ones (such as Pizza in New York) have emerged to create powerful interactions on the website. Sorting devices in the News Feed enable filtered content (much like a Twitter feed), allowing users to look at the information sharing around the network most pertinent to their individual interests. The media wiki enables collaborative editing of documents as well as historical and meta-discussions about the site.

Matt and Boone argued that one of the most important features of the growth and functionality of the site is transparency in development and support. Boone emphasized the need for porous boundaries between users and support at all levels, focusing on regular communication with the community through the “Feedback” tab located on the right side of every page. Primarily, the feedback tab is used to report (and fix) bugs, but it also enables members to suggest content and vote on others’ suggestions (much like Reddit or Digg). While some communication methods have added work, such as Boone’s development blog, others have added community (read: free) assistance, helping to track, edit, and test new developments. Ultimately, all of these functions are meant to incorporate the entire community into the building of the Commons, so that users are engaged in creating a warmer community.

One missing piece, however, is support for incorporating CUNY’s approximately 250,000 undergraduates. Looking ahead, there is hope that the Academic Commons will be able to include not only undergraduates, but also functions to engage content from Blackboard. Furthermore, Matt and Boone are happy (and eager) to talk with other universities interested in pursuing similar efforts — like Yale!

Interested in more? See Matt and Boone’s PowerPoint.


For full coverage of this session, please click the video below (note a slight delay upon initial playback):