Category: Wikidata

Powerful tool to quickly create items for publications in Wikidata

This is a great example of why I love the WikiCite community. At WikiCite 2017 a group of people decided to write a Zotero translator for the Wikidata community.

Last week I had the opportunity to learn about a data archive at the Institution for Social and Policy Studies (ISPS) at Yale University. The archive is well curated and holds rich metadata about the data files it houses, such as supporting data sets and replication materials for published papers and books by ISPS-affiliated scholars.

This week I read about the Zotero translator and wanted to try it out. Thank you very much, zotkat! The translator meets the needs of people who want a semi-automated way to quickly create items for publications of many kinds.

If you run this query on the Wikidata Query Service, you can explore the items for these publications and follow the links to the supporting data files stored at ISPS.
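As a starting point, here is a minimal sketch of the kind of query involved, assuming the publications were created as instances of scholarly article (Q13442814); it does not reproduce the links to the ISPS-hosted data files from the query referenced above.

    # A minimal sketch: list scholarly-article items with title and publication date.
    SELECT ?item ?itemLabel ?title ?date WHERE {
      ?item wdt:P31 wd:Q13442814 .           # instance of: scholarly article
      OPTIONAL { ?item wdt:P1476 ?title . }  # title
      OPTIONAL { ?item wdt:P577 ?date . }    # publication date
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 100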

Windham-Campbell Prizes

The Windham-Campbell Prizes are awarded in Fiction, Nonfiction, Drama, and Poetry. This year eight awards were made.

Erna Brodber
Erna Brodber in 2015, by Peter Bennett, via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Erna_Brodber.jpg

Erna Brodber, André Alexis, Marina Carr, Carolyn Forché, Ali Cobby Eckermann, Ike Holter, Maya Jasanoff, and Ashleigh Young are the 2017 recipients of Windham-Campbell Prizes.

For each recipient who already had an item in Wikidata I added a statement using property P166 “award received” to connect them to the prize.

I was then able to write queries about the set of prize recipients.

  1. Who are all the winners of the Windham-Campbell Prizes in history? Try the query here on the Wikidata Query Service (a sketch of this query appears after the list).
  2. How many recipients do we have images for in Wikimedia Commons? Try the query here on the Wikidata Query Service.
  3. Return a list of all winners of the Windham-Campbell Prize along with the geocoordinates of their birthplaces. Try the query here on the Wikidata Query Service.
  4. Return a list of winners with their DOBs and plot it on a timeline. Try the query here on the Wikidata Query Service.
  5. Winners of the Windham-Campbell Prize listed with all other awards received. Try the query here on the Wikidata Query Service.
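For reference, here is a minimal sketch along the lines of query 1; it is not the exact query linked above, and the QID used for the prize is a placeholder that would need to be verified.

    # Winners of the Windham-Campbell Prizes (sketch).
    SELECT ?winner ?winnerLabel WHERE {
      ?winner wdt:P166 wd:Q3121748 .   # award received: Windham-Campbell Prize (QID assumed, please verify)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }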

I think this is a helpful example of how library bibliographic metadata could further enhance Wikidata. I would like to be able to see the metadata for each of the works created by these authors, but this data is not yet in Wikidata. Imagine what we could build for library users- or what library users could build themselves- if we could also provide bibliographic metadata from Wikidata!

Library of Congress Digital Preservation using Wikidata URIs!

The Library of Congress Digital Preservation team recently updated their inventory of Format Description Documents to include Wikidata URIs. The Library of Congress has detailed descriptions of more than 400 file formats on their website.

Excerpt from the format description document for TIFF, Revision 6.0, showing the Wikidata QID.

The purposes of these format descriptions are listed on their website:

  • To support strategic planning regarding digital content formats, in order to ensure the long-term preservation of digital content by the Library of Congress, and
  • To provide an inventory of information about current and emerging formats, including the identification of tools and detailed documentation that are needed to ensure that the Library of Congress can manage content created or received in these formats through the content life cycle, and
  • To identify and describe the formats that are promising for long-term sustainability, and develop strategies for sustaining these formats including recommendations pertaining to the tools and documentation needed for their management.
  • To identify and describe the formats that are not promising for long-term sustainability, and develop strategies for sustaining the content they contain.
  • The overall analysis is part of the execution of the Library of Congress digital strategic planning goal pertaining to the management and sustenance of digital content.

I’m looking forward to seeing many additional cultural heritage institutions and organizations using Wikidata URIs in the future.

Wikidata is already serving as a crosswalk between identifiers. Here is a SPARQL query for the Wikidata endpoint showing all of the items in Wikidata for which we have IDs from the Library of Congress, PRONOM, and the Just Solve Wiki.
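The linked query is not reproduced here, but a minimal sketch of such an identifier crosswalk might look like the following; the property IDs are my assumptions (P3266 for the Library of Congress Format Description Document ID and P2748 for the PRONOM file format identifier) and should be verified, and the Just Solve wiki identifier would be added via its own external-ID property.

    # Identifier crosswalk sketch: file formats that have both a
    # Library of Congress FDD ID and a PRONOM ID (property IDs assumed).
    SELECT ?format ?formatLabel ?locFDD ?pronom WHERE {
      ?format wdt:P3266 ?locFDD ;    # Library of Congress FDD ID (assumed)
              wdt:P2748 ?pronom .    # PRONOM file format identifier (assumed)
      # Add the Just Solve wiki identifier property here as well.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }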

UPDATE: I updated this post on March 15, 2017 with new links to the Library of Congress websites.


#loveyourdata week 2017

Logo for Love Your Data Week

#LYD17

Similar to Open Access Week, the Love Your Data (LYD) campaign aims to raise awareness and build a community engaged with topics related to research data management, sharing, preservation, reuse, and library-based research data services. Throughout the week we share practical tips, resources, and stories to help researchers at any stage of their careers use good data practices.

I created a series of five SPARQL queries that highlight Wikidata and collections at Yale University Library, and that express how I relate to #loveyourWIKIdata.

Oral history at the Computer History Museum

The mission statement of the Computer History Museum is “to preserve and present for posterity the artifacts and stories of the Information Age.”

The CHM has conducted hundreds of oral history interviews, transcribed them, and made them available on its website. This set of oral histories is rich in information, and I imagine that many people interested in the history of computing would like to read the transcripts.

I was curious to see what data about the people who have oral histories at the CHM might be in Wikidata. You might recognize this bubble chart from my post on 11/11/2016. Well, there is a new bubble on the chart now!

This bubble chart is a visualization of the number of archival materials held by each institution. The CHM is now represented by the second largest bubble!

I found many of the people who contributed oral histories to the CHM in Wikidata. For those who already had items in Wikidata, I added a link to the transcript of their oral history. Now we can ask questions about these people as a group.

Using the Wikidata Query Service, I wrote a few SPARQL queries to find out more about these pioneers of computing history.

  1. Return a list of all people who have their oral history at the Computer History Museum (a sketch of this query appears below).
  2. Return a list of people whose archives are at the Computer History Museum as part of the oral history project and who have images.
  3. Return a list of all people whose archives are at the Computer History Museum along with the geocoordinates of their places of birth in order to plot them on a map.
  4. Return a list of all people whose archives are at the Computer History Museum and the educational institutions they attended.
  5. Return a list of all people with archival material at the Computer History Museum along with their employer information.
  6. Return a bubble chart of awards received by people who have archival material at the Computer History Museum, ranked from most recipients to fewest.
  7. Return images and Erdős numbers for people who have archival material at the Computer History Museum.
Map with the birthplaces of those who contributed oral histories to CHM.
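As an illustration of query 1 above, here is a minimal sketch; the QID for the Computer History Museum is a placeholder, and P973 “described at URL” is only my assumption for the property carrying the transcript link.

    # People whose archives are at the Computer History Museum (sketch).
    SELECT ?person ?personLabel ?transcript WHERE {
      ?person wdt:P485 wd:Q1967876 .               # archives at: Computer History Museum (QID assumed)
      OPTIONAL { ?person wdt:P973 ?transcript . }  # described at URL (assumed to hold the oral history link)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }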

The ability to ask questions about this group of people demonstrates the benefits of linked open data. With a few queries, we unearth all of the data that editors have been contributing about these people.

Future work:

  1. Create items for all of the people who have contributed an oral history who are not yet in Wikidata.
  2. Create statements for all of these people to make their items more complete. Sourcing statements to these oral histories themselves will help us enrich the data.
  3. Add links in Wikipedia to content from CHM since many humans read Wikipedia and fewer humans read Wikidata.

Let’s describe emulators in Wikidata!

Screenshot of the Hercules emulator, by User:Kungfuman

There is a very useful list page on English Wikipedia called List of computer system emulators. The page contains a lot of structured data about emulators that is not yet in Wikidata.

Tool-builder Magnus Manske created Listeria to support lists on wikis that get their data from Wikidata. Magnus wrote a blog post about the tool that explains the functionality.

I wanted to see if I could recreate some of the structured data from the Wikipedia list page using Listeria. I have been working on describing configured computing environments as part of WikiProject Informatics. I created a subpage here for my Listeria experiment.

I decided to try this out on a subpage of a Wikidata WikiProject rather than on English Wikipedia because I wasn’t sure how the experiment would unfold, or how many of the columns would have values from Wikidata that would be displayed by the Listeria list.
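As a sketch of the kind of query such a list could be driven by (Listeria wraps a query like this in its on-wiki template), here is a minimal example; the QID for “emulator” is a placeholder, and the platform and license properties are assumptions.

    # Computer system emulators with platform and license (sketch).
    SELECT ?emulator ?emulatorLabel ?platformLabel ?licenseLabel WHERE {
      ?emulator wdt:P31 wd:Q266856 .               # instance of: emulator (QID assumed, please verify)
      OPTIONAL { ?emulator wdt:P400 ?platform . }  # platform
      OPTIONAL { ?emulator wdt:P275 ?license . }   # copyright license
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }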

Screenshot of the Computer system emulators list generated by Listeria

Writable file formats by software application

I was wondering about the current state of data in Wikidata on which file formats each software application can write. Here is a visualization of the data as of today:
Bubble chart of writable file formats by software application

What I learned from this visualization is that we need to enter more data about which file formats various software applications can write. Please let me know if you are aware of sources listing writable file formats that could be scraped to supply additional statements for Wikidata!

Feel free to try this query yourself here. I look forward to re-running this query periodically to see how the process of making this data more complete unfolds.
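For readers who prefer to see the shape of such a query inline, here is a minimal sketch; P1073 “writable file format” is my assumption for the property being counted.

    #defaultView:BubbleChart
    # Count of writable file formats per software application (sketch).
    SELECT ?app ?appLabel (COUNT(?format) AS ?formats) WHERE {
      ?app wdt:P1073 ?format .   # writable file format (assumed)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    GROUP BY ?app ?appLabel
    ORDER BY DESC(?formats)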

Wikidata: Archival papers held by institutions

I was curious to know how many links to archival collections are being added to items about people in Wikidata. I wrote a SPARQL query for the Wikidata Query Service to find out.

This query makes use of property P485 “archives at”, and the bubble chart visualization is one of the built-in display options of the Wikidata Query Service.
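A minimal sketch of a query along these lines, counting how many people name each institution in P485, looks like this (not necessarily the exact query linked above):

    #defaultView:BubbleChart
    # Institutions named in P485 "archives at", sized by number of people (sketch).
    SELECT ?institution ?institutionLabel (COUNT(?person) AS ?count) WHERE {
      ?person wdt:P485 ?institution .   # archives at
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    GROUP BY ?institution ?institutionLabel
    ORDER BY DESC(?count)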

Bubble chart visualization showing the institutions named in P485 “archives at” in Wikidata


The institutions named most often in the “archives at” property appear as the largest bubbles. Yale University Library and the Beinecke are in the middle layer as of today. I would like to add enough links to archival collections at these two institutions to see this visualization change (with the bubbles for Yale and the Beinecke increasing in size) before the end of 2016.