This is a great example of why I love the WikiCite community. At WikiCite 2017, a group of people decided to write a Zotero translator for the Wikidata community.
Last week I had the opportunity to learn about a data archive at the Institution for Social and Policy Studies at Yale University. The archive is well curated and holds rich metadata about the data files it houses, such as supporting data sets and replication materials related to published papers and books by ISPS-affiliated scholars.
This week I read about the Zotero translator and wanted to try it out. Thank you very much, zotkat! This translator meets the needs of people who want a semi-automated way to quickly create items for publications of many kinds.
If you run this query on the Wikidata Query Service, you can explore the items for these publications and follow the links to the supporting data files stored at ISPS.
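The original query is not reproduced here, but a minimal sketch of this kind of lookup might take the following shape. Note the property choice is an assumption: I am guessing the ISPS links were recorded with P953 ("full work available at URL"); adjust the property and URL filter to match how the links were actually modeled.

```sparql
# Publications whose Wikidata items link into the ISPS data archive.
# P953 ("full work available at URL") is an assumption about the modeling.
SELECT ?work ?workLabel ?url WHERE {
  ?work wdt:P953 ?url .
  FILTER(CONTAINS(STR(?url), "isps.yale.edu"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```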
I think this is a helpful example of how library bibliographic metadata could further enhance Wikidata. I would like to be able to see the metadata for each of the works created by these authors, but this data is not yet in Wikidata. Imagine what we could build for library users (or what library users could build themselves) if we could also provide bibliographic metadata from Wikidata!
The purposes of these format descriptions are listed on their website:
To support strategic planning regarding digital content formats, in order to ensure the long-term preservation of digital content by the Library of Congress, and
To provide an inventory of information about current and emerging formats, including the identification of tools and detailed documentation that are needed to ensure that the Library of Congress can manage content created or received in these formats through the content life cycle, and
To identify and describe the formats that are promising for long-term sustainability, and develop strategies for sustaining these formats including recommendations pertaining to the tools and documentation needed for their management.
To identify and describe the formats that are not promising for long-term sustainability, and develop strategies for sustaining the content they contain.
The overall analysis is part of the execution of the Library of Congress's digital strategic planning goal pertaining to the management and sustainability of digital content.
I’m looking forward to seeing many additional cultural heritage institutions and organizations using Wikidata URIs in the future.
Wikidata is already serving as a crosswalk between identifiers. Here is a SPARQL query for the Wikidata endpoint showing all of the items in Wikidata for which we have IDs from the Library of Congress, PRONOM, and the Just Solve Wiki.
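The query itself is not embedded in this post, but the crosswalk idea can be sketched roughly as follows. The property IDs below are from memory and should be treated as assumptions to verify on Wikidata before use: P2748 (PRONOM file format identifier), P3266 (Library of Congress Format Description Document ID), and P3381 (File Format wiki page ID, for the Just Solve wiki).

```sparql
# File format items carrying all three external identifiers.
# Property IDs are assumptions from memory -- verify on Wikidata:
#   P2748  PRONOM file format identifier
#   P3266  Library of Congress FDD ID
#   P3381  File Format wiki (Just Solve) page ID
SELECT ?format ?formatLabel ?pronom ?locFDD ?justSolve WHERE {
  ?format wdt:P2748 ?pronom ;
          wdt:P3266 ?locFDD ;
          wdt:P3381 ?justSolve .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Because each row carries all three identifiers for the same item, the result set is itself a crosswalk table between the registries.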
UPDATE: I updated this post on March 15, 2017 with new links to the Library of Congress websites.
Similar to Open Access Week, the purpose of the Love Your Data (LYD) campaign is to raise awareness and build a community to engage on topics related to research data management, sharing, preservation, reuse, and library-based research data services. We will share practical tips, resources, and stories to help researchers at any stage in their career use good data practices.
The mission statement of the Computer History Museum is “to preserve and present for posterity the artifacts and stories of the Information Age.”
The CHM has conducted hundreds of oral history interviews, transcribed them, and made them available from their website. This set of oral histories is very rich with information and I imagine that many people interested in the history of computing might like to read the transcripts of these oral histories.
I was curious to see what data about the people who have oral histories at the CHM might be in Wikidata. You might recognize this bubble chart from my post on 11/11/2016. Well, there is a new bubble on the chart now!
I found many of the people who contributed oral histories to the CHM in Wikidata. For those who already had items in Wikidata, I added a link to the transcript of their oral history. Now we can ask questions about these people as a group.
Using the Wikidata Query Service, I wrote a few SPARQL queries to find out more about these pioneers of computing history.
The ability to ask questions about this group of people demonstrates the benefits of linked open data. With a few queries, we unearth all of the data that editors have been contributing about these people.
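As one sketch of such a query: assuming the transcript links were added with P973 ("described at URL") pointing at the CHM website (an assumption about how I recorded them; adjust the property and filter to match), we can ask what occupations this group of people held.

```sparql
# Occupations of people whose items link to a CHM oral history.
# P973 ("described at URL") is an assumption about how the links were added.
SELECT ?occupationLabel (COUNT(DISTINCT ?person) AS ?people) WHERE {
  ?person wdt:P31 wd:Q5 ;         # instance of: human
          wdt:P973 ?url ;         # described at URL
          wdt:P106 ?occupation .  # occupation
  FILTER(CONTAINS(STR(?url), "computerhistory.org"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?occupationLabel
ORDER BY DESC(?people)
```

Swapping P106 for P19 (place of birth) or P69 (educated at) gives other views onto the same group.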
Create items for all of the people who have contributed an oral history who are not yet in Wikidata.
Create statements for all of these people to make their items more complete. Sourcing statements to these oral histories themselves will help us enrich the data.
Add links in Wikipedia to content from CHM since many humans read Wikipedia and fewer humans read Wikidata.
There is a very useful list page on English Wikipedia called List of computer system emulators. The page contains a lot of structured data about emulators that is not yet in Wikidata.
Tool-builder Magnus Manske created Listeria to support lists on wikis that get their data from Wikidata. Magnus wrote a blog post about the tool that explains the functionality.
I wanted to see if I could recreate some of the structured data from the Wikipedia list page using Listeria. I have been working on describing configured computing environments as part of WikiProject Informatics. I created a subpage here for my Listeria experiment.
I decided to try this out on a subpage of a Wikidata WikiProject rather than on English Wikipedia because I wasn’t sure how the experiment would unfold, or how many of the columns would have values from Wikidata that would be displayed by the Listeria list.
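For readers who have not used Listeria before, the setup on the subpage looks roughly like this. The QID for "emulator" and the column properties below are placeholders I am recalling from memory, not the exact configuration of my experiment; check the Listeria documentation and the relevant items before copying.

```wikitext
{{Wikidata list
|sparql=SELECT ?item WHERE { ?item wdt:P31 wd:Q1130645 }  <!-- Q1130645 = emulator; QID is an assumption -->
|columns=label:Name,p178:Developer,p306:Operating system,p275:License
|links=text
}}
{{Wikidata list end}}
```

The bot periodically replaces everything between the two templates with a wikitable built from the query results, so the list stays in sync with Wikidata.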
I was wondering about the state of the data in Wikidata on which file formats each software application can write. Here is a visualization of the data as of today:
What I learned from this visualization is that we need to enter more data about which writable file formats the various software applications support. Please let me know if you are aware of sources listing writable file formats that could be scraped to supply additional statements for Wikidata!
Feel free to try this query yourself here. I look forward to re-running this query periodically to see how the process of making this data more complete unfolds.
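A minimal version of this query, using P1073 ("writable file format"), which I believe is the relevant property here (worth double-checking on Wikidata), counts the writable formats recorded per application:

```sparql
# Number of writable file formats recorded per software application.
# P1073 ("writable file format") -- verify the property ID on Wikidata.
SELECT ?app ?appLabel (COUNT(?format) AS ?formats) WHERE {
  ?app wdt:P1073 ?format .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?app ?appLabel
ORDER BY DESC(?formats)
```

Re-running this periodically and comparing the counts is a simple way to track how the coverage grows.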
I was curious to know how many links to archival collections are being added to items about people in Wikidata. I wrote a SPARQL query for the Wikidata Query Service to find out.
This query makes use of property P485 ("archives at"), and the bubble chart is one of the built-in visualization options of the Wikidata Query Service.
The institutions named most often with the "archives at" property appear as the largest bubbles. Yale University Library and the Beinecke are in the middle layer as of today. I would like to add enough links to archival collections at these two institutions to see this visualization change (with the bubbles for Yale and the Beinecke increasing in size) before the end of 2016.
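The query behind a chart like this can be sketched in a few lines; the `#defaultView:BubbleChart` comment tells the Wikidata Query Service to render the results as a bubble chart, sized by the count column.

```sparql
#defaultView:BubbleChart
# People per holding institution, via P485 ("archives at").
SELECT ?institutionLabel (COUNT(DISTINCT ?person) AS ?people) WHERE {
  ?person wdt:P485 ?institution .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?institutionLabel
ORDER BY DESC(?people)
```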