Category: file formats

Library of Congress Digital Preservation using Wikidata URIs!

The Library of Congress Digital Preservation team recently updated their inventory of Format Description Documents to include Wikidata URIs. The Library of Congress has detailed descriptions of more than 400 file formats on their website.

This is an excerpt from the format description document for TIFF, Revision 6.0 showing the Wikidata ID.

The purposes of these format descriptions are listed on their website:

  • To support strategic planning regarding digital content formats, in order to ensure the long-term preservation of digital content by the Library of Congress, and
  • To provide an inventory of information about current and emerging formats, including the identification of tools and detailed documentation that are needed to ensure that the Library of Congress can manage content created or received in these formats through the content life cycle, and
  • To identify and describe the formats that are promising for long-term sustainability, and develop strategies for sustaining these formats including recommendations pertaining to the tools and documentation needed for their management.
  • To identify and describe the formats that are not promising for long-term sustainability, and develop strategies for sustaining the content they contain.
  • The overall analysis is part of the execution of the Library of Congress Digital strategic planning goal pertaining to the management and sustenance of digital content.

I’m looking forward to seeing many additional cultural heritage institutions and organizations using Wikidata URIs in the future.

Wikidata is already serving as a crosswalk between identifiers. Here is a SPARQL query for the Wikidata endpoint showing all of the items in Wikidata for which we have IDs from the Library of Congress, PRONOM, and the Just Solve Wiki.
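
For readers who want to see the shape of such a crosswalk query, here is a minimal sketch in Python that builds the SPARQL text and a request URL for the public Wikidata endpoint. It is not the query linked above. The property IDs (P3266, P2748, P3381) are my assumptions for the three identifier properties and should be checked on Wikidata, and the sketch assumes that "crosswalk" means items carrying all three identifiers at once. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed property IDs; verify each one on Wikidata before relying on it.
LOC_FDD_ID = "P3266"     # Library of Congress Format Description Document ID
PRONOM_ID = "P2748"      # PRONOM file format identifier
JUST_SOLVE_ID = "P3381"  # Just Solve / File Format Wiki page ID


def build_crosswalk_query(limit=None):
    """Return a SPARQL query for items that carry all three identifiers."""
    query = (
        "SELECT ?item ?itemLabel ?loc ?pronom ?justSolve WHERE {\n"
        f"  ?item wdt:{LOC_FDD_ID} ?loc ;\n"
        f"        wdt:{PRONOM_ID} ?pronom ;\n"
        f"        wdt:{JUST_SOLVE_ID} ?justSolve .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
    if limit is not None:
        # Optional cap on the number of rows, useful while experimenting.
        query += f"\nLIMIT {int(limit)}"
    return query


def build_request_url(query):
    """Encode the query as a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

Calling `print(build_crosswalk_query(limit=10))` gives text that can be pasted into the Wikidata Query Service editor, and `build_request_url` gives a URL that any HTTP client can fetch.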

UPDATE: I updated this post on March 15, 2017 with new links to the Library of Congress websites.


#loveyourdata week 2017

Logo for Love Your Data Week


Similar to Open Access Week, the purpose of the Love Your Data (LYD) campaign is to raise awareness and build a community to engage on topics related to research data management, sharing, preservation, reuse, and library-based research data services. We will share practical tips, resources, and stories to help researchers at any stage in their career use good data practices.

I created a series of five SPARQL queries that highlight Wikidata and collections at Yale University Library, and that express how I relate to #loveyourWIKIdata.

Writable file formats by software application

I wanted to know how complete Wikidata's data is about the file formats that each software application can write. Here is a visualization of the data as of today:

This visualization showed me that Wikidata needs more statements about which writable file formats each software application supports. Please let me know if you are aware of sources that list information about writable file formats that could be scraped to supply additional statements for Wikidata!

Feel free to try this query yourself here. I look forward to re-running this query periodically to see how the process of making this data more complete unfolds.
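
To track that progress as numbers rather than only as a picture, something like the following Python sketch could help. It is not the query linked above. It assumes that P1073 is the Wikidata property for "writable file format" (please verify), and `count_formats_per_app` is a hypothetical helper that tallies rows in the standard SPARQL JSON results format.

```python
from collections import Counter

# Assumed property ID; verify on Wikidata before relying on it.
WRITABLE_FORMAT = "P1073"  # "writable file format"

# One row per (application, writable format) statement.
WRITABLE_FORMATS_QUERY = f"""
SELECT ?app ?appLabel ?format ?formatLabel WHERE {{
  ?app wdt:{WRITABLE_FORMAT} ?format .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def count_formats_per_app(bindings):
    """Count distinct writable formats per application.

    `bindings` is the list found at results["results"]["bindings"] in the
    SPARQL JSON results format returned by the endpoint.
    """
    seen = set()
    for row in bindings:
        # Deduplicate (application, format) pairs before counting.
        seen.add((row["app"]["value"], row["format"]["value"]))
    return Counter(app for app, _ in seen)
```

Running the query periodically and comparing the totals from `count_formats_per_app` would show how many applications gain writable-format statements over time.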