
Words Matter: Reconciling Museum Metadata with Wikidata

By Sharon Mizota | July 2022 | 6 Minute Read


Screenshot from the documentary Change the Subject, 2019. Made available by the Trustees of Dartmouth College under a Creative Commons Attribution-NonCommercial license.

From the Metadata Learning and Unlearning series



Cultural heritage aggregator Curationist is creating a taxonomy that leverages Wikidata to bring museum metadata into the 21st century.

In November 2021, the United States Library of Congress finally replaced the terms “Illegal aliens” and “Aliens” in its widely used resource, Library of Congress Subject Headings (LCSH). The replacements, “Noncitizens” and “Illegal immigration,” are still problematic but less derogatory. The Library of Congress was established to serve the United States Congress and also functions as the national library of the United States. It sets the standard for the majority of U.S. libraries and archives as well as many international institutions.

LCSH is a controlled list of terms that libraries, archives, and museums all over the world use to describe their collections. These terms provide consistent, standardized access to the subject matter of books, articles, films, art, and just about anything held by a cultural heritage institution. The terminology change, which took nearly five years, was the result of advocacy begun by a group of Dartmouth students who led a movement challenging the terms’ anti-immigrant sentiment. This movement is chronicled in the 2019 documentary Change the Subject.

Screenshot of the cover of the “State of the Internet’s Languages” report. Licensed under CC NC-SA 4.0.

Words Matter

As the battle over the subject headings demonstrated, words affect the way we think about and access cultural materials and information. In library catalogs, archival finding aids, and on the internet, these keywords act as gatekeepers, rendering some stories and artifacts findable while obscuring others. If you don’t know the dominant or official words with which something is described, you are less likely to be able to access it. This becomes an issue if you are unfamiliar with the dominant culture, or simply speak a different language.

LCSH is managed by a centralized authority and adapts very slowly to changes in popular language. By contrast, crowdsourced, multilingual Wikidata has emerged as an alternative resource to which anyone with internet access can contribute. While it took the Library of Congress years to change two terms, it only takes a few keystrokes to change a Wikidata term. Institutions or content creators who describe their materials by linking to Wikidata are connecting to a vast, international repository of concepts, names, and titles that serves the same standardizing function as LCSH but is much more open to and accommodating of different perspectives, languages, and values.

Enriching Metadata

In the spring of 2022, the cultural heritage content aggregator Curationist began using Wikidata to create a custom taxonomy describing museum objects ingested into its database. The eventual goal is to fully integrate the site with Wikidata so that Curationist archivists can search it in real time for descriptive terms. The Curationist Taxonomy, which builds on the site’s guidelines for using Wikidata as a controlled vocabulary, is an interim step in this process: a smaller controlled vocabulary that grows out of the archivists’ own work. Archivists use it to keep their descriptive terms consistent, and each term is mapped to Wikidata for eventual integration.

The process began with a list of terms collected by the archivists. Curationist archivists are tasked with enhancing metadata from museums and archives that has been ingested into the Curationist site. This might mean adding subject terms where there are none, or adding an Indigenous geographic place name where only a settler colonial name is provided. In this way, Curationist serves as a platform for making metadata richer and more inclusive.

Screenshot of OpenRefine showing terms matched with Wikidata and their equivalents, descriptions, and identifying numbers.

The initial list of terms was run through the reconciliation process in OpenRefine. OpenRefine is a powerful tool for viewing, cleansing, and transforming metadata. It has a built-in integration with Wikidata, so it was relatively easy to find the Wikidata matches or equivalents for the terms. For a step-by-step guide to the reconciliation process, see Reconciling the Curationist Taxonomy with Wikidata.
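OpenRefine reconciles against Wikidata through a service that implements the Reconciliation Service API: the client sends a batch of queries as JSON and receives ranked candidates for each one, each carrying an `id`, `name`, `score`, and a boolean `match` flag. The Python sketch below illustrates that exchange; it is not Curationist's actual procedure. The function names are hypothetical, no network request is made, and the sample response is invented for illustration (only the QIDs are taken from the entries listed in this article). Terms that come back as `None` are the ones a person would resolve by hand, as OpenRefine's interface allows.

```python
import json
from typing import Optional


def build_queries(terms: list[str], limit: int = 3) -> str:
    """Serialize terms as a batch for a reconciliation service's `queries`
    parameter: one query object per term, keyed q0, q1, ..."""
    batch = {f"q{i}": {"query": term, "limit": limit} for i, term in enumerate(terms)}
    return json.dumps(batch)


def best_matches(terms: list[str], response: dict) -> dict[str, Optional[str]]:
    """Map each term to the ID of the first candidate the service flagged as a
    match, or None when a person still has to choose among the candidates."""
    matches: dict[str, Optional[str]] = {}
    for i, term in enumerate(terms):
        candidates = response.get(f"q{i}", {}).get("result", [])
        flagged = [c for c in candidates if c.get("match")]
        matches[term] = flagged[0]["id"] if flagged else None
    return matches


terms = ["Hopi culture", "Zuni culture"]
payload = build_queries(terms)

# Invented response for illustration only; scores and flags are not real output.
sample_response = {
    "q0": {"result": [{"id": "Q111731448", "name": "Hopi culture", "score": 100, "match": True}]},
    "q1": {"result": [{"id": "Q111731440", "name": "Zuni culture", "score": 80, "match": False}]},
}
print(best_matches(terms, sample_response))
# {'Hopi culture': 'Q111731448', 'Zuni culture': None}
```

Keeping automatic acceptance limited to candidates the service itself marks as a `match` mirrors the human-in-the-loop review described here: anything less certain is left for an archivist to confirm.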

Screenshot of part of the Cultural Context section of the Curationist Taxonomy Google Sheet.

Once terms were matched, they and their corresponding Wikidata labels, descriptions, and identifiers were added to a Google Sheet. Curationist archivists use this Sheet to search for appropriate terms and to add new ones. Each week, the taxonomy manager reviews any new terms and reconciles, approves, or modifies them for use. In this way the taxonomy is a living document, like Wikidata, that grows along with the Curationist site.
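The weekly review can be pictured as a simple filter over the Sheet's rows. The following is a minimal sketch, assuming each row holds the archivist's term, the matched Wikidata label, description, and identifier, plus an approval flag. The field names, `TaxonomyRow`, and `weekly_review` are hypothetical and do not describe the actual layout of the Curationist Sheet. Unapproved rows with a well-formed Q-identifier are ready for approval, and the rest go to the taxonomy manager to reconcile or modify.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Wikidata item identifiers: "Q" followed by a number with no leading zero.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


@dataclass
class TaxonomyRow:
    term: str                             # term as entered by an archivist
    wikidata_label: Optional[str] = None  # label of the matched Wikidata item
    description: Optional[str] = None     # Wikidata description, if matched
    qid: Optional[str] = None             # Wikidata identifier, e.g. "Q111731448"
    approved: bool = False                # set by the taxonomy manager


def weekly_review(rows: list[TaxonomyRow]) -> tuple[list[TaxonomyRow], list[TaxonomyRow]]:
    """Split unapproved rows into (ready to approve, needs manager attention)."""
    ready, needs_attention = [], []
    for row in rows:
        if row.approved:
            continue  # already part of the taxonomy
        if row.qid and QID_PATTERN.fullmatch(row.qid):
            ready.append(row)
        else:
            needs_attention.append(row)  # no match yet, or a malformed identifier
    return ready, needs_attention


rows = [
    TaxonomyRow("Hopi culture", "Hopi culture", qid="Q111731448"),
    TaxonomyRow("example term without a match"),  # hypothetical new entry
]
ready, needs_attention = weekly_review(rows)
print([r.term for r in ready])            # ['Hopi culture']
print([r.term for r in needs_attention])  # ['example term without a match']
```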

Enhancing Wikidata

If a term is added that isn’t currently in Wikidata, the taxonomy manager decides whether it merits a new Wikidata entry and either creates one or suggests alternative terms to the submitting archivist. To date, the project has added 36 new entries to Wikidata. These have mostly been in the area of Indigenous cultures, including:

Akan culture (Q111725591)
Asante culture (Q111725892)
Aztec culture (Q111725927)
Beembe culture (Q111725916)
Fante culture (Q111725895)
Hidatsa culture (Q111731403)
Hopi culture (Q111731448)
Mexica culture (Q111725944)
Puebloan culture (Q111731435)
Zuni culture (Q111731440)
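Each of these entries is identified by a stable Q-number, which lets a catalog record point to the concept itself rather than to a label in any one language. Below is a minimal sketch, assuming entries are recorded as “label (QID)” strings like those in the list above. It converts them into Wikidata concept URIs (the `http://www.wikidata.org/entity/` form) that could be stored in a record. The function name `to_concept_uris` is hypothetical.

```python
import re

# "label (QID)": a label, a space, then a Q-number in parentheses.
ENTRY = re.compile(r"(?P<label>.+?) \((?P<qid>Q[1-9][0-9]*)\)")
CONCEPT_URI_PREFIX = "http://www.wikidata.org/entity/"


def to_concept_uris(lines: list[str]) -> dict[str, str]:
    """Map each entry's label to its Wikidata concept URI."""
    uris = {}
    for line in lines:
        m = ENTRY.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"not a 'label (QID)' entry: {line!r}")
        uris[m.group("label")] = CONCEPT_URI_PREFIX + m.group("qid")
    return uris


entries = ["Hopi culture (Q111731448)", "Zuni culture (Q111731440)"]
print(to_concept_uris(entries))
# {'Hopi culture': 'http://www.wikidata.org/entity/Q111731448',
#  'Zuni culture': 'http://www.wikidata.org/entity/Q111731440'}
```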

In most cases, these ethno-linguistic groups had Wikidata entries only for people: “Hopi people,” “Zuni people,” etc. Adding these cultural terms puts these groups on a par with more dominant ethno-national groups that already had entries for both people and culture:

Americans (Q846570)
culture of the United States (Q1044835)
Chinese people (Q6501380)
Chinese culture (Q645917)
French (Q121842)
culture of France (Q1985804)

In this way, Curationist not only uses more accurate and respectful terms in its own database but also enriches Wikidata with new terms that are available for others to use. The Curationist dataset serves as a kind of “to-do” list for Wikidata engagement. And since anyone can edit Wikidata, which is accessible in 309 languages, this work lends itself to an international, open education project on the power of naming.

Like LCSH, Wikidata is a map of what is conceivable and knowable, only bigger and wilder.

Although it currently reflects the dominance of the English language on the internet, Wikidata is growing more diverse and accessible every minute. Learning how to edit it responsibly and proactively is an opportunity to explore how terms carry cultural and political meaning, and to have a say in which terms are available for use. Until recently, this power to name and select has been wielded by a small number of specialists (scholars, scientists, librarians), but with Wikidata it’s available to anyone with an internet connection and the desire to make something known.

——
The Metadata Learning and Unlearning series was originally published on Medium.com and edited by Sharon Mizota, Virginia Poundstone, and Garrett Graddy-Lovelace. This series raises questions and makes proposals for what metadata can do to advance a broader dialogue about diverse worldviews within open education and openGLAM realms.

Sharon Mizota
Metadata Consultant

Sharon Mizota is a DEI metadata consultant who helps archives, museums, libraries, and media organizations transform and share their metadata to improve diversity, equity, and inclusion in the historical record. She has over ten years of experience managing and creating metadata for arts and culture organizations. She is also an art critic, a recipient of an Andy Warhol Foundation Arts Writers’ Grant, and a coauthor of the award-winning book, Fresh Talk/Daring Gazes: Conversations on Asian American Art.