In November 2018, Diane Shaw, Katie Mika and Siobhan Leachman attended WikiCite 2018 in Berkeley, CA. WikiCite is a Wikimedia initiative that aims to develop a database of open citations and linked bibliographic data. Since sources are the foundation of Wikipedia’s claim to authority, WikiCite is working to build a repository of bibliographic data that is open, structured, and separable. WikiCite also draws from the academic community in which citation data (most often from peer-reviewed articles) is crucial for creating and linking knowledge. At the outset, WikiCite aimed to build this repository specifically for use in Wikipedia and other Wikimedia Foundation projects. Since the first meeting in 2016, this idea has grown into building a universal repository of sources to serve the sum of all human knowledge, leveraging Wikidata as its infrastructure.
Now in its third iteration, the conference included extended and short presentations, numerous lightning talks, strategy tracks, tutorials, data modelling, and a hack day (renamed a Do-a-thon Day to be more inclusive of those who aren’t computer science experts). The purpose of the conference, open primarily by invitation to a group of approximately 100 attendees from across the United States and abroad, was to work towards a “vision of creating an open repository of bibliographical data to support the citation and fact-checking needs of Wikimedia projects, and possibly, to serve as an open infrastructure for research, education, and information quality across the web.” Librarians were well represented among the attendees, as were linked open data advocates, software engineers, data scientists, and active members of the Wikimedian community.
Unlike some of the other very popular Wikimedia Foundation projects (Wikipedia in 303 languages; Wikimedia Commons hosting image, audio, and video files; Wikisource for full text transcriptions; and Wikidata for storing structured data, among others), WikiCite has no real online presence itself. Instead, it has grown as a community of practice dedicated to supporting a bibliographical metadata commons. That commons would enable reliable, verified linked open data connections both among Wikimedia projects and externally with online catalogs, databases, and reference sources maintained by libraries, galleries, archives, museums, scientific institutes, and similar organizations, including OCLC, the Internet Archive, and the Biodiversity Heritage Library. We are not exaggerating when we say it was awe-inspiring to be part of this dedicated group harnessing technical knowledge, bibliographical expertise, and lots of creativity and imagination in an effort to bring about a future of linked open data serving all kinds of scholarly information needs.
What WikiCite hopes to make possible goes beyond a simple structure for providing and sharing linked open data: attendees at the conference were also experimenting with new and improved ways to discuss, analyze, curate, vet and annotate sources online.
Dario Taraborelli, Director of Research at the Wikimedia Foundation, gave a terrific overview of WikiCite in his opening talk, “Here Be Dragons: Uncharted regions of the bibliographic commons.” Other talks and projects of particular interest to libraries included presentations about projects at the National Library of Sweden and the National Library of Wales using structured linked open data for name and subject authority files and to help document the history of the national book trade; Linked Data for Production (LD4P), a collaborative pilot project using BIBFRAME for creating structured linked data with connections to Wikidata, the Virtual International Authority File (VIAF), and WorldCat; initiatives to use Wikidata, VIVO, and Scholia to generate scholarly profiles; and ways to incorporate Wikidata training into library education, including classes on Library Carpentry.
Katie, a former BHL National Digital Stewardship Resident from Harvard’s Museum of Comparative Zoology, and Siobhan, a citizen scientist and linked open data champion from New Zealand who has been a devoted transcriber of natural history materials in the Smithsonian Transcription Center, gave a talk on WikiCite and the Biodiversity Heritage Library. They described some of the unique challenges for heritage literature and metadata, and demonstrated how open access citations, images, and details gleaned from BHL and other open natural history digital repositories are applied to Wikimedia Foundation projects to support essential documentation of scientists, literature, and rare and endemic species.
Unlike modern scientific literature, much historic literature does not have digital object identifiers (DOIs) attached. These identifiers enable a particular publication to be identified and provide a persistent link to its location on the Internet. Without these identifiers it is difficult to ingest historical bibliographic data into Wikidata in bulk. If historic bibliographic data cannot be ingested or reused, scientists are unable to make use of this important resource. Siobhan expressed her frustration that much of the citation metadata of historic biodiversity literature is not currently found in Wikidata and urged the attendees to help resolve this problem.
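To illustrate why a DOI makes bulk matching easier, here is a minimal Python sketch that builds a SPARQL query for looking up a Wikidata item by its DOI. It only constructs the query text and does not contact any server. The function name `doi_lookup_query` and the sample DOI are hypothetical and chosen for illustration; the sketch assumes the DOI is recorded with Wikidata's DOI property (P356) and that values are stored in upper case.

```python
def doi_lookup_query(doi: str) -> str:
    """Build a SPARQL query that looks up a Wikidata item by DOI.

    Assumptions (not from the original post): the DOI lives in
    property P356 and is stored in upper case on Wikidata.
    """
    normalized = doi.strip().upper()

    # Very light sanity check: DOIs start with "10." and contain a slash.
    if not normalized.startswith("10.") or "/" not in normalized:
        raise ValueError(f"does not look like a DOI: {doi!r}")

    # Refuse characters that would break out of the SPARQL string literal.
    if any(ch in normalized for ch in '"\\\n'):
        raise ValueError(f"unsupported character in DOI: {doi!r}")

    return 'SELECT ?item WHERE { ?item wdt:P356 "%s" . } LIMIT 1' % normalized


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the shape of the query.
    print(doi_lookup_query("10.1000/xyz123"))
```

A work with no DOI offers no such single-value key, so matching it has to fall back on titles, authors, and dates, which is much harder to do reliably in bulk.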
Siobhan’s key takeaway from the conference came from the various discussions on how to solve the difficulties of getting bibliographic information on historic biodiversity literature into Wikidata. Also of interest were the discussions on how book data is modelled in Wikidata (see https://www.wikidata.org/wiki/Wikidata:WikiProject_Books) and the discovery of a practical new tool to help disambiguate authors within Wikidata: Author Disambiguator. Katie enjoyed the opportunity during the summit day’s events to discuss the potential for a bibliographic commons to support citations and information provenance across the web and in support of Open Science practices. On the Do-a-thon Day, Diane spent time discussing approaches for modelling the properties of holotypes on Wikidata with Siobhan, biophysicist and open scientist Daniel Mietchen, and Terry Catapano of Plazi and UC Berkeley’s Bancroft Library.
The opportunity presented by WikiCite for libraries and public knowledge creation cannot be overstated. As of November 2018, there are over 21 million publication items in Wikidata, accounting for 40% of the entire database. More than 160 million Wikidata statements use the property “cites” (P2860) to connect items. Wikidata is open to contribute to, edit, reuse, and extend. Not only can we leverage Wikidata for metadata enrichment and linked data connections, but we can build bibliographic applications on top of this open database, track the provenance of citations, and study patterns of information reuse. WikiCite advocates for an open destination to more easily discuss, analyze, curate, annotate, and vet bibliographic resources used in Wikipedia and across the web. As connectors to information, libraries have staff who can offer a tremendous amount of expertise for modeling difficult kinds of data (like books), while ensuring equitable access to information.
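As a small example of what building on this open database can look like, the Python sketch below assembles a SPARQL query that lists items citing a given work through the “cites” property (P2860), plus a request URL for the public Wikidata Query Service. It builds strings only and makes no network call. The helper names `citing_works_query` and `request_url` are hypothetical, and the item ID `Q42` is a placeholder, not a real publication record.

```python
import re
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def citing_works_query(qid: str, limit: int = 10) -> str:
    """Build a query for items that cite the work identified by `qid`."""
    # Wikidata item IDs look like "Q" followed by a positive integer.
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return (
        "SELECT ?citing WHERE { "
        f"?citing wdt:P2860 wd:{qid} . "
        "} "
        f"LIMIT {int(limit)}"
    )


def request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # "Q42" is a placeholder ID used only to show the query shape.
    query = citing_works_query("Q42", limit=5)
    print(query)
    print(request_url(query))
```

The same query text can be pasted into the Wikidata Query Service web interface, which may be the easier starting point for anyone who does not want to write code.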
The easiest way to get involved is to edit! Explore Wikidata using Reasonator and SQID (“squid”). Head over to YouTube to watch talks and tutorials on Wikidata and SPARQL. Match some Wikidata items with their external identifiers using Mix’n’match. And join the wikicite-discuss mailing list to follow conversations about adding bibliographic data to Wikidata and to receive WikiCite program updates.