Q&A with Jordan Patterson
BY NQ
October 2019
In tandem with NQ’s Wiki-edit-a-thon, Jordan Patterson guides us on a brief tour of the wonderful world of Wikidata.
Can you tell us a little about yourself?
I’m a cataloguing and metadata librarian at Memorial University of Newfoundland, but I hail from southern Ontario. In my spare time I like to read and make spreadsheets. I’m going to be a dad in March! I’m going to start reading with my baby right away, but I will save spreadsheets until later.
How did you become so interested in Wikidata?
Through my work at the library I became interested in linked data, a method of structuring information by relating pieces of data to each other to create meaning. It’s difficult to explain how it works or its effects without it sounding like magic. The ideas behind linked data were first proposed by Sir Tim Berners-Lee [the English engineer and scientist known as the inventor of the World Wide Web] for what he envisioned as the next evolution of the internet. The internet allowed us to link from website to website, but the actual information just sat on top of each web page and required a human to read and interpret it. Sir Tim envisioned joining all of the information itself into a web, so that we’d have not just a web of pages, but a web of data. With data structured in this way, we can use complex queries to retrieve complex answers from the dataset. It also opens the door to computers being able to make inferences, connections between pieces of data without human intervention. When Sir Tim explains it, it is quite simple, but it proved to be quite a sophisticated job to realize his vision, and there were many technological barriers to your average person getting involved. To cut a long story short, Wikidata caught my attention because it removed so many of these technological barriers to linked data.
And, ah, what is Wikidata?
Wikidata is a massive knowledge base that is open and accessible, sharing the same wiki philosophy as the other Wiki projects. It’s a community-driven initiative that anyone can contribute to and use. I refer to it as a knowledge base and not a database because with linked data, every piece of data is related to every other piece of data, so we don’t just have a store of decontextualized bits of information, we have a store of facts, statements that carry meaning. Importantly, it’s a store of facts that a computer can delve into to explore and create new connections between things.
To give you some idea of the scale of Wikidata, we can start by looking at the more familiar Wikipedia. We all agree that Wikipedia is a massive encyclopedia, and it currently contains about 5.95 million articles on its English site. On Wikidata, there are currently over 70 million entities representing real world things like people, places, and objects, and Wikidata users have related these entities to each other to create over 700 million statements about these entities, 700 million facts about our world: everything from the bite force quotient of the Tasmanian devil to the birthplace of Wayne Gretzky.
Each entity has its own page, just like a Wikipedia article, but instead of a prose description, the page is a list of statements. To take a simple example, Wayne Gretzky will have a statement that he was born in Brantford, Ontario. The fact isn’t simply written out, however; it’s encoded, expressed in a particular syntax so that both humans and computers can “understand” the statement, and so that the statement joins the vast web of information around it.
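To make the idea of an encoded statement a little more concrete, here is a minimal sketch in Python. It is an illustration only, not Wikidata's actual storage format: the `Statement` type and the helper functions are hypothetical, and the plain-English property names stand in for Wikidata's real identifiers (for instance, "place of birth" is property P19 on Wikidata). The second statement, that Brantford is located in Ontario, is added here to show how chaining statements yields a new connection.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One fact, expressed as subject -> property -> value (hypothetical model)."""
    subject: str
    prop: str
    value: str


# Plain-English labels are used for readability; Wikidata itself uses
# opaque identifiers for both entities and properties.
statements = [
    Statement("Wayne Gretzky", "place of birth", "Brantford"),
    Statement("Brantford", "located in", "Ontario"),
]


def values_of(subject, prop, facts):
    """Return every value recorded for the given subject and property."""
    return [s.value for s in facts if s.subject == subject and s.prop == prop]


def birth_region(person, facts):
    """Chain two statements: the person's birthplace, then where that place is."""
    return [
        region
        for place in values_of(person, "place of birth", facts)
        for region in values_of(place, "located in", facts)
    ]


print(birth_region("Wayne Gretzky", statements))
```

Neither statement says outright that Gretzky was born in Ontario; the answer comes from following one statement into the next, which is the kind of connection a computer can make without human intervention.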
How can people access it?
There are a few answers to this question. The first is to head to wikidata.org and start reading through the excellent tutorials that explain this tricky subject in a simple way. It’s easy to get started by heading to an entity’s page and beginning to manually input statements, or even just to use it as a simple reference tool for, say, the vital statistics of a person.
But to really use Wikidata to its full potential you must use what is called SPARQL (pronounced “sparkle”). SPARQL is a language with which we can make an inquiry of the knowledge base by constructing a question out of a combination of known and unknown variables. Asking a question with SPARQL will retrieve a set of data, looking pretty much like a spreadsheet. How that spreadsheet looks, and what data it contains, is up to you. It takes some practice to get this language right, but once you do, Wikidata is yours to command. You can ask, what are the world’s largest cities with a female mayor (https://w.wiki/3Ba)? Whose birthday is it today (https://w.wiki/Aaf)? What Viennese composers wrote works in E minor (https://w.wiki/7wP)?
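As a rough sketch of what such a question looks like, the Python below builds a small SPARQL query asking where a named person was born, and turns a result set into spreadsheet-like rows. The query is an illustration written for this article, not one of the linked examples. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql, its predefined prefixes (`wdt:`, `rdfs:`, `wikibase:`, `bd:`), and the standard SPARQL JSON results format; the function names and the User-Agent string are hypothetical. Matching a person by label can be slow or ambiguous, so treat this as a starting point rather than a recommended pattern.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_birthplace_query(name: str) -> str:
    """Build a query with one known (the name) and two unknowns (?person, ?place)."""
    safe = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'''
SELECT ?person ?placeLabel WHERE {{
  ?person rdfs:label "{safe}"@en ;
          wdt:P19 ?place .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 10
'''.strip()


def rows_from_results(payload: dict) -> list:
    """Flatten SPARQL JSON results into one dict per row, keyed by column name."""
    columns = payload["head"]["vars"]
    return [
        {col: binding[col]["value"] for col in columns if col in binding}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query: str) -> list:
    """Send the query over HTTP (requires network access; not called here)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "wikidata-qa-example/0.1 (hypothetical)",
    })
    with urllib.request.urlopen(request, timeout=30) as response:
        return rows_from_results(json.load(response))


print(build_birthplace_query("Wayne Gretzky"))
```

Calling `run_query(build_birthplace_query("Wayne Gretzky"))` would send the question to the live service. The columns named after `SELECT` decide what the resulting "spreadsheet" contains.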
There is one more answer to this question that you may not expect, and it is that most people already access Wikidata every day. One of the important things about linked data is that it can be reused in different contexts, so Wikidata’s data turns up in all sorts of places online. In the most prominent use-case, Wikipedia funnels data from Wikidata to automatically populate infoboxes in many articles. You’ll also see linked data in action (though not necessarily from Wikidata) any time you search Google for a person, place, thing, or class of items (like Canadian politicians) and relevant infoboxes appear alongside your list of search results.