During a discussion with librarians towards the end of last year, I was asked why they should bother to publish their catalogues as open linked data, and how that might be done.
For those of us already part of the Semantic Web world, the answer may seem self-evident – by making data available in machine-readable form under open licenses using web standards, we permit them to be integrated seamlessly with similar data from elsewhere, and allow others to re-use these data in creative ways that we have probably never imagined.
As Tim Berners-Lee has said many times, for example in his 2009 TED address, “You do your bit, everybody else does their bit, and the data all connect, creating a power that is simply not available from hyperlinking documents.”
He gave a telling example dating from 2007. In the search for new drugs to treat Alzheimer's disease, a researcher may wish to know the answer to the following biological question:
“What proteins are involved in cellular signal transduction, and are related to pyramidal neurons?”
A Google search on that question gave ~223,000 hits in 2007, none of which provided a specific answer. However, a search over the linked healthcare data, made in collaboration with the W3C Health Care and Life Sciences Interest Group, of which I am a member, gave 32 responses, each of which was the name of a specific protein both involved in signal transduction and related to pyramidal neurons.
[Today, in 2013, the same Google search gave me ~413,000 hits, the top one being a 2007 presentation by Eric Prud’hommeaux of W3C, describing that healthcare data search in greater detail!]
Such results would not be possible had not many different individuals and organisations published relevant linked data on sites such as DBpedia, the Allen Brain Atlas, DrugBank, Diseasome, National Drug Code, DailyMed, RxNorm, ChemSpider and ChEBI. Those sites do not necessarily have a lot in common – one is about brain anatomy, another about disease-gene relationships, another about standardized drug names, and so on. However, the fact that they all adopt common web standards to represent their data means that it is possible to search across all the data and find information that is relevant.
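To make that last point concrete, here is a toy sketch in Python of why shared standards matter. It is not the actual W3C query, and every identifier in it (the "ex:" names, the two datasets, the helper function) is invented for illustration. It assumes only that two independent sources describe their data as subject-predicate-object statements and use the same identifier for the same protein. Under those assumptions, merging the sources is trivial, and one question can be asked of both at once.

```python
# Toy illustration only: all "ex:" identifiers and both datasets are hypothetical.
# Each source publishes statements as (subject, predicate, object) triples.

# A hypothetical source describing what proteins do.
protein_function_data = {
    ("ex:proteinA", "ex:involvedIn", "ex:signalTransduction"),
    ("ex:proteinB", "ex:involvedIn", "ex:signalTransduction"),
    ("ex:proteinC", "ex:involvedIn", "ex:dnaRepair"),
}

# A second, independent hypothetical source describing brain anatomy.
brain_anatomy_data = {
    ("ex:proteinA", "ex:relatedTo", "ex:pyramidalNeuron"),
    ("ex:proteinC", "ex:relatedTo", "ex:pyramidalNeuron"),
}


def subjects_with(triples, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


# Because both sources share a data model and identifiers,
# integrating them is nothing more than a set union.
merged = protein_function_data | brain_anatomy_data

# Proteins involved in signal transduction AND related to pyramidal neurons.
answer = (
    subjects_with(merged, "ex:involvedIn", "ex:signalTransduction")
    & subjects_with(merged, "ex:relatedTo", "ex:pyramidalNeuron")
)

print(sorted(answer))
```

Neither source alone can answer the question; only the merged data can. Real linked data achieves the same effect at web scale by using URIs as the shared identifiers and RDF as the shared data model.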
When I was first asked those questions by my librarian colleague over a glass of wine, I felt that I gave rather inadequate answers. These were genuine enquiries from someone outside the semantic web world that deserved a fuller response.
Since I too, as a cell biologist, was outside the semantic web world not so long ago, and have learned what little I now understand the hard way, I thought that perhaps I was in a good position to respond to the questions sympathetically, with a positive attempt to explain technical terms, rather than assuming knowledge.
I thus started to write what I thought might be a helpful explanation for this librarian. Soon the document grew to unwieldy length, as I added more and more background information to support my central explanation! So I ended up splitting it into six shorter papers, under the overall title Libraries and linked data, each of which attempts to deal with just one facet of the problem.
These papers are intended to be read in sequence, the first five really just providing background for the sixth, which addresses the central question.
They are simple, perhaps even simplistic, and are not the kind of explanations one would get from a professional computer scientist. Nevertheless, in the hope that others unfamiliar with the semantic web may find them useful, I present them in the following six blog posts, under the titles:
Libraries and linked data #1: What are open linked data?
Libraries and linked data #2: A rough guide to Turtle
Libraries and linked data #3: Encoding bibliographic records in RDF
Libraries and linked data #4: A Comparison of RDF and XML
Libraries and linked data #5: Using the SPAR ontologies to publish bibliographic records
Libraries and linked data #6: Why publish library catalogues as open linked data?
Please let me know if you find them useful, by leaving a comment, clicking “Like” below the post, or e-mailing me at <david.shotton@zoo.ox.ac.uk>. Thank you.