(Note: Understanding of this paper will be enhanced by prior reading of the earlier papers in this series:
Precise semantics for markup terms
A cornerstone of the Semantic Web is the use of public ontologies to give precise and universally available definitions to terms, so that RDF statements are unambiguous in their meaning. This is not the case in the world of XML, for all its flexibility and widespread use: there, markup terms can take on different meanings depending upon who is using them, reminiscent of Humpty Dumpty’s statement in Alice’s Adventures in Wonderland:
We came across this difference in world views recently, when we were using the SPAR Ontologies to map to RDF the new Journal Article Tag Suite (JATS), published on 22 August 2012 as ANSI/NISO Z39.96-2012, JATS: Journal Article Tag Suite (version 1.0). JATS v1.0 is the successor to the National Library of Medicine (NLM) DTD v3.0, a de facto standard for the XML markup of scholarly journal articles. It is widely used by academic publishers within their routine publication workflows, and also as the ingest and export format for PubMed Central.
For JATS, this ambiguity is by design. JATS is a descriptive, not a prescriptive, model that endeavours to capture and document the actual practice of current publishing. It does not tell publishers what they should call their content. Rather, if a term is widely used in practice, it is likely to appear in JATS, which aims to provide a vocabulary that will be used more or less consistently across publishers. Furthermore, the suggested values for JATS elements and attribute lists are just that – suggested, since JATS provides structures for recording different types of information, but does not attempt to regularize their usage.
For example, the JATS documentation describes the central element <article> as follows:
<article> ... </article> "Usage: This element can be used to describe not only typical journal articles (research articles) but also much of the non-article content within a journal, such as book and product reviews, editorials, commentaries, and news summaries."
Thus the JATS element <article> may be used to describe an XML representation of a research article, but may also be used to describe an XML representation of many other kinds of journal content, such as an editorial, an obituary, a list of events, a book review, a puzzle, a game, a quiz, an interview or a photo-essay, depending upon the meaning an individual publisher chooses for this tag element. This clearly goes beyond what the average person means by “journal article”.
In other words, the JATS standard is deliberately vague and non-committal about the meaning of many terms, because there is no intention to tell any publisher what or how much metadata to publish.
As a consequence of this ‘loose’ design, the first barrier we came up against when mapping JATS to RDF was that a JATS element might mean what its name implies, but equally it might be used by some publishers to mean something entirely different!
What people most frequently mean when they use the JATS element <article> is what is defined in FaBiO, the FRBR-aligned Bibliographic Ontology, as a fabio:JournalArticle:
fabio:JournalArticle rdfs:comment "An article, typically the realization of a research paper reporting original research findings, published in a journal issue." .
However, as we have seen, it can also mean fabio:JournalEditorial, fabio:JournalNewsItem, fabio:BookReview, fabio:ProductReview, etc., all of which are journal content items. These various meanings could all be mapped to RDF generically, as follows:
:periodical-entity a fabio:PeriodicalItem ;
    frbr:partOf [ a fabio:JournalIssue ] .
However, that does not solve the problem entirely, since it is also permissible to use JATS <article> to describe textual entities before they appear in a journal issue, for example to describe a preprint or a revised manuscript being re-submitted to a publisher. Clearly, this brings problems for unambiguous XML mapping to specific RDF terms.
In our JATS2RDF mapping work [2, 3], our solution to this dilemma has been, where necessary, to map the entity described by <article> to :textual-entity, a resource name that is so broad that it includes all relevant possibilities, since they are all textual entities, thereby achieving semantic accuracy, if not detailed specificity.
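One way to picture this fallback is the minimal Python sketch below. It assumes that the optional article-type attribute of <article> is consulted first, and that a specific FaBiO class is asserted only when the declared value is recognised. The lookup table and the function name type_statement are hypothetical illustrations and are not taken from the JATS2RDF mapping. Since attribute values in JATS are only suggestions, any unrecognised or missing value yields no specific class at all, leaving just the broad :textual-entity resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table for illustration only; it is NOT part of JATS2RDF.
# Keys are article-type values a publisher might choose to use.
KNOWN_TYPES = {
    "research-article": "fabio:JournalArticle",
    "editorial": "fabio:JournalEditorial",
    "book-review": "fabio:BookReview",
    "product-review": "fabio:ProductReview",
}


def type_statement(xml_text):
    """Return a Turtle type statement for <article>, or None if no
    specific class can safely be asserted."""
    root = ET.fromstring(xml_text)
    if root.tag != "article":
        raise ValueError("expected a JATS <article> root element")
    fabio_class = KNOWN_TYPES.get(root.get("article-type"))
    if fabio_class is None:
        # Unknown or absent type: stay with the broad :textual-entity
        # resource and assert nothing more specific about it.
        return None
    return f":textual-entity a {fabio_class} ."


print(type_statement('<article article-type="editorial"/>'))
print(type_statement('<article article-type="puzzle"/>'))
```

The design choice here is to prefer saying less over saying something wrong: a missing type statement is still semantically accurate, whereas a guessed one may not be.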
Hierarchies versus triples
A further clear difference between XML and RDF concerns the structural organisation of items (i.e. the elements in XML and the resources in RDF). XML is able to structure elements according to a particular containment order, thus creating hierarchies of nested elements. Such a containment relation between two XML elements always carries a particular semantics, although it is not formalised and implicitly lives outside the XML schema of the language.
Let us consider the following two excerpts of JATS markup:
<article-meta>
  <title-group>
    <article-title>Dealing With Markup Semantics</article-title>
  </title-group>
</article-meta>

<element-citation>
  <article-title>Dealing With Markup Semantics</article-title>
</element-citation>
Above, the element <article-title> is used in two different contexts, and so has two different interpretations.
In the former excerpt (i.e. when it is a descendant of the element <article-meta>), <article-title> is the title of the article under consideration, which can be represented simply in RDF:
:textual-entity fabio:hasTitle "XXX" .
In the latter excerpt (i.e. when it is a descendant of the element <element-citation>), it represents the title of another bibliographic work, one that the article under consideration cites in its reference list. This could be represented in RDF as follows:
:textual-entity-A frbr:part :reference .
:reference a biro:BibliographicReference ;
    biro:references :textual-entity-B .
:textual-entity-B fabio:hasTitle "XXX" .
This says that the citing paper “A” contains a reference that refers to the cited paper B, and that the cited paper B has the title “XXX”. Here, the title “XXX” belongs to the referenced work.
However, this is a mis-interpretation. What the original XML actually means is subtly different – namely that the title “XXX” is part of the text of the reference itself, within the reference list that makes up part of the citing paper A. To express this in RDF, the encoding has to be different:
:textual-entity-A frbr:part :reference .
:reference a biro:BibliographicReference ;
    frbr:part [ a doco:Title ; literal:hasLiteralValue "XXX" ] .
Here, what we are saying is that part of the reference itself is a title with the string value “XXX”.
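The context-dependent choice between these two encodings can be sketched in Python as follows. This is only an illustration using the standard library's ElementTree, not the JATS2RDF implementation. The function name map_article_titles, the numbered :reference-N resource names and the sample document are assumptions made for the example, and the output is a list of Turtle-like strings rather than a real RDF graph.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, assembled for this example.
JATS_SAMPLE = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Dealing With Markup Semantics</article-title>
      </title-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref>
        <element-citation>
          <article-title>A Cited Work</article-title>
        </element-citation>
      </ref>
    </ref-list>
  </back>
</article>
"""


def turtle_literal(text):
    """Quote a string as a simple Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def map_article_titles(xml_text):
    """Return one Turtle-like statement per <article-title>, chosen by
    the element's ancestors rather than by its name alone."""
    root = ET.fromstring(xml_text)
    statements = []
    reference_count = 0

    def walk(element, ancestors):
        nonlocal reference_count
        if element.tag == "article-title":
            # Collapse whitespace and include text of any inline markup.
            title = " ".join("".join(element.itertext()).split())
            literal = turtle_literal(title)
            if "element-citation" in ancestors:
                # The title is part of the text of the reference itself.
                reference_count += 1
                statements.append(
                    f":reference-{reference_count} a biro:BibliographicReference ; "
                    f"frbr:part [ a doco:Title ; literal:hasLiteralValue {literal} ] ."
                )
            elif "article-meta" in ancestors:
                # The title belongs to the article under consideration.
                statements.append(f":textual-entity fabio:hasTitle {literal} .")
        for child in element:
            walk(child, ancestors + [element.tag])

    walk(root, [])
    return statements


for statement in map_article_titles(JATS_SAMPLE):
    print(statement)
```

The same element name leads to two structurally different sets of triples; the only thing that distinguishes the cases is the list of ancestors carried down the recursion.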
Sorting out the semantics hidden behind XML containment relationships is one of the main issues one has to address when trying to map XML schemas to RDF vocabularies correctly, because:
- RDF is not able to represent the hierarchical relation of XML elements using native constructs, since everything is described as a ‘flat’ graph of resources; and
- the semantics of the containment of XML elements, such as the aforementioned <article-meta>/<article-title> and <element-citation>/<article-title>, is neither explicitly nor formally defined – it can live either in a natural language definition of the element, or in the mind of the developer of the schema, or, sometimes, in the mind of the author of the XML document.
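As a practical first step, it can help to enumerate every containment path in which a given element name actually occurs, since each distinct path is a context whose meaning has to be decided by a human before any triples are written. The Python sketch below does this with the standard library; the function name containment_paths and the sample document are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, assembled for this example.
SAMPLE = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Dealing With Markup Semantics</article-title>
      </title-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref>
        <element-citation>
          <article-title>A Cited Work</article-title>
        </element-citation>
      </ref>
    </ref-list>
  </back>
</article>
"""


def containment_paths(xml_text, tag):
    """Return the sorted, distinct root-to-element paths at which
    elements named `tag` occur in the document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(element, path):
        here = path + (element.tag,)
        if element.tag == tag:
            paths.add("/".join(here))
        for child in element:
            walk(child, here)

    walk(root, ())
    return sorted(paths)


for path in containment_paths(SAMPLE, "article-title"):
    print(path)
```

The code can list the contexts, but it cannot say what each one means; that information still has to come from the schema documentation or from the people who wrote the markup.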
Of course, RDF can express hierarchical relationships that are clearly defined by the description logic (DL) of the ontologies from which the terms are drawn. Thus fabio:hasShortTitle and fabio:hasTranslatedTitle are both defined as sub-properties of dcterms:title. However, such hierarchical definitions represent taxonomies, and do not address the contextual semantics of XML determined by containment relationships.
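To make the contrast concrete, the following Python sketch applies sub-property entailment to a handful of triples held as plain tuples. It is a toy illustration under stated assumptions, not a reasoner: the SUBPROPERTY_OF table simply restates the two sub-property relations mentioned above, and the function name and sample title are hypothetical.

```python
# The two sub-property relations named in the text, as a child -> parent table.
SUBPROPERTY_OF = {
    "fabio:hasShortTitle": "dcterms:title",
    "fabio:hasTranslatedTitle": "dcterms:title",
}


def infer_superproperties(triples):
    """Return the input triples plus every triple entailed by walking
    up the sub-property table (assumed to contain no cycles)."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        parent = SUBPROPERTY_OF.get(predicate)
        while parent is not None:
            inferred.add((subject, parent, obj))
            parent = SUBPROPERTY_OF.get(parent)
    return inferred


asserted = {(":textual-entity", "fabio:hasShortTitle", "Markup Semantics")}
for triple in sorted(infer_superproperties(asserted)):
    print(triple)
```

The inference depends only on the predicate of each triple. Nothing in it looks at where a statement came from in the source XML, which is why this kind of hierarchy cannot stand in for containment context.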
For more on this topic, readers are referred to an interesting yet simple comparison of XML and RDF made by Tim Berners-Lee in one of his informal and highly influential Design Issues papers, entitled Why the RDF model is different from the XML model, in which he attempts to answer the question “Why should I use RDF – why not just XML?”
This post is jointly authored by David Shotton, University of Oxford (firstname.lastname@example.org) and Silvio Peroni, University of Bologna (email@example.com), and is taken in part from reference .
[1] Carroll L (1865). Alice’s Adventures in Wonderland. 2009 edition: Oxford University Press. ISBN 978-0-19-955829-2.
[2] Peroni S, Lapeyre DA and Shotton D (2012). From Markup to Linked Data: Mapping NISO JATS v1.0 to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies. A paper describing a mapping of JATS metadata to RDF for the 2012 JATS Conference, Washington DC, USA, 16-17 October 2012. Available from http://www.ncbi.nlm.nih.gov/books/NBK100491/.
[3] Peroni S, Lapeyre DA and Shotton D (2012). JATS2RDF (v1.2): a mapping of JATS metadata to RDF. Available from http://purl.org/spar/jats2rdf.
[4] Berners-Lee T (1998). Why the RDF model is different from the XML model: An attempt to explain the difference between the XML and RDF models. Available from http://www.w3.org/DesignIssues/RDF-XML.html.