JATS, the Journal Article Tag Suite, defines a vocabulary of XML elements and attributes used to describe the content and metadata of journal articles. In this post, I describe the mapping of JATS to RDF, so that publishers’ XML article metadata encoded using JATS might become part of the web of linked data.
This work, which makes extensive use of the SPAR (Semantic Publishing and Referencing) Ontologies, was undertaken by Silvio Peroni and myself over the past summer, guided by Deborah Aleyne Lapeyre of Mulberry Technologies Inc., Maryland, the company that wrote the JATS specification on behalf of the National Library of Medicine.
JATS was officially published on 22 August 2012 as ANSI/NISO Z39.96-2012, JATS: Journal Article Tag Suite (version 1.0), with the public URL http://www.niso.org/apps/group_public/document.php?document_id=8975.
It is the successor to the National Library of Medicine (NLM) DTD, which for many years has been a de facto standard for the XML markup of scholarly journal articles, widely used by many academic publishers within their routine publication workflows, and also used as the ingest format for PubMed Central. The last version of the NLM DTD, Version 3.0, was released in 2008, and JATS, although rebranded and now an official NISO standard, can be thought of simply as the next version of it.
As did the NLM DTD, JATS contains three tag sets, each intended for a slightly different purpose: the Journal Archiving and Interchange Tag Set, the Journal Publishing Tag Set, and the Article Authoring Tag Set. The Journal Publishing Tag Set is a moderately prescriptive Tag Set, optimized to regularize and control the sequence of the XML content.
As regular readers of this blog will know, the Resource Description Framework (RDF) is the key enabling technology for the Semantic Web, also known as the web of linked data. When statements about entities and their relationships are expressed in RDF syntax using publicly available ontologies, they can be combined into interconnected information networks (RDF graphs) in which the truth content of each original statement is maintained, thereby creating a web of linked data, the Semantic Web.
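As a rough illustration of how RDF statements from different sources combine, the following Python sketch treats each statement as a (subject, predicate, object) triple and each graph as a set of triples. The example.org identifiers and the two source graphs are hypothetical, and a real application would normally use an RDF library rather than plain sets.

```python
# Each RDF statement is a (subject, predicate, object) triple.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical identifiers, used for illustration only.
ARTICLE = "http://example.org/article/1"
AUTHOR = "http://example.org/person/a"


def merge_graphs(*graphs):
    """Return the union of several graphs, each given as a set of triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


# Statements that might come from a publisher.
publisher_graph = {
    (ARTICLE, DCTERMS + "title", '"An example article"'),
    (ARTICLE, DCTERMS + "creator", AUTHOR),
}

# Statements that might come from an independent source.
library_graph = {
    (AUTHOR, FOAF + "name", '"A. N. Author"'),
    (ARTICLE, DCTERMS + "creator", AUTHOR),  # same statement as above
}

combined = merge_graphs(publisher_graph, library_graph)
for triple in sorted(combined):
    print(triple)
```

Because both graphs use the same identifiers for the article and the author, the union links the article's title to the author's name without altering any of the original statements, and the repeated statement is stored only once.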
The SPAR (Semantic Publishing and Referencing) Ontologies are a suite of complementary and orthogonal OWL 2 DL ontologies described in this and subsequent posts in this blog. These ontologies were created specifically to permit RDF descriptions of bibliographic entities, citations, reference collections and library catalogues, the structural and rhetorical component parts of documents, and roles, statuses and workflows in publishing.
Mapping JATS to RDF
Silvio Peroni and I decided to map the metadata elements of the JATS Journal Publishing Tag Set to RDF. For this purpose, we were given pre-publication access to JATS v1.0, enabling us to undertake our mapping work in July, and to revise it during August and September 2012. I present this work at the 2012 JATS Conference at the National Library of Medicine on the NIH campus in Bethesda, Maryland on 16th October 2012.
In our mapping of JATS, we employed the SPAR ontologies as appropriate, and also used elements from other well-known vocabularies such as the Dublin Core Metadata Initiative (DCMI) Metadata Terms and the Friend of a Friend (FOAF) Vocabulary.
Since JATS is a large standard, containing 246 elements and 134 attributes, we chose to map only the JATS metadata entities that describe an article (e.g. <journal-meta> for metadata about the journal in which the article was published, such as the name of the journal), and to leave aside (possibly for a later mapping exercise using DoCO, the Document Components Ontology) those entities describing the textual and graphical structure and content of the article (e.g. <title>, <body>, <fig>, <table>).
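To give a flavour of what such a metadata mapping involves, here is a minimal Python sketch that reads a few values from the <journal-meta> and <article-meta> elements of a small JATS fragment and expresses them as triples. It is not the JATS2RDF mapping itself: the choice of properties (dcterms:title, prism:issn, prism:doi, frbr:partOf), the direct article-to-journal link, the sample XML, and the example.org URIs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace IRIs of the vocabularies used in this sketch.
DCTERMS = "http://purl.org/dc/terms/"
FABIO = "http://purl.org/spar/fabio/"
FRBR = "http://purl.org/vocab/frbr/core#"
PRISM = "http://prismstandard.org/namespaces/basic/2.0/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(str):
    """Marks a string as an RDF literal rather than an IRI."""


# Hypothetical JATS fragment containing only the elements read below.
JATS_SAMPLE = """\
<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Examples</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1234-5678</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.1</article-id>
      <title-group>
        <article-title>An example article</article-title>
      </title-group>
    </article-meta>
  </front>
</article>
"""


def jats_metadata_to_triples(xml_text, article_uri, journal_uri):
    """Extract a few metadata values from JATS XML as a list of triples."""
    root = ET.fromstring(xml_text)
    triples = [
        (article_uri, RDF_TYPE, FABIO + "JournalArticle"),
        (journal_uri, RDF_TYPE, FABIO + "Journal"),
        # Simplification: link the article directly to its journal.
        (article_uri, FRBR + "partOf", journal_uri),
    ]
    # (path from <article>, subject, property) for each simple text value.
    simple_fields = [
        ("front/journal-meta/journal-title-group/journal-title",
         journal_uri, DCTERMS + "title"),
        ("front/journal-meta/issn", journal_uri, PRISM + "issn"),
        ("front/article-meta/article-id[@pub-id-type='doi']",
         article_uri, PRISM + "doi"),
        ("front/article-meta/title-group/article-title",
         article_uri, DCTERMS + "title"),
    ]
    for path, subject, prop in simple_fields:
        text = root.findtext(path)
        if text:  # skip elements that are absent or empty
            triples.append((subject, prop, Literal(text.strip())))
    return triples


ARTICLE_URI = "http://example.org/article/1"  # hypothetical
JOURNAL_URI = "http://example.org/journal/1"  # hypothetical

triples = jats_metadata_to_triples(JATS_SAMPLE, ARTICLE_URI, JOURNAL_URI)
for triple in triples:
    print(triple)
```

Elements describing the article's textual and graphical content, such as <body> or <fig>, are simply never visited, which mirrors the decision to restrict the mapping to metadata.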
Detailed description and examples
Within the JATS2RDF mapping document we created, our mappings are presented in tabular form, with one table for each of the principal JATS metadata elements <article>, <article-meta>, <journal-meta>, <contrib>, and <ref-list>, plus their principal contained elements and attributes. In all, we created 242 separate XML to RDF mapping statements.
Within each table there are three columns, the first giving the JATS element or attribute name, the second showing an exemplar XML usage from the JATS specification, and the third documenting our mapping of that usage to RDF.
We provide specific examples of each mapping, giving alternatives where appropriate. The style employed can be seen in the mappings for the first four elements of the JATS <ref-list> element table in the JATS2RDF mapping document.
Automating the mapping to RDF
By means of an Extensible Stylesheet Language Transformations (XSLT) transform that we have also created (downloadable from http://purl.org/spar/jats2rdf/xslt), this JATS2RDF mapping now permits the JATS metadata elements and their attributes, from documents marked up in XML using the NISO-JATS Journal Publishing Tag Library v1.0, to be converted automatically to RDF. This enables the information to be published to the Semantic Web as linked open data in a manner that is unambiguous and universally understood.
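The result of such a conversion is ordinary RDF that can be written out in any standard serialization. As a sketch of that final step only (it does not reproduce the XSLT transform or its actual output), the Python below writes triples in N-Triples syntax. The four-field tuple layout, the function names, and the example.org URI are assumptions made for this illustration.

```python
def escape_literal(value):
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object, object_is_literal) tuples.

    Subjects and predicates are IRIs; the object is written as a quoted
    literal when object_is_literal is true, otherwise as an IRI.
    """
    lines = []
    for subject, predicate, obj, object_is_literal in triples:
        if object_is_literal:
            obj_text = '"' + escape_literal(obj) + '"'
        else:
            obj_text = "<" + obj + ">"
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines) + "\n"


# Hypothetical article description.
ARTICLE = "http://example.org/article/1"
example_triples = [
    (ARTICLE, "http://purl.org/dc/terms/title",
     'An "example" article', True),
    (ARTICLE, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://purl.org/spar/fabio/JournalArticle", False),
]

output = to_ntriples(example_triples)
print(output, end="")
```

N-Triples is used here because it puts one statement on each line, which makes the output easy to inspect and to concatenate with RDF from other sources.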
We hope that this ability to express JATS metadata descriptions in RDF will promote the use of JATS to a wider community.
Publications arising from this work
Our paper describing our JATS to RDF mapping is available from the National Library of Medicine:
Peroni S, Lapeyre DA and Shotton D (2012) From Markup to Linked Data: Mapping NISO JATS v1.0 to RDF using the SPAR (Semantic Publishing and Referencing) Ontologies. Proc. 2012 JATS Conference, National Library of Medicine, Bethesda, Maryland, USA, 16-17 October 2012. http://www.ncbi.nlm.nih.gov/books/NBK100491/.
Our JATS2RDF mapping document, entitled JATS2RDF (v1.2): a mapping of JATS metadata to RDF, is available in PDF format from http://purl.org/spar/jats2rdf.
Our XSLT transform to automate the creation of RDF metadata from a document marked up in JATS is available from http://purl.org/spar/jats2rdf/xslt.