The purpose of mapping DataCite metadata elements to ontology terms is to allow DataCite metadata to be published in RDF as Open Linked Data, so that these metadata can be understood programmatically and integrated automatically with similar data from elsewhere.
In the previous blog post, I described the updated and expanded version of the DataCite Ontology, version 0.6.1, that I created with Silvio Peroni to conform to the DataCite Metadata Kernel v2.2, published in July 2011. The revised ontology now provides the eleven classes and five properties required to cover every item in v2.2 of the DataCite Metadata Schema (not just the core DataCite metadata elements) that was not conveniently covered by terms in other ontologies.
In this post, I describe how we have used this revised DataCite Ontology to create a revised DataCite2RDF mapping document. This replaces the previous mapping document, described in an earlier post, which was a partial mapping of the DataCite Metadata Kernel v2.0 using the original version of the DataCite Ontology.
Wherever possible in this new DataCite2RDF mapping, we have drawn on widely used vocabularies, including:
DCMI (Dublin Core Metadata Initiative),
FOAF (Friend of a Friend Vocabulary),
SKOS (Simple Knowledge Organization System), and
PRISM (Publishing Requirements for Industry Standard Metadata) terms,
supplemented by terms from FRBR (Functional Requirements for Bibliographic Records), and from the following SPAR (Semantic Publishing and Referencing) Ontologies:
CiTO, Citation Typing Ontology,
FaBiO, FRBR-aligned Bibliographic Ontology,
FRAPO, the Funding, Research Administration and Projects Ontology,
PRO, Publishing Roles Ontology,
SCORO, Scholarly Contributions and Roles Ontology, and, of course,
the DataCite Ontology itself.
The mapping document is structured in tabular form, with three columns: the first containing the DataCite ID, the second containing the name of the DataCite property, and the third containing the ontology entities used in mapping each of the DataCite metadata elements. All the metadata elements of the DataCite Metadata Kernel version 2.2 are included, both mandatory and optional, and both major and supplementary. For each, we provide not only the ontology terms, but also a specific exemplar of the usage of that term in an RDF statement, giving alternatives where appropriate. To show the style employed, the mappings for the first three DataCite metadata elements are shown in the following table:
|ID||DataCite property||Equivalent ontology class or property|
|1||Identifier||datacite:PrimaryResourceIdentifier (a sub-class of datacite:ResourceIdentifier that uses a datacite:IdentifierScheme that is restricted to datacite:doi, an individual in the datacite:ResourceIdentifierScheme). Exemplar usage: :my-dataset rdf:type fabio:Dataset ; datacite:hasIdentifier [ rdf:type datacite:PrimaryResourceIdentifier ; literal:hasLiteralValue "doi:10.1371/journal.pntd.0000228.g002.x001" ] .|
|1.1||IdentifierType||Restricted to datacite:doi, an individual in the datacite:ResourceIdentifierScheme. Exemplar usage: :my-dataset rdf:type fabio:Dataset ; datacite:hasIdentifier [ rdf:type datacite:PrimaryResourceIdentifier ; literal:hasLiteralValue "doi:10.1371/journal.pntd.0000228.g002.x001" ; datacite:usesIdentifierScheme datacite:doi ] .|
|2||Creator||dc:creator (data property). Exemplar usage: :my-dataset rdf:type fabio:Dataset ; dc:creator "Shotton, David" . Alternatively, dcterms:creator (object property). Exemplar usage: :my-dataset rdf:type fabio:Dataset ; dcterms:creator [ rdf:type foaf:Person ; foaf:familyName "Shotton" ; foaf:givenName "David" ] .|
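To make the table's exemplars concrete, here is a minimal Python sketch that emits the Turtle for DataCite elements 1 and 1.1 as a complete document with prefix declarations. It is not part of the mapping document. The namespace URIs, the `http://example.org/` base for `:my-dataset`, and the helper names `quote` and `identifier_to_turtle` are all assumptions made for illustration.

```python
# Namespace URIs below are assumptions for illustration; the post does not list them.
PREFIXES = {
    "": "http://example.org/",  # hypothetical base for :my-dataset
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "fabio": "http://purl.org/spar/fabio/",
    "datacite": "http://purl.org/spar/datacite/",
    "literal": "http://www.essepuntato.it/2010/06/literalreification/",
}


def quote(value: str) -> str:
    """Return value as a double-quoted Turtle string with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def identifier_to_turtle(subject: str, doi: str) -> str:
    """Render DataCite elements 1 and 1.1 in the style of the table's exemplars."""
    header = "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items())
    body = (
        f"{subject} rdf:type fabio:Dataset ;\n"
        "    datacite:hasIdentifier [\n"
        "        rdf:type datacite:PrimaryResourceIdentifier ;\n"
        f"        literal:hasLiteralValue {quote('doi:' + doi)} ;\n"
        "        datacite:usesIdentifierScheme datacite:doi\n"
        "    ] ."
    )
    return f"{header}\n\n{body}"


print(identifier_to_turtle(":my-dataset", "10.1371/journal.pntd.0000228.g002.x001"))
```

The identifier is modelled as a blank node, as in the exemplars, so that the literal value and the identifier scheme stay attached to the same resource.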
To facilitate our mapping, the object properties compiles and isCompiledBy, which are required for the DataCite relationType controlled list, have now been included in version 2.2 of CiTO (created 3 July 2012) as cito:compiles and cito:isCompiledBy. The use of the mini-ontology CiTO4Data, which contained only those properties, has consequently been deprecated.
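As a rough illustration of how these two relation types might be handled in code, the sketch below looks up a DataCite relationType value and returns a Turtle triple using the corresponding CiTO property. Only the two properties named above are included. The function name, the case-insensitive lookup and the resource names `:resource-a` and `:resource-b` are hypothetical.

```python
# Only the two relation types discussed above are mapped; other relationType
# values would need entries taken from the mapping document itself.
RELATION_TO_CITO = {
    "compiles": "cito:compiles",
    "iscompiledby": "cito:isCompiledBy",
}


def relation_to_triple(subject: str, relation_type: str, related: str) -> str:
    """Return one Turtle triple linking two resources via a CiTO property."""
    # Lower-casing the key is an assumption, made so that spelling variants
    # such as "IsCompiledBy" and "isCompiledBy" resolve to the same property.
    prop = RELATION_TO_CITO.get(relation_type.lower())
    if prop is None:
        raise ValueError(f"no mapping recorded for relationType {relation_type!r}")
    return f"{subject} {prop} {related} ."


print(relation_to_triple(":resource-a", "Compiles", ":resource-b"))
print(relation_to_triple(":resource-b", "IsCompiledBy", ":resource-a"))
```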
In several instances, we propose alternative mappings, depending upon whether one wishes to use a data property that has a literal (e.g. text, number, date) as its object, or to use an object property that has a URI as its object. As explained more fully in the mapping document itself, our recommended best practice is to use DCMI metadata terms (dcterms:) as object properties in preference to Dublin Core metadata elements (dc:) as data properties, unless one specifically needs to use a literal as the object of an RDF triple.
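The choice between the two forms can be sketched in Python as follows, using the Creator element as the example. The function and parameter names are hypothetical. The default follows the recommendation above by emitting the dcterms: object property, and the dc: literal form is produced only on request.

```python
def quote(value: str) -> str:
    """Return value as a double-quoted Turtle string with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def creator_to_turtle(subject: str, family_name: str, given_name: str,
                      as_literal: bool = False) -> str:
    """Render a creator as either a dc: literal or a dcterms: object."""
    if as_literal:
        # Data property: the object is a plain literal, "Family, Given".
        return f"{subject} dc:creator {quote(family_name + ', ' + given_name)} ."
    # Object property: the object is a resource (here a blank node) typed as
    # foaf:Person, which can carry further statements of its own.
    return (
        f"{subject} dcterms:creator [ rdf:type foaf:Person ; "
        f"foaf:familyName {quote(family_name)} ; "
        f"foaf:givenName {quote(given_name)} ] ."
    )


print(creator_to_turtle(":my-dataset", "Shotton", "David"))
print(creator_to_turtle(":my-dataset", "Shotton", "David", as_literal=True))
```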
A presentation related to this work, that was given at a DataCite meeting held at the British Library on 6 July 2012, is available here.
The next blog post describes a DataCite Metadata Input Form based on this new DataCite2RDF mapping that Tanya Gray and I have created. This is a Web input tool that permits easy entry of metadata compliant with the DataCite Metadata Kernel v2.2. The metadata can be saved in an XML file, and can be automatically mapped to RDF by employing an XSLT that uses this mapping.
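The XSLT itself is not reproduced here. As a stand-in, the following Python sketch shows the general shape of such a transformation for two elements only. It assumes a simplified, namespace-free XML fragment, and the function name, sample document and default subject `:my-dataset` are hypothetical. A real DataCite XML file declares a schema namespace that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample; an assumption made for illustration only.
SAMPLE_XML = """<resource>
  <identifier identifierType="DOI">10.1371/journal.pntd.0000228.g002.x001</identifier>
  <creators>
    <creator><creatorName>Shotton, David</creatorName></creator>
  </creators>
</resource>"""


def quote(value: str) -> str:
    """Return value as a double-quoted Turtle string with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def resource_to_turtle(xml_text: str, subject: str = ":my-dataset") -> str:
    """Map the identifier and creator names of one record to Turtle."""
    root = ET.fromstring(xml_text)
    parts = [f"{subject} rdf:type fabio:Dataset"]
    identifier = root.find("identifier")
    if identifier is not None and identifier.text:
        doi = identifier.text.strip()
        parts.append(
            "datacite:hasIdentifier [ rdf:type datacite:PrimaryResourceIdentifier ; "
            f"literal:hasLiteralValue {quote('doi:' + doi)} ; "
            "datacite:usesIdentifierScheme datacite:doi ]"
        )
    for name in root.findall("creators/creator/creatorName"):
        if name.text:
            # Literal form (dc:creator), since the XML supplies a single name string.
            parts.append(f"dc:creator {quote(name.text.strip())}")
    # Predicate-object pairs for the same subject are joined with ";".
    return " ;\n    ".join(parts) + " ."


print(resource_to_turtle(SAMPLE_XML))
```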
We commend the use of this mapping to all who wish to encode DataCite metadata in RDF, and welcome feedback on this work.
David Shotton
Silvio Peroni