The use of Named Graphs to enable ontology evolution

This is a re-publication of a Position Paper given at the W3C Workshop on the Semantic Web for Life Sciences, Cambridge, Massachusetts, USA, 27-28 October 2004. I am republishing it because the original is not presently available on the Web, and the paper contains some ideas I did not wish to get lost.

[A second related paper from that era, republished as a blog post on the Ontogenesis Knowledge Blog on 22nd January 2010, is entitled Ontologies for Sharing, Ontologies for Use by David Shotton, Chris Catton and Graham Klyne.  This is available at http://ontogenesis.knowledgeblog.org/312.]

The use of Named Graphs to enable ontology evolution

Position Paper, W3C Workshop on the Semantic Web for Life Sciences, Cambridge, Massachusetts, USA.  27-28 October 2004

Chris Catton chris.catton@zoo.ox.ac.uk and David Shotton david.shotton@zoo.ox.ac.uk

Image Bioinformatics Lab, University of Oxford, Dept. Zoology, South Parks Road, Oxford OX1 3PS, UK

[These were the e-mail and postal addresses of the authors at the time, but they are no longer valid.]

Position: The dangers of static ontologies

The role of an ontology is to facilitate the understanding, sharing, re-use and integration of knowledge through the construction of an explicit domain model, thereby helping to address many of the difficulties currently experienced in managing large distributed on-line information resources.

With the volume of digital data now estimated to be doubling every month, soon the only way to handle much of this new information will be through the presuppositional ‘spectacles’ of an ontology. Already people look less and less at raw data, and as the volume accumulates few if any of us will have the time or the mental capacity to assimilate the new data, structure them in a meaningful way, and extract information without first processing the data through an ontology or some other similar machine-based organisational aid. This creates a potential ‘paradigm trap’, first identified by Duncan Davidson (Davidson, 2002). The philosopher Thomas Kuhn first used the term ‘paradigm’ to define the way we perceive, think about and value the world, based upon a particular vision of reality. He argued that changes to the current paradigm, “together with the controversies that almost always accompany them, are the defining characteristics of scientific revolutions” (Kuhn, 1996). There is a danger that building and using an ontology may fossilize the current paradigm in a particular field of knowledge, so that only information that fits the paradigm is actually ever seen by the user. Such an outcome would not itself halt scientific progress, since incremental knowledge that fits the current paradigm would still accumulate. It could, however, hinder or possibly even prevent the discovery and exploration of new and uncharted territory. Even in less extreme cases it is essential that ontologies can evolve as a field of study develops.

Factors favouring static ontologies

We perceive several inter-related influences that favour the fossilization of current domain paradigms into static ontologies:

  1. Ontology building currently requires a high level of dedication and understanding, and consequently ontologies tend to be built by small communities of dedicated ‘monks’. These are usually led by ‘abbots’, relatively senior domain experts who are likely to be highly committed to encapsulating the dominant paradigm, and who may resist change.
  2. Ontology building is a time-consuming and expensive exercise, and thus substantial logistic problems confront any newcomers wishing to involve themselves in such activities.
  3. Ontology building quite rightly encourages the development of community consensus. There would thus be massive social pressures against anyone wishing to create an alternative ontology for use in an already populated domain.
  4. The first ontology to be created for a particular domain of knowledge may assume a monopolistic position that becomes virtually unassailable, even if it has universally acknowledged weaknesses in its structure.
  5. If a large volume of legacy data has been encoded with a successful ontology, this will make it difficult to introduce change.
  6. Most ontologies currently under development have both good bits and bad bits, and users typically select the bits they want and ignore the rest. They may thus use a subsection of Ontology A to encode publications data, a subsection from Ontology B to encode personnel information, and most of Ontology C to categorise their biological results. As ontologies become widely used it is possible to imagine that the conceptualisation of a domain will be encoded not in a single ontology, but in a mosaic made up of segments from a number of different ontologies. This ability to pick and choose sections from a set of ontologies militates against the commonly held view that ontologies will evolve through a competitive ‘survival of the fittest’. There is no single ontology that can ‘succeed’ or ‘fail’ as a result of competitive selection pressure. For this reason, we believe that ontologies are unlikely to evolve in response to the same market forces that drive the development of applications software.

Possibilities for change within the present system

When and how does the conceptualisation of a domain change? Kuhn argued that “Just because it is a transition between incommensurables, the transition between competing paradigms cannot be made a step at a time, forced by logic and neutral experience” (Kuhn, 1996). This is a view that has wide and influential support, for example from Max Planck, who wrote “a new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it” (Planck, 1949).

That viewpoint might be interpreted as speaking against the very possibility of reflecting a paradigm shift within an ontology. However, we believe this not to be the case, as outlined in the following sections. We distinguish two forms of paradigm shift, which we will call evolutionary and revolutionary. In an evolutionary shift, a significant new body of knowledge is appended to an existing conceptualisation, while in a revolutionary shift, the conceptualisation itself is restructured.1

With hindsight, Albert Einstein’s relativity theory can be seen as an extreme example of an evolutionary shift. The old paradigm, Newtonian Mechanics, continues fundamentally unchanged. However, new restrictions are placed on it, since the laws once thought to be universal now only hold true for bodies moving at much less than the speed of light. New laws must be added to describe the case where bodies move at close to the speed of light. It is possible to imagine an ontology describing Newtonian Mechanics as a subgraph of an ontology describing General Relativity. In the new paradigm Newtonian mechanics is not wrong, but now applies under a limited set of conditions. There is a fundamental difference between this and the Copernican revolution. The Copernican revolution is an example of a revolutionary paradigm shift. The old laws no longer hold true. The Earth is not the centre of the universe. The old laws must be removed from the ontology, and replaced with a new set of laws to accommodate the new paradigm.

Evolutionary change of an ontology

As an example of how an evolutionary paradigm shift might be represented in an ontology, consider the graphs shown in Figs. 1 to 3. Fig. 1 shows a section of an ontology describing the development of adult mammalian bone marrow and brain, constructed according to the paradigm of twenty-five years ago, when the consensus was very clearly that bone marrow developed from mesoderm and brain developed from ectoderm.

Subsequently, it was shown that adult mouse brains contain haemopoietic stem cells. Since it seemed unlikely that adult bone marrow stem cells would cross the blood-brain barrier to enter the adult brain, it was hypothesised that the brain cells were derived from foetal haemopoietic cells that entered the brain tissue before the barrier was established (Bartlett, 1982). This proposal is reflected within the ontology given in Fig. 2, which is an extension of the graph shown in Fig. 1.

Fig 1. Ontology of the dominant paradigm circa 1980. Haemopoietic cells develop only from mesoderm and neural and glial cells develop only from ectoderm. In this and subsequent figures, solid arrows represent owl:allValuesFrom restrictions – e.g. neural cells can develop only from ectoderm – while dashed arrows are owl:someValuesFrom restrictions.

Fig 2. Ontology of the new paradigm post 1982, reflecting the hypothesis that brain haemopoietic stem cells have been derived by the migration of foetal haemopoietic stem cells of mesodermal origin. This challenges the dominant paradigm that brain tissues are derived exclusively from ectoderm.

Fig 3. Ontology of the emerging paradigm post 2000, after it had been shown that adult haemopoietic stem cells of mesodermal origin can migrate into the brain and there develop into neural cells.

 

More recently, it has been shown by Brazelton et al. (2000) that haemopoietic stem cells from adult bone marrow can develop into neural cells in adult mouse brain. This striking demonstration of the migration potential and developmental plasticity of adult stem cells both challenges the assumption of Bartlett (1982) that haemopoietic stem cells do not cross the adult blood-brain barrier, thereby throwing in doubt Bartlett’s conclusion that they must have entered during foetal life, and, more fundamentally, negates the long-held belief that neuronal cells can only develop from embryonic ectoderm. An ontology that reflects these new findings is shown in Fig. 3.

Now imagine that the graph in Fig. 1 is part of a much larger developmental anatomy ontology. What should be the response to papers that challenge the accepted paradigm, given that many challenges to dominant paradigms are subsequently proven to be erroneous in some way? Kuhn would argue that it would be inappropriate to make a change, since “.. by ensuring that the paradigm will not be easily surrendered, resistance guarantees that scientists will not be lightly distracted and that the anomalies that lead to paradigm change will penetrate existing knowledge to the core”. But if the ontology is to be employed to assimilate the results in the first place, how can experimental scientists test their hypotheses without changing the ontology?

The ontology change from Fig 1 to Fig 2 does not present a serious problem for the experimental scientist working on a problem that provokes a crisis in the paradigm. She can simply create a new ontology of her own describing the subdomain in question, import the dominant ontology into it and add the appropriate links between the two. If this combined ontology fits the experimental data better, we would expect it to gather support, and eventually to be accepted as the consensus view. The new ontology succeeds by subsuming the old. The change from Fig 2 to Fig 3 creates a more serious problem. The ontology in Fig 2 is no longer a subgraph of Fig 3, since neural cells no longer develop only from foetal neuroepithelium. However, in practical terms, the old ontology may still be required to interpret a mass of legacy data. Furthermore, it is easy to see that although the present system can permit limited expression of evolutionary paradigm change, it is likely to lead to a haphazard collection of ontologies within which some subset of classes and properties embody the old paradigm, while others embody the new world view, with no obvious mechanism for distinguishing between them using currently available OWL constructs.
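The subgraph argument above can be made concrete with a small sketch. The edge sets below are a simplified reading of the figure captions, not a transcription of the original figures: the class names and the "only"/"some" labels (standing for owl:allValuesFrom and owl:someValuesFrom) are illustrative assumptions. A change is treated as evolutionary when the old graph is a subgraph of the new one, and as revolutionary when some old edge has to be withdrawn.

```python
# Each edge is (subject class, restriction kind, object class).
# "only" stands for owl:allValuesFrom, "some" for owl:someValuesFrom.
# Class names are illustrative simplifications of the figure captions.
FIG1 = frozenset({
    ("HaemopoieticCell", "only", "Mesoderm"),
    ("NeuralCell", "only", "Ectoderm"),
    ("GlialCell", "only", "Ectoderm"),
})

# Fig. 2 only adds edges to Fig. 1 (hypothesis of foetal migration).
FIG2 = FIG1 | {
    ("BrainHaemopoieticStemCell", "some", "FoetalHaemopoieticStemCell"),
    ("FoetalHaemopoieticStemCell", "only", "Mesoderm"),
}

# Fig. 3 withdraws the "neural cells only from ectoderm" restriction
# and replaces it with weaker "some" restrictions.
WITHDRAWN = {("NeuralCell", "only", "Ectoderm")}
FIG3 = (FIG2 - WITHDRAWN) | {
    ("NeuralCell", "some", "Ectoderm"),
    ("NeuralCell", "some", "AdultHaemopoieticStemCell"),
}


def classify_change(old, new):
    """Return 'evolutionary' if the old graph is a subgraph of the new one."""
    return "evolutionary" if old <= new else "revolutionary"


def withdrawn_edges(old, new):
    """Edges of the old graph that no longer hold in the new one."""
    return old - new


print(classify_change(FIG1, FIG2))
print(classify_change(FIG2, FIG3))
print(withdrawn_edges(FIG2, FIG3))
```

Under these assumptions the step from Fig. 1 to Fig. 2 is purely additive, whereas the step from Fig. 2 to Fig. 3 withdraws one restriction, which is exactly the edge that legacy data may still depend on.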

Requirements for creating evolvable ‘living ontologies’

The work of Brazelton et al. generates a revolutionary paradigm shift, where the old ontology no longer holds true. The present system does not permit easy organic development of ontologies in a manner that could accommodate such a change. We thus believe that it would be useful to create a mechanism that would enable ontologies both to evolve in a controlled manner by the accretion of new classes and relationships, and to accommodate revolutionary paradigm shifts while remaining to some degree backward compatible.

A key part of building such evolvable ‘living ontologies’ lies in creating the potential for making clearly defined changes. What is required is a mechanism not only for importing subgraphs from existing external ontologies, but also for replacing subgraphs within the ontology of interest with new versions that reflect the new paradigm, while at the same time marking the original subgraphs in such a manner that they remain available for the (re)interpretation of legacy metadata previously created using them.

Such a mechanism would allow users to state explicitly the differences between the current paradigm and the proposed new conceptualisation in a way that would allow the two conceptualisations to co-exist, avoiding an unmanageable proliferation of separate ontologies. This may theoretically be possible using current OWL constructs such as owl:versionInfo. However, owl:versionInfo can only be applied at the class level or the ontology level. Neither of these alternatives is the appropriate level of granularity to meet the needs discussed here. Rather, to allow such changes to the conceptualisation of a domain, we need to be able explicitly to select a subgraph from an existing OWL ontology, and to name it and define its properties.

A strategy for the required management of subgraphs within ontologies is made possible using Named Graphs (Carroll et al., 2004). Named Graphs are currently being proposed as an alternative to reification in RDF, and as such they address a number of issues associated with adding metadata to data. We propose that Named Graphs be employed to permit the addition of provenance and other metadata to subgraphs within an ontology. Of course, not all changes to an ontology will represent paradigm shifts, and users may need to know whether changes to relationships in an evolving ontology reflect relatively minor choices about the manner in which a domain is being modelled, or more fundamental changes to the domain paradigm itself. Named Graphs would also enable such distinctions to be made.
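As a rough illustration of this proposal, the sketch below models a named subgraph as a name, a set of statements and a dictionary of metadata about that subgraph. It is a plain-Python stand-in rather than an RDF implementation: the graph names (ex:neural-origin-1980, ex:neural-origin-2000), the predicate names and the metadata keys (status, supersedes, changeType) are all hypothetical and are not taken from Carroll et al. or from any existing vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class NamedGraph:
    """A named subgraph: an identifier, its statements, and metadata about it."""
    name: str
    triples: frozenset
    metadata: dict = field(default_factory=dict)


class LivingOntology:
    """Holds named subgraphs; superseded ones are marked, never deleted."""

    def __init__(self):
        self.graphs = {}

    def add(self, graph):
        graph.metadata.setdefault("status", "current")
        self.graphs[graph.name] = graph

    def supersede(self, old_name, new_graph, change_type):
        """Replace a subgraph while keeping the old one for legacy data."""
        old = self.graphs[old_name]
        old.metadata["status"] = "superseded"
        old.metadata["supersededBy"] = new_graph.name
        new_graph.metadata["supersedes"] = old_name
        new_graph.metadata["changeType"] = change_type
        self.add(new_graph)

    def current_triples(self):
        """Statements from all subgraphs that have not been superseded."""
        result = set()
        for graph in self.graphs.values():
            if graph.metadata["status"] == "current":
                result |= graph.triples
        return result


onto = LivingOntology()
onto.add(NamedGraph(
    "ex:neural-origin-1980",
    frozenset({("NeuralCell", "developsOnlyFrom", "Ectoderm")}),
    {"paradigm": "circa 1980"},
))
onto.supersede(
    "ex:neural-origin-1980",
    NamedGraph(
        "ex:neural-origin-2000",
        frozenset({
            ("NeuralCell", "canDevelopFrom", "Ectoderm"),
            ("NeuralCell", "canDevelopFrom", "AdultHaemopoieticStemCell"),
        }),
        {"source": "Brazelton et al. (2000)"},
    ),
    change_type="paradigm shift",
)

# The old subgraph stays available for interpreting legacy metadata.
legacy = onto.graphs["ex:neural-origin-1980"]
```

The changeType value is where a user could record whether a replacement reflects a minor modelling choice or a change to the domain paradigm itself.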

One of the core motivations for the introduction of named graphs into RDF is to provide a framework for proof and trust. It is certainly possible to imagine this framework being easily transferred to OWL ontologies. So, for example, one user might be happy to trust the Gene Ontology completely, while another user, experimenting in an area where the paradigm is in crisis, might elect to trust the Gene Ontology with the exception of a particular subgraph defined by a single class and all its subclasses. This subgraph might eventually be replaced by a new one reflecting a new paradigm.
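The two users described here can be sketched as two different selections over the same set of quads (a statement plus the name of the graph it belongs to). Everything in the example is a placeholder: the graph names go:core, go:disputed and lab:proposal and the terms TermA to TermD are invented for illustration and do not refer to real Gene Ontology content.

```python
# Each quad is (graph name, subject, predicate, object).
# All names are hypothetical placeholders, not real Gene Ontology terms.
QUADS = [
    ("go:core", "TermA", "is_a", "TermB"),
    ("go:disputed", "TermC", "is_a", "TermA"),
    ("lab:proposal", "TermC", "is_a", "TermD"),
]


def trusted_view(quads, trusted_graphs):
    """Return the statements found in graphs that this user trusts."""
    return {(s, p, o) for (g, s, p, o) in quads if g in trusted_graphs}


whole_ontology = {"go:core", "go:disputed"}

# First user: trusts the published ontology completely.
view_a = trusted_view(QUADS, whole_ontology)

# Second user: trusts it except for one subgraph, and adds a proposed replacement.
view_b = trusted_view(QUADS, (whole_ontology - {"go:disputed"}) | {"lab:proposal"})
```

Both views are derived from one shared collection of statements, so the two conceptualisations co-exist without a separate copy of the ontology being maintained for each user.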

Conclusion

Scientific progress proceeds in a series of conceptual or technological leaps, followed by periods of consolidation. Before the introduction of ontologies into data analysis, community support for the dominant paradigm tended to restrain change. While such restraint has been acknowledged as performing a useful function, it is a blunt instrument. The introduction of defined ontologies into data analysis could make the introduction of radical change even more difficult to achieve. The solution proposed here does not solve all the problems of viewing data through the presuppositional lens of an ontology. Anomalies may still be missed simply because they do not fit the paradigm – as has always been the case in the practice of science. However, by clearly stating the positions of the dominant paradigm and the proposed changes to it, and by allowing users to give trust or confidence ratings to subgraphs, the process of change can be clearly documented. The use of Named Graphs to identify and describe subgraphs within existing ontologies potentially provides a powerful mechanism for clarifying differences between the dominant and emergent paradigms. As Francis Bacon observed, “Truth emerges more readily from error than from confusion” (Spedding et al., 1905).

References

Spedding, J., Ellis, R. L. & Heath, D. D. (eds.) (1905) The Works of Francis Bacon. G. Routledge & Sons, London.

Bartlett, P. (1982) Pluripotential hemopoietic stem cells in adult mouse brain. Proceedings of the National Academy of Sciences USA 79: 2722-2725.

Carroll, J. J., Bizer, C., Hayes, P. and Stickler, P. (2004) Named Graphs, Provenance and Trust. Online at http://www.hpl.hp.com/techreports/2004/HPL-2004-57.

Davidson, D. (2002) The Mouse Atlas – an ontology for mapping gene function data to the mouse embryo. Proc. Conf. on Standards and Ontologies for Functional Genomics (SOFG). Hinxton, Cambridge, November 17-20, 2002.

Kuhn, T. S. (1996) The Structure of Scientific Revolutions, 3rd edition, p. vii. University of Chicago Press, Chicago, Illinois.

Planck, M. (1949) Scientific Autobiography and Other Papers. Philosophical Library, New York.

Brazelton, T.R., Rossi, F.M.V., Keshet, G.I., Blau, H.M. (2000) From marrow to brain: Expression of neuronal phenotypes in adult mice. Science 290: 1775-1779.

1    Kuhn would argue against the distinction we make here. In his view, all paradigm shifts are revolutionary, since they require a fundamentally different view of the world, and the appearance of evolution, when it is present, is simply a post hoc revision of what actually happened during the course of a paradigm shift. We do not intend to challenge this view, but for the purposes of illustration we choose to accept the revisionist position.

 

 


Describing open access

To permit computer-readable descriptions to be made of the various types of open access publication discussed in the previous blog post, I have expanded PSO, the Publication Status Ontology, whose use is described in an earlier blog post here, by the addition of the following new individuals to the class pso:Status:

  • pso:gratis-open-access
  • pso:libre-open-access
  • pso:green-open-access
  • pso:gold-open-access.

I have also revised the definitions of the following further individuals that were already members of the class pso:Status:

  • pso:closed-access
  • pso:open-access
  • pso:embargoed
  • pso:restricted-access
  • pso:subscription-access.

This comprehensive set of terms permits the various access statuses of documents to be encoded in RDF and published on the Semantic Web.  For convenience of reference, the textual definitions that I have used for these terms (recorded as rdfs:comments in the PSO ontology), grouped in logical order, are given below.

pso:open-access

The status of a published work (typically a scholarly publication or a dataset) that is freely available via the Internet for third parties to read without payment of access or subscription fees, and (in the case of a work published under a full open-access license) that is freely available to download and reuse for any purposes including commercial ones, including modification of the original work, its integration with other material, and its re-publication, subject typically to a requirement that the original authors and the source of the original work are acknowledged in compliance with scholarly citation norms.

pso:gratis-open-access

The status of a published work which is free to read on-line, in contrast to subscription-access works, but to which licensing restrictions apply, limiting the possibilities for downloading, text mining, modification, re-publication or re-use of the published work.  The term Gratis Open Access thus signifies removal of the price barrier to view.  While both imply ‘free’ (a potentially ambiguous word), Gratis Open Access equates to ‘free as in beer’ while Libre Open Access (q.v.) equates to ‘free as in speech’.  Gratis Open Access is thus a necessary but not a sufficient condition for true Libre Open Access.   Many ‘open access’ publications by commercial scholarly publishers are only Gratis Open Access, while almost all publications by ‘pure’ Open Access publishers are Libre Open Access.

pso:libre-open-access

The status of a published work which is both free to read on-line, and to which additional usage rights apply, for example the right to text mine, make derivative works, re-use and re-publish the published work, such rights frequently being defined by application of an explicit license such as a Creative Commons license.

pso:gold-open-access

The status of a published work, typically a journal article, made available by the publisher on the publisher’s own web site for third parties to read without payment of access or subscription fees.  Gold open access has the benefit that the article is findable where you expect it to be, but licensing restrictions may limit the possibilities for downloading, text mining, modification, re-publication or re-use of the published work.  Gold open-access publication typically involves payment by the author or his/her institution to the publisher of an article processing charge (aka an author publishing charge).

pso:green-open-access

The status of a published work made available by the author, by self-archiving a version of the work for free and open public use in their institutional repository, in a central repository, or elsewhere, in parallel with publication of a subscription-access Version of Record of the work by a publisher.  The green open-access version of the work may be a preprint (the version of the article as first submitted for publication) or a postprint (the pre-publication version of the article after incorporation of authors’ responses to peer reviewers’ comments).  Its availability may have an embargo restriction imposed by the publisher of the subscription-access version of the work that prevents the green open-access version from being freely available until some substantial time after publication of the subscription-access journal issue containing that article.  A green open access work should be accompanied by a license explicitly defining usage rights, for example a Creative Commons Attribution License.

pso:subscription-access

The status of a published work, typically an article in a journal issue, that is not available to read without payment of an article access fee or a journal subscription fee for that publication.

pso:embargoed

The status of a published work that is subjected to a publication embargo, which means that the material cannot be published, or in the case of a press release that it cannot be reported on, until a particular date known as the embargo date.  For open-access journal articles, an embargoed article is one in which availability of the open-access version of the article is delayed by the publisher for a substantial embargo period, typically of six or twelve months, after subscription-access availability of the published work.

pso:restricted-access

The status of a work (typically a scholarly paper or a dataset) to which access is restricted.  For example, confidential information to which access is made available only to those who have been approved by the owner or copyright holder of the asset after personal application, or to those with appropriate security clearance, or to those within a partnership.

pso:closed-access

The status of a work (typically a private or secret paper or a confidential dataset) that is typically held unpublished in a ‘dark’ archive whose existence is unknown by the wider world, and that is only available to the owner or copyright holder of the asset.

pso:confidential

The status of a document containing information that must be kept confidential.

pso:non-confidential

The status of a document containing information that may be shared publicly.

pso:unpublished

The status of a work (for example a document or a dataset) that has not been published by the author, a publisher or a data repository.

Of relevance to these statuses are two financial terms within FRAPO, the Funding, Research Administration and Projects Ontology, previously described in this blog post: frapo:ArticleProcessingCharge and frapo:Subscription.
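To show how one of the statuses above might be attached to a document, here is a minimal sketch that emits N-Triples lines using only the Python standard library. Several things are assumptions rather than statements from this post: the namespace http://purl.org/spar/pso/, the property names pso:holdsStatusInTime and pso:withStatus and the class pso:StatusInTime are my recollection of the PSO vocabulary and should be checked against the published ontology, and the example.org IRIs are placeholders.

```python
# Assumed namespace and term names; verify against the published PSO ontology.
PSO = "http://purl.org/spar/pso/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Local names of the pso:Status individuals listed in this post.
STATUSES = {
    "open-access", "gratis-open-access", "libre-open-access",
    "gold-open-access", "green-open-access", "subscription-access",
    "embargoed", "restricted-access", "closed-access",
    "confidential", "non-confidential", "unpublished",
}


def status_triples(document_iri, status, status_in_time_iri):
    """Return N-Triples lines stating that a document holds a PSO status."""
    if status not in STATUSES:
        raise ValueError(f"unknown PSO status: {status}")
    return [
        f"<{document_iri}> <{PSO}holdsStatusInTime> <{status_in_time_iri}> .",
        f"<{status_in_time_iri}> <{RDF_TYPE}> <{PSO}StatusInTime> .",
        f"<{status_in_time_iri}> <{PSO}withStatus> <{PSO}{status}> .",
    ]


# Placeholder IRIs for a hypothetical self-archived article.
lines = status_triples(
    "http://example.org/article/1",
    "green-open-access",
    "http://example.org/article/1/status",
)
print("\n".join(lines))
```

Checking the status against a fixed set keeps misspelt names such as "green-open-acess" from silently producing triples that point at a non-existent individual.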


Open access journals – wheat, chaff and hopeful monsters

The Fifth Annual Conference on Open Access Scholarly Publishing, organized by OASPA, the Open Access Scholarly Publishers Association [1], was held in Riga, the capital of Latvia, on 18th-20th September 2013.  I was invited to attend to discuss the Open Citations Corpus, a repository of open bibliographic citation data initially harvested from the reference lists of open access journal articles.  This post contains my belated reflections on the major themes raised by that interesting meeting.

The conference, which was well organized by Claire Redhead, OASPA’s Membership & Communications Manager, was attended by 85 people, including representatives of some 37 publishers and of 22 other organizations including major funders, policy organizations and libraries, and a handful of other interested parties such as myself.  Both ‘pure’ Open Access publishers and more traditional publishers who publish a mixture of open-access, hybrid and subscription-access journals were represented.  The venue, the Radisson Blu Hotel, provided excellent accommodation and conference facilities, enabling us to work effectively, in five themed sessions of talks and discussion over two and a half days.

The lead keynote speaker for the Riga meeting was Lars Bjørnshauge, Director of DOAJ, the Directory of Open Access Journals, a marvelously useful information source that he founded in 2003 at the University of Lund, which since December 2012 has been under the supportive wing of IS4OA [2].

Lars started the meeting by exposing the first major theme, namely quality control in OA journals.  Given the ease of on-line publishing and the explosive growth of OA journals over the last few years, there is great current concern both in DOAJ and in OASPA to determine which of these open access journals are legitimate scholarly publications, and which are simply attempts on the part of fake or non-legitimate publishers to separate authors from their cash in the form of substantial article processing charges (APCs, aka author publication charges) in return for a mediocre service.  Lars suggested a number of criteria by which OA journals should be judged, including the following:

  • That journals should provide the names and contact details for the editorial board.
  • That journals should have a clear and transparent peer reviewing policy.
  • That journal articles should be published under a Creative Commons CC-By attribution license, so that readers’ rights to re-use the articles are clear.
  • That journal articles should have DOIs (Digital Object Identifiers), typically, for scholarly articles, DOIs issued to registered publishers by CrossRef.
  • That journal articles should be accompanied by machine-readable metadata.
  • That publishers should archive digital journal articles for future reference, and that this should be done in a machine-readable format, for example XML conforming to a DTD such as JATS (the NISO Journal Article Tag Suite), rather than as a PDF document.

To this end, DOAJ and OASPA, together with the Committee on Publication Ethics and the World Association of Medical Editors, have recently jointly published a set of Principles of Transparency and Best Practice in Scholarly Publishing against which to evaluate all OA journals.

The second major theme of the Riga meeting, opened by Michael Jubb of the Research Information Network in the following talk, concerned the exact nature of scholarly OA publishing and its financial implications.  Michael outlined the striking progress being made in the UK to implement recent OA mandates for publication of research work funded by the public purse, discussing both the Finch Report and the Research Councils UK’s Policy on Open Access.  This requires that articles be published under Creative Commons CC-By licenses, so that the content may be freely mined and otherwise re-used – the ideal situation for the reader.

This highlighted the distinction, often veiled for nominally open access publications from commercial publishers, between the two types of open access, characterised as Gratis versus Libre:

  • Gratis Open Access signifies removal of the price barrier alone, giving a right to read the article off the screen.  However, the publication is still restricted by licenses that do not permit it to be downloaded or reused in any way.
  • Libre Open Access signifies removal both of the price barrier and at least some of the permission barriers limiting reuse, giving rights to text-mine and re-use the article.

Thus, while both imply ‘free’ (a potentially ambiguous word), Gratis Open Access equates to ‘free as in beer’, while Libre Open Access equates to ‘free as in speech’, an analogy for which Peter Murray-Rust is to be thanked. Gratis Open Access is thus a necessary but not a sufficient condition for true Libre Open Access.  Many ‘open access’ publications by commercial scholarly publishers are only Gratis Open Access, while almost all publications by ‘pure’ Open Access publishers are Libre Open Access.

Michael also discussed the tension between Gold Open Access (OA article published on publisher’s web site), which typically involves payment of APCs, and has the benefit that the article is findable where you expect it to be, and Green Open Access, where the publisher permits the author to publish, in an institutional repository or similar third-party site, a preprint (the version of the article as first submitted for publication) or a postprint (the pre-publication version of the article after incorporation of authors’ responses to peer reviewers’ comments), often with an embargo restriction that prevents the article from being freely available until some substantial time after publication of the subscription-access journal issue containing that article.  The latter (Green OA) is the cheaper way for UK universities to comply with the RCUK policy, but means that potential readers may have difficulty finding the article, the content of which (if it is a preprint) may not be the same as the published Version of Record.

Representatives of other funding agencies and policy organizations similarly set forth their OA agendas.

The third theme to emerge from the conference was the contentious issue of hybrid journals, initially set forth by Liz Ferguson of Wiley-Blackwell.  Hybrid journals are scholarly journals in which one has to pay a subscription to view the majority of articles, but which may also contain OA articles for which the authors have had to pay substantial APCs.  The bone of contention is the degree to which commercial publishers should or do lower their journal subscription charges as the number of OA articles within them rises, to avoid what is politely termed “double dipping”, i.e. getting academics to pay publishers not just once but twice for the privilege of reading their own works of scholarship.  On this topic, several publishers claimed that they were doing the honourable thing.  Someone commented “The only way for hybrid is to take the publisher’s word”!

Falk Reckling of FWF (the Austrian Science Fund) told the conference of the Austrian experiment to avoid increased library payments to publishers while achieving open access publication, in which publishers were directly reimbursed for all papers funded by FWF, and the amount paid was then deducted from the subscriptions paid to them by the Austrian Library Consortium the following year.  He reported that most big publishers, including Elsevier, Wiley and Taylor & Francis, rejected that proposal.  Falk concluded by saying that only by working together internationally could funding agencies succeed in forcing change on the publishers, a sentiment echoed particularly by Cameron Neylon of the Public Library of Science.

Victoria Gardner of Taylor & Francis surprised me by pointing out that the average cost of publishing an article in the humanities and social sciences (HSS) was more than three times that in science, technology and medicine (STM), largely because the papers were larger and had higher rejection rates.  She stated that the author-pays APC model was not financially viable for HSS, where many researchers, particularly early in their careers, lack funding income that could be used for that purpose.  There was thus tension in the debate “OA is unworkable for the Humanities” versus “OA is the future” that was not resolved.

Apart from the costs of OA, and particularly of hybrid journals, the other discussion issue concerned their future.  Are hybrid journals stepping stones to a fully open-access world, or are they hopeful monsters, doomed to extinction?  Liz Ferguson reported Wiley to have ~1,200 hybrid journals that between them had published only about 3,500 articles – on average just three articles per journal – with many journals having no OA uptake at all.  Several other publishers reported their experiences with hybrid journals.  But Georg Botz of Science Europe told the meeting flatly that “the hybrid model is not a working and viable pathway to OA.”

Apart from these main themes, there were talks from a number of leading OA publishers about their progress and achievements, and an interesting discussion about the growing trend of Open Access book publication led by Eelco Ferwerda of OAPEN, an open access library of books in the humanities and social sciences, and Cecy Marden of the Wellcome Trust, who discussed funding of OA monographs and the importance of Europe PubMed Central as a repository for OA publications.

The final morning of the conference saw presentations on a number of separate additional themes, including a “meet the OASPA members” session containing inspiring presentations by Brian Hole of Ubiquity Press, who described how quality OA publishing could be achieved at a fraction of the conventional costs discussed above, and by Lyubo Penev of Pensoft Publishers, an innovative OA publisher of journals relating to biological taxonomy.  Lyubo described the Pensoft Writing Tool, an integrated online collaborative authoring, editing, reviewing and publishing platform that facilitates semantic enhancements and the publication of datasets accompanying articles (see also http://www.pensoft.net/page.php?P=31&SESID=auzykddds), which I have mentioned in a previous blog post.

It was during that session that I gave my invited talk entitled Open Citations Corpus – freeing scholarly citation data, described in a separate Open Citations Blog post.

In summary, OASPA’s Fifth Annual Conference on Open Access Scholarly Publishing revealed the world of open access journal publishing to be one of rapid growth and innovation, and of clear tensions regarding quality control, cost, licensing, and applicability to HSS.  It was clear that commercial academic publishers were attempting to hang on to established modes of publishing, funding and tight licensing while making limited concessions towards openness in the form of hybrid journals and a smaller number of OA journals under terms that benefit them financially.  The elephant in the room, never fully articulated, was the extent to which, and for how long, the commercial publishers’ traditional business model of high profit margins can withstand the competition of the newer Open Access publishers, particularly those offering radically cheaper publishing avenues for scholars.  Cameron Neylon reminded us that disruptive technologies achieve market dominance not by being disruptive but by addressing a recognised and current need, so that their adoption then drives the disruption.

Clearly for the foreseeable future we will have a transitional mixed publishing economy in which ‘pure’ OA journals will exist alongside subscription-access and hybrid journals, with a plurality of business models.  Improvements in mechanisms for micropayments and for splitting payments between multiple authors’ institutions are required.  Funders’ mandates are driving expansions of Gold and Green OA publishing and minimizing restrictions on reuse by adoption of more permissive CC-BY licensing, but universities are struggling to find funds to pay OA APCs while maintaining conventional journal subscription payments.  Embargoes on access and lack of interoperability between institutional repositories continue to restrict the usefulness of Green OA.  So there is real momentum towards OA, but mixed progress and great disparity between nations.  Further change is inevitable and to be welcomed.

[1]     OASPA was established in 2008 to represent the interests of Open Access (OA) journal publishers globally in all scientific, technical and scholarly disciplines.  Paul Peters from the Hindawi Publishing Corporation is currently Chair of the OASPA Board, and is editor of the OASPA Blog.  The presentations from the Riga meeting are currently available from http://oaspa.org/conference/presentations-coasp-2013/.  The 6th Conference on Open Access Scholarly Publishing (COASP) will be hosted by UNESCO at their Paris Headquarters from September 17th – 19th, 2014.  Further information will be posted on the OASPA conference page (http://oaspa.org/conference/) as it becomes available.  Details of the inaugural OASPA Asia conference, to be held in Bangkok in June 2014, are presently given on the same page.

[2]     IS4OA (Infrastructure Services for Open Access) is a Community Interest Company based in the United Kingdom, led by Caroline Sutton, the immediate past-president of OASPA, and by Alma Swan, a consultant working in the field of scholarly communication who has a long history of supporting open access initiatives.  IS4OA acts as an umbrella organisation for open access activities and services that are of value to the academic community and beyond, providing business structure and expertise and a means of obtaining and channelling financial support for these activities and services.
