“Is Data Publication the Right Metaphor?” is an essay by Mark Parsons and Peter Fox, to be published in the Data Science Journal. A preprint has been made available for open pre-publication community peer review at http://mp-datamatters.blogspot.com/2011/12/seeking-open-review-of-provocative-data.html.
To supplement the discussion this excellent essay has already prompted, I would like to add some belated comments on DOIs for dataset versions, peer review of datasets, metadata as first-class scientific objects, and linking the data publication and linked data metaphors.
First, DOIs can easily be used to provide unique identifiers to datasets that undergo updating and versioning, as exemplified in the Dryad Data Repository, a repository for the publication of small heterogeneous biological datasets linked to peer-reviewed journal articles. Dryad DOIs permit the version numbers of each data package and of each of the data package’s constituent data files to be specified explicitly (see http://wiki.datadryad.org/wiki/DOI_Usage).
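As a rough illustration of how version information carried in a DOI could be handled by software, the Python sketch below parses an identifier under an assumed suffix scheme: an optional ".N" after the package identifier for the package version, and an optional "/F.M" for file F at version M. The scheme, the example identifier and the names `VersionedDoi` and `parse_versioned_doi` are my own assumptions for illustration; the wiki page cited above remains the authority on the actual Dryad syntax.

```python
import re
from typing import NamedTuple, Optional

# Assumed (hypothetical) layout:
#   doi:<prefix>/<package>[.<package_version>][/<file>[.<file_version>]]
DOI_PATTERN = re.compile(
    r"^(?:doi:)?(?P<prefix>10\.\d+)/(?P<package>[A-Za-z]+\.[A-Za-z0-9]+)"
    r"(?:\.(?P<package_version>\d+))?"
    r"(?:/(?P<file>\d+)(?:\.(?P<file_version>\d+))?)?$"
)


class VersionedDoi(NamedTuple):
    prefix: str
    package: str
    package_version: Optional[int]
    file: Optional[int]
    file_version: Optional[int]


def _to_int(text: Optional[str]) -> Optional[int]:
    """Convert an optional numeric string to an int, keeping None as None."""
    return int(text) if text is not None else None


def parse_versioned_doi(doi: str) -> VersionedDoi:
    """Split a DOI into package, file and version parts under the assumed scheme."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"Not a DOI in the assumed format: {doi!r}")
    return VersionedDoi(
        prefix=match["prefix"],
        package=match["package"],
        package_version=_to_int(match["package_version"]),
        file=_to_int(match["file"]),
        file_version=_to_int(match["file_version"]),
    )


if __name__ == "__main__":
    # Illustrative identifier only; it is not a real dataset.
    print(parse_versioned_doi("doi:10.5061/dryad.abc12.2/3.1"))
```

A DOI without any version suffix parses with the version fields set to `None`, which a caller might reasonably treat as "the latest version".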
Second, relating to the problems of applying traditional models of peer review to datasets, it is important to realize that there is a fundamental difference between a journal article and a dataset. A journal article is an exercise in rhetorical persuasion, with the authors presenting selected data to convince the reader of the correctness of particular hypotheses (see, for example, the papers on this topic by Anita de Waard et al. [1-3]). Datasets lack this rhetorical structure, and thus cannot be judged by the same standards of logical persuasiveness, fit between data and hypothesis, etc. Rather, datasets contain ‘mere facts’, albeit organized according to some underlying data model, and thus their review and quality control have to be “more like an audit that assures that a data set adheres to best practices of documentation, format, error characterization, etc.”. For this reason, citation links between datasets and the journal articles upon which they are based, as routinely given in the Dryad Data Repository, are of particular value, since the articles provide the peer-reviewed contextual framework within which to understand the data.
Third, as mentioned by John Milner in the discussion of the essay at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1112&L=RESEARCH-DATAMAN, there is increasing recognition that the metadata describing a dataset should themselves be treated as independent first-class scientific objects. Such a metadata file could be published with its own DOI, either as a supplementary file within a data package, or as a ‘metadata paper’ within a ‘metadata journal’.
Finally, one way of linking the ‘data publication’ metaphor with the ‘linked data’ metaphor is to ensure that such metadata describing datasets are made available not only in human-readable form but also in machine-readable form, by publishing them as RDF, either as a separate named graph with a unique URI or embedded as RDFa within the human-readable metadata document.
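To make the named-graph option a little more concrete, here is a minimal Python sketch that serializes a few metadata statements about a dataset as N-Quads, where the fourth term of each line is the URI of the named graph. It uses three Dublin Core terms (`title`, `identifier`, `isReferencedBy`); the function name `metadata_as_nquads` and all `example.org` URIs are hypothetical, and a real system would more likely use an RDF library than hand-built strings.

```python
DCTERMS = "http://purl.org/dc/terms/"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def metadata_as_nquads(
    dataset_uri: str,
    graph_uri: str,
    title: str,
    identifier: str,
    article_uri: str,
) -> str:
    """Return N-Quads describing a dataset, all placed in one named graph."""
    subject = f"<{dataset_uri}>"
    graph = f"<{graph_uri}>"
    statements = [
        (f"<{DCTERMS}title>", literal(title)),
        (f"<{DCTERMS}identifier>", literal(identifier)),
        # Link from the dataset to the article that cites it.
        (f"<{DCTERMS}isReferencedBy>", f"<{article_uri}>"),
    ]
    return "\n".join(
        f"{subject} {predicate} {obj} {graph} ." for predicate, obj in statements
    )


if __name__ == "__main__":
    # All URIs and values below are placeholders, not real records.
    print(
        metadata_as_nquads(
            dataset_uri="https://example.org/dataset/1",
            graph_uri="https://example.org/graph/dataset-1-metadata",
            title="Example dataset",
            identifier="doi:10.1234/example.1",
            article_uri="https://example.org/article/1",
        )
    )
```

Because the graph has its own URI, the metadata record can itself be cited, versioned and described independently of the dataset it is about.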
Hope these comments help.
[1] de Waard, A., Breure, L., Kircz, J.G., Oostendorp, H. van (2006). Modeling Rhetoric in Scientific Publications. Current Res. in Inf. Sci. and Techn. pp. 352-356.
[2] de Waard, A. (2007). A Pragmatic Structure for the Research Article, in: Proceedings ICPW’07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: NL. (Eds.) Buckingham Shum, S., Lind, M. and Weigand, H. Published in: ACM Digital Library & Open University ePrint 9275.
[3] de Waard, A. and Kircz, J.G. (2008). Modeling scientific discourse – shifting perspectives and persistent issues, ELPUB2008. Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 – Proc. of the 12th Int. Conference on Electronic Publishing, June 2008, Eds. L. Chan and S. Mornati, pp. 234-245.