Wednesday, January 03, 2007

How public should scientific data be?

Phillip Cassey and Tim M. Blackburn, Reproducibility and Repeatability in Ecology, 56 BioScience 958 [subscription required] argue against the trend toward requiring deposit of raw data in a publicly accessible database as a condition of publication in scientific journals. Their position is that papers should include sufficient explanations to allow others to evaluate and reproduce the work. They object to demands for raw data because they regard those data as the "intellectual property" of the researchers who generate them. Although they acknowledge the potential usefulness of data to other researchers, they believe it "is not a valid scientific reason to demand the publication of raw data," at least not if publication demands are unevenly applied. Their primary concern is that "being required to give away . . . hard-won data for no return . . . has the potential to significantly hinder scientists’ careers."

Cassey and Blackburn are, of course, not the first to raise this concern. Together with worries about the costs of responding to individual requests, fear of losing intellectual entitlements underlies the scientific community’s resistance to the Shelby amendment, a 1998 appropriations rider that made data generated with federal grant funding subject to FOIA demands.

Unfortunately, data hoarding has real costs, not only for the scientific enterprise but for information-intensive policy choices. Putting aside its value for detecting fraud, publication of raw data facilitates meta-analyses, syntheses of datasets, and the application of new analytical methods. At the outer frontiers of scientific understanding, where so many environmental and natural resource management decisions must be made, this sort of additional data crunching has value to society, not just to competing scientists. It is not a sufficient answer to say that nature remains available to all, so that other researchers are free to gather their own data. Ecological data may not in fact be uniformly available, especially on private property. More importantly, gathering ecological data is time- and resource-intensive. Unnecessary repetition drains already-inadequate scientific funding to little purpose. Nor is it sufficient to say, as Cassey and Blackburn implicitly do, that the original researchers can be relied upon to seek collaborators who will help them extend their findings. The transaction costs of collaboration can be very high, particularly where it extends across established disciplinary lines. Furthermore, it seems likely that researchers who are already inclined to hoard their data will fear that prospective collaborators are out to steal their thunder. Disputes about primacy of authorship or other divisions of credit may substantially increase barriers to collaboration.

One solution is to use sticks, such as the Shelby amendment or disclosure requirements as conditions of publication. In this context, however, sticks are not likely to work as well as carrots. Data made public will be only as useful as they are well maintained, documented, and organized. The cooperation of the depositor is therefore critical to the value of the deposit.

Disclosure incentives need to counteract the incentives for hoarding. Those incentives are clear: researchers want to milk every possible publication from their data because that is what they believe will advance their professional standing. I do not mean that to sound flip; the pressure to publish is real, as all academics are aware. The incentives of the academic research process need to be adjusted to better mesh with the information needs of modern society.

Perhaps what is needed is simply a different measure of the importance of research. Traditional measures, numbers of papers in prestigious journals and numbers of citations, don’t adequately account for the value of datasets. Concerns about responsibility for authorship preclude putting the names of those who assembled a dataset on every paper to which the data contribute. But it should be possible to require acknowledgment of data use, and to track those acknowledgments electronically, much as citations are currently monitored. Data use ought to be a strong indicator of the impact of research.

That might not be enough to counter the tendency of untenured scientists to hold their data close to the vest, because the publication of spin-off papers from deposited datasets is likely to show an even longer time lag than citations. Therefore, it may also be necessary to develop a norm of peer evaluation of the value of disclosed datasets as part of the tenure process, and perhaps also as a feature of grant application review.

I am not a practicing academic scientist, so I recognize that these suggestions may be naive. The details are less important at this point than triggering a robust discussion about the costs and benefits of data disclosure for individual researchers, the scientific community, and society. Without such a frank discussion, we are unlikely to stumble on better ways to align the incentives of researchers with the needs of the larger communities they serve.

