Research data lost to the sands of time
A study published online yesterday in Current Biology found that the availability of research data declines with each year that passes after a study is published. The authors, from the University of British Columbia, looked at 516 articles published between 1991 and 2011. They first tried to locate the e-mail addresses of the study authors and contact them; for each address that led to successful contact with an author, they then asked for the study data. In their requests, they said the data was needed for a reproducibility study. In the discussion section, the authors note that they might have had a higher success rate had they instead said the data was for an important medical or conservation project and offered co-authorship on the resulting paper.
The researchers found that for every year that had passed since a paper’s publication date, the odds of finding an e-mail address that led to contact with a study author decreased by 7%, and the odds of turning up the data decreased by 17%. The authors report that while some of the data sets were truly lost, others fell more into the category of “unavailable”: they still existed, but only on inaccessible media (think Jaz disks). These findings will not come as a shock to anyone who has worked in a research lab, but this publication does put tangible numbers behind the message of NYU Health Sciences Library’s excellent dramatic portrayal of an instance of inaccessible data. The authors conclude by suggesting that, going forward, a solution to this problem lies in more journals requiring that data be deposited in a public archive upon publication. I would also suggest that academic institutions can play a role by establishing policies that support research data preservation and by providing a data repository.
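Because the per-year figures describe changes in odds rather than in probabilities, they compound: each additional year multiplies the odds again. The short Python sketch below illustrates this, assuming the decline is a constant multiplicative factor on the odds each year (as it would be in a logistic regression with article age as a predictor). The function names and the 90% starting probability are hypothetical, chosen only for illustration; they are not taken from the study.

```python
def cumulative_odds_multiplier(annual_decline: float, years: int) -> float:
    """Factor applied to the odds after `years`, assuming a constant
    proportional decline per year (0.17 means a 17% drop per year)."""
    return (1.0 - annual_decline) ** years


def probability_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


# Per-year declines in odds as reported in the study.
DATA_DECLINE = 0.17    # odds of the data still being obtainable
EMAIL_DECLINE = 0.07   # odds of reaching an author by e-mail

# Hypothetical starting point: a 90% chance the data is obtainable at
# publication. This baseline is made up purely for illustration.
BASELINE_P = 0.9

for years in (5, 10, 20):
    data_factor = cumulative_odds_multiplier(DATA_DECLINE, years)
    email_factor = cumulative_odds_multiplier(EMAIL_DECLINE, years)
    # Scale the baseline odds, then convert back to a probability.
    p_data = odds_to_probability(probability_to_odds(BASELINE_P) * data_factor)
    print(
        f"{years:2d} years: data odds x{data_factor:.3f}, "
        f"e-mail odds x{email_factor:.3f}, "
        f"data probability {p_data:.2f} (from hypothetical {BASELINE_P:.0%})"
    )
```

Under this assumption, a 17% annual drop leaves the odds of obtaining the data at roughly 16% of their starting value after ten years. Note that a 17% drop in odds is not the same as a 17-percentage-point drop in probability, which is why the sketch converts back to probabilities explicitly.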
It is worth noting that the authors of this paper published their study data on Dryad.