Provocative Ideas from Science Commons Symposium Pacific Northwest
What’s the best way to spend a warm, sunny, February Saturday in western Washington? If you answered “by sitting indoors and watching presentations about data,” you may be… correct! This past Saturday, about sixty scientists and librarians gathered on the Microsoft campus for the Science Commons Symposium – Pacific Northwest. We had the privilege of hearing from some of the world’s most prominent thought leaders in the areas of open data, open access, and what web technology means for the future of scientific communication. Here are a few eyebrow-raising ideas from the symposium.
Practicing science is a privilege, not a right.
Cameron Neylon, a biophysicist at ISIS in the United Kingdom, opened the symposium by describing a day in the life of a research scientist: himself. He emphasized that practicing science is not a right; it’s a privilege. Much scientific research is publicly funded; therefore, scientists are accountable to the public. They should make the data they collect and the results of their experiments as widely available as possible. Neylon went on to say that “you don’t need a sledgehammer to take down a snowman” and that, sometimes, formal publication is overkill. There are simpler, faster, and less expensive ways to share information. However, these simpler, faster systems must be technically and legally interoperable in order to really improve communication.
We need to move from an environment of trust to one of proof.
Jean-Claude Bradley is an Associate Professor of Chemistry at Drexel University, leader of the UsefulChem project, and coiner of the term Open Notebook Science (ONS). He demonstrated the challenge involved in using the Web to find a solubility coefficient for a common substance. It seems like this should be easy, but it’s not. Sources that are supposedly authoritative sometimes contain conflicting information. The assumptions behind the numbers are not always made explicit. As a result, experiments are done and re-done unnecessarily. UsefulChem, created in 2005 on a free, hosted Wikispaces wiki, makes the usually hidden side of lab work accessible to anyone who does a Web search. ONS participants record detailed lab notes in the wiki and post raw experimental data as it is obtained. The wiki records a history of exactly which changes were made, when, and by whom.
Video (starts at 1:34) – http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/18174/player.htm
Crowdsourcing can work for science.
Antony Williams is the Vice President of Strategic Development at the Royal Society of Chemistry (UK). He has an interesting hobby, which is to lead the development of ChemSpider, a free online community of chemists. The goal of ChemSpider is “to aggregate into a single database all chemical structures available within open access and commercial databases and to provide the necessary pointers from the ChemSpider search engine to the information of interest.” Williams showed us yet another example of an error in an authoritative online chemistry reference and how that error was picked up and proliferated across the Web. ChemSpider, like Wikipedia, is an experiment in crowdsourcing. It provides a platform for knowledgeable people to quickly clean up, cross-reference, add to, comment on, and share existing data. This cleanup relies on volunteers, and also on gamers! The Spectral Game provides a fun (and sneaky) way for students and others to help with the development of ChemSpider.
As librarians, we teach others that authority (the author’s credentials, the reputation of the sponsoring organization, etc.) is one of the most important factors in determining the reliability of a source. What if crowdsourced information really is more correct, complete, and/or useful? How would we know, and how would that knowledge change what we do?
Librarians are not doing enough to make sure data is out there.
There were many eyebrow-raising moments in a talk by Peter Murray-Rust, but this remark really got the librarians’ attention: “Librarians are not doing enough to make sure data is out there.” Instead of really promoting the widest possible access to information, librarians spend lots of time explaining the rules about who can access information, when they can access it, and with which permissions. Murray-Rust used theses and dissertations as a specific example. Scientific journals have a bias for publishing positive results, but sometimes it is equally or more important to know about the experiments that didn’t work. This level of detail is published in theses and dissertations. In print, a thesis or dissertation can usually only be borrowed from the library at the institution where it was written. This is quite limiting. What could librarians do to move more thesis and dissertation content to the open Web?
Peter Murray-Rust is on the faculty at Churchill College at the University of Cambridge and is a leader in the Open Data Movement. He recommends using Open Knowledge buttons as a way to let people and machines know your web content is free to use, re-use, and distribute.
Video (starts at 1:15) – http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/18175/player.htm
Open Access is becoming the New Normal.
The Scholarly Publishing & Academic Resources Coalition (SPARC) is “an international alliance of academic and research libraries working to correct imbalances in the scholarly publishing system.” SPARC’s specific mission is to promote open access to scholarly journal articles. Heather Joseph, Executive Director of SPARC, spoke to us about the remarkable progress of the Open Access movement in recent years. There are now more than 4,000 journals listed in the Directory of Open Access Journals, up from just a handful when Joseph began speaking on behalf of SPARC in 2005. Faculty at major universities such as Harvard, Stanford, and MIT have agreed to open access policies at the campus level. There is even an open access “swat team” of volunteers who have been through campus-level policy development and are available to coach other colleges and universities through similar processes. Librarians can get involved in the Open Access movement, and many already are, by educating faculty about changes in scholarly communication and working with authors to be sure they know their intellectual property rights. SPARC’s author addendum has proven to be a useful tool for helping authors avoid signing overly restrictive publication agreements.
SPARC and the Alliance for Taxpayer Access were instrumental in advocating for the NIH Public Access Policy, which mandates that final, peer-reviewed manuscripts of articles resulting from NIH-funded research be deposited into PubMed Central and made available on the Web within twelve months of publication. Joseph is quick to emphasize that it is a public access policy, not an open access policy, due to the 12-month embargo period involved, but she sees it as a major step forward for the Open Access movement and as proof-of-concept that making content publicly available will not do irreparable harm to publishers’ financial models. The Obama Administration has expressed a commitment to government transparency. Joseph is optimistic about a newly introduced bipartisan bill — the Federal Research Public Access Act — which proposes an extension of NIH-like policies to other federal agencies and reduces the embargo period to six months.
Open data allows biologists and clinicians to collaborate on powerful new models to predict the course of disease in individuals.
Stephen Friend is the founder of Sage Bionetworks, a non-profit organization affiliated with Fred Hutchinson Cancer Research Center in Seattle. Sage is developing open data platforms that will facilitate the discovery of better models to predict the course of disease. These models will help clinicians to predict with greater accuracy which individual patients will respond best to which treatments. Open sharing of patient data carries with it more ethical concerns than open sharing of, say, chemical data. Sage is interested in exploring new ways of anonymizing patient data in order to make it more shareable. Instead of working through institutions to gain access to patient data, Sage has sometimes gone directly to patients for permission.
Friend predicted that, as more and more of scientists’ work happens in and through open data networks, citation will happen at the model or theory level, not necessarily at the journal article level.
Story from KPLU radio, aired 2/24/10: http://bit.ly/bjs0yZ
Video (starts at 1:14) – http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/18176/player.htm
Article-level metrics may be the wave of the future.
Peter Binfield from Public Library of Science (PLoS) compared PLoS’s peer-review and publishing practices to more traditional publishing models. He made a compelling argument for how PLoS’s way can be better, faster, more cost-effective, and more beneficial to scientific progress. PLoS One, an open access journal, does some very interesting work with article-level use metrics. For every article they publish, PLoS tracks and makes available information about web usage, when and where the article is cited or bookmarked, expert and community ratings, media and blog coverage, commenting activity, and more. These article-level metrics are proving to be very popular with authors. Will those authors start to demand similar metrics from other journal publishers?
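For readers wondering what an article-level metrics record might look like as data, here is a minimal Python sketch. It is not PLoS’s actual schema or API. The class name, field names, and methods are all hypothetical, chosen only to mirror the categories listed above (web usage, citations, bookmarks, ratings, media and blog coverage, and comments).

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class ArticleMetrics:
    """Hypothetical per-article record; every field name is an assumption."""
    doi: str                                                  # article identifier
    page_views: int = 0                                       # web usage: HTML views
    pdf_downloads: int = 0                                    # web usage: PDF downloads
    citations: List[str] = field(default_factory=list)        # identifiers of citing works
    bookmarks: int = 0                                        # social bookmarking count
    ratings: List[int] = field(default_factory=list)          # reader ratings, e.g. 1 to 5
    media_mentions: List[str] = field(default_factory=list)   # news and blog coverage
    comments: int = 0                                         # commenting activity

    def total_usage(self) -> int:
        """Combine the two web-usage counters into one number."""
        return self.page_views + self.pdf_downloads

    def average_rating(self) -> Optional[float]:
        """Mean rating, or None when nobody has rated the article yet."""
        return mean(self.ratings) if self.ratings else None

    def summary(self) -> Dict[str, object]:
        """Flatten the record into the kind of table an author might see."""
        return {
            "doi": self.doi,
            "usage": self.total_usage(),
            "citations": len(self.citations),
            "bookmarks": self.bookmarks,
            "average_rating": self.average_rating(),
            "media_mentions": len(self.media_mentions),
            "comments": self.comments,
        }
```

The point of the sketch is that each metric attaches to a single article rather than to the journal as a whole, so two articles in the same journal can have very different summaries.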
Science is headed back to the garage.
John Wilbanks is the Vice President of Science Commons, a division of Creative Commons that is working to “help people and organizations from every part of the scientific ecosystem lift legal and technical barriers to research and discovery.” Wilbanks quoted one of his mentors, Jonathan Zittrain, who said “Generativity is a system’s capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences.” Science should be practiced in a way that enhances generativity. Placing too many restrictions on access to research data reduces the generativity of science. A truly generative system must have leverage, adaptability, accessibility, ease of mastery, and transferability. It must also be within the law. Science Commons is grappling with a complex set of intellectual property issues to pave the way for datasets to be shared and used.
Science, which started as a hobby, is headed “back to the garage” as those who would not traditionally be considered experts take an interest and get involved. Heather Joseph noted that the PubMed Central archive gets more than 400,000 unique visitors every day. This suggests an audience for research articles that goes far beyond the researchers themselves. Then, too, researchers are looking outside of their own disciplines to connect with insights from other fields of study. Everyone has a role in increasing the generativity of science. What will be the role of the information professional in supporting that generativity?
Video (starts at 1:35) – http://content.digitalwell.washington.edu/msr/external_release_talks_12_05_2005/18177/player.htm
The Fourth Paradigm: Data-Intensive Scientific Discovery, a book published by Microsoft Research under a Creative Commons Attribution-ShareAlike license. Fully downloadable for free.
Session notes by Brian Glanz of the Open Science Foundation
Twitter stream – #scspn
Many thanks to conference organizers Lee Dirks of Microsoft, Lisa Green of Science Commons, and especially to Hope Leman of Samaritan Health Services for promoting this symposium within the library community.