Love Your Data Week, Day 5: Rescuing Unloved Data
How do data become unloved? We data users don’t love data that are messy, poorly documented, incomplete, or unwieldy, to name just a few frustrations. However, one important way that data become unloved is that they are just plain old. Older data tend not to be machine-readable, which can pretty much be the kiss of death. Digitization, while it’s improving, is still somewhat labor-intensive and costly, and so unless a data set is obviously worth the trouble, it may languish.
However, researchers are starting to explore whether there may be some hidden gems worth rescuing. One area in which this is happening is climate data, and a great example is the Glacier Photograph Collection from the National Snow and Ice Data Center (NSIDC). Before this collection was digitized, users had to travel to the NSIDC in Colorado, ask staff to find physical images or microfilm for them in the collection, and then deal with those physical artefacts. Not surprisingly, the collection had few users. However, digitizing these photographs (which can be considered data sources, as they contain information that can be analyzed) has made them not only accessible, but an important resource for documenting changes in glacier size and coverage. The digitized old photographs also suggest locations for taking repeat photographs from the same vantage point, and comparing the pairs can reveal changes over time.
PHOTO: Left: William O. Field, 1941; Right: Bruce F. Molnia, 2004. Muir Glacier: From the Glacier Photograph Collection. Boulder, Colorado USA: National Snow and Ice Data Center. Digital media.
Using the above example is cheating a little bit, though: these photographs were unloved because they were undigitized, but it was clear that they were worth digitizing. In fact, it was so clear that NSIDC was able to get funding and enter into partnerships to get that work done. So, what if a researcher has a great idea, but needs sheer person-power to bring it to fruition? These days, crowd-sourcing may do the trick! Check out the Swiss project Data Rescue @ Home, in which citizen-volunteers are entering German climate data collected during WWII, and have already finished entering data collected at a weather station in the Solomon Islands in the early to mid-1900s. By January 2014, they reported having digitized 1.3 million values! They note: “The old data are expected to be very useful for different international research and reanalysis projects…[for example,] historical weather data from the Azores Islands are particularly valuable since the islands are located at the southern node of the most important climatic variability mode in the North Atlantic-European region, the so-called North Atlantic Oscillation (NAO), and there are not much other historical data available from the larger region.”
PHOTO: Example of data collected in the Solomon Islands, entered electronically by citizen-volunteers of the Data Rescue @ Home project (Accessed 2-13-17).
Interested in getting involved in a citizen-science project yourself? Here’s a list of possibilities! And, if you really get hooked, you may want to dive into some collections of older non-digitized data and consider starting your own project, to rescue the unloved data and give them new life.
OK, I’m off now to figure out how to get on the project where I can hang out on the beach in New Jersey and count horseshoe crabs!