1000 Genomes Project Data Available on Amazon Cloud
The world’s largest set of data on human genetic variation, produced by the international 1000 Genomes Project, is now publicly available on the Amazon Web Services (AWS) cloud. The data can be accessed through AWS at http://s3.amazonaws.com/1000genomes/.
Since the project’s launch, the data set has grown enormously. At 200 terabytes (the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs), the current 1000 Genomes Project records are a prime example of big data that have become so massive that few researchers have the computing power to use them. To help solve that problem, AWS has just posted the 1000 Genomes Project data for free as a public data set, providing a centralized repository on the Amazon Simple Storage Service (Amazon S3).
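For readers who want to explore the repository programmatically, the following is a minimal sketch of how a public Amazon Simple Storage Service bucket can be listed over plain HTTP using only the Python standard library. It assumes the bucket at the address given above still permits anonymous listing and returns the standard S3 XML listing format; the function names (build_list_url, parse_listing, list_objects) are illustrative and not part of any official tool.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Bucket address taken from the announcement. Whether it still allows
# anonymous listing is an assumption, not something guaranteed here.
BUCKET_URL = "http://s3.amazonaws.com/1000genomes/"

# XML namespace used by S3 bucket-listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket_url=BUCKET_URL, prefix="", max_keys=10):
    """Build a listing URL using the S3 'prefix' and 'max-keys' query parameters."""
    query = urllib.parse.urlencode({"prefix": prefix, "max-keys": max_keys})
    return f"{bucket_url}?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from an S3 ListBucketResult document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.findall("s3:Contents", S3_NS):
        key = item.findtext("s3:Key", namespaces=S3_NS)
        size = int(item.findtext("s3:Size", namespaces=S3_NS))
        entries.append((key, size))
    return entries


def list_objects(prefix="", max_keys=10, timeout=30):
    """Fetch one page of the bucket listing over HTTP (requires network access)."""
    url = build_list_url(prefix=prefix, max_keys=max_keys)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```

Calling list_objects() with no arguments would request the first ten entries in the bucket; passing a prefix narrows the listing to keys that begin with that string. Because the full data set is about 200 terabytes, listing and downloading selectively is likely to be more practical than copying everything.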
The public-private collaboration demonstrates the kind of solutions that may emerge from the Big Data Research and Development Initiative announced today by the White House Office of Science and Technology Policy (OSTP) during an event at the American Association for the Advancement of Science in Washington, D.C.
“The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation’s health and economy,” said NIH Director Francis S. Collins, M.D., Ph.D. Dr. Collins is among agency leaders speaking in support of the initiative at the launch event.
See the press release for more information.