NLM NCBI Issues RFI to Seek Input on New Sequence Read Archive (SRA) Data Formats
The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, has issued a Request for Information (RFI) to solicit community feedback on new proposed Sequence Read Archive (SRA) data formats.
The SRA is one of the National Institutes of Health’s (NIH) largest and most diverse datasets. It includes a broad collection of experimental DNA and RNA sequences that represent genome diversity across the tree of life, contains more than 36 petabytes (PB) of data, and holds more than nine million records. Submission rates to the SRA continue to grow exponentially, and its holdings are projected to reach 43 PB by 2023.
NIH, NLM, NCBI, and the SRA Data Working Group of the NIH Council of Councils have evaluated SRA growth, costs, and data usage patterns, and have issued a set of interim recommendations for storing and retrieving SRA data in the cloud.
To understand how best to manage this resource so that it remains useful for research while controlling costs as it grows, NIH is requesting input on the use of SRA data. Specifically, NIH would like to better understand how the research community currently uses SRA data, how researchers use or anticipate using cloud computing with SRA data, and which SRA data formats are most valuable to the research community.