Archive for the ‘E-Science’ Category

PubChem Turns Ten!

September 16, 2004 is a special day in the history of PubChem. It marks the beginning of PubChem as an on-line resource! Now fast forward ten years. PubChem provides information daily to many tens of thousands of users. Despite the passage of time, PubChem’s primary mission remains the same: providing comprehensive information on the biological activities of chemical substances.

Providing chemical information to researchers in the biomedical science community is a key part of PubChem’s purpose. Over the years, PubChem introduced and incrementally developed several interfaces, each with its own distinct purpose and set of use cases. Primary among these is the Entrez search interface, where PubChem is organized as three distinct databases: Substance, Compound, and BioAssay. Substance provides substance descriptions (accession number: SID), Compound provides the unique small-molecule chemical content of Substance (accession number: CID), and BioAssay provides biological experiment results for substances (accession number: AID). Each of these databases has an advanced search interface and contains numerous indexes and filters, which can be combined to construct elaborate queries. Additional interfaces exist to search and analyze information in PubChem, including the ability to analyze bioactivity information, download chemical and assay data, search by chemical structure or protein sequence, navigate using integrated classifications, visualize chemical 3-D information, and more.

PubChem continues to evolve the way it provides on-line content. External search engines (like Google, Bing, and others) are now a key way in which researchers locate data. In addition, programmatic interfaces now account for a significant portion of PubChem’s overall usage (+50%). Key programmatic interfaces to PubChem include Entrez Utilities and PUG/REST.
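
For readers curious what a programmatic request looks like, here is a minimal Python sketch that builds request URLs for the two interfaces named above. It is an illustration under stated assumptions, not an official client: the helper names `pug_rest_property_url` and `esearch_url` are hypothetical, and the base URLs and path layout reflect the publicly documented PUG/REST and Entrez Utilities conventions, which may change over time. The sketch only constructs URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed base URLs, taken from NCBI's public documentation.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def pug_rest_property_url(cid, properties, output="JSON"):
    """Build a PUG/REST URL asking for computed properties of one Compound (CID).

    Hypothetical helper: the path follows the input/operation/output
    pattern described in the PUG/REST documentation.
    """
    props = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/cid/{int(cid)}/property/{props}/{output}"


def esearch_url(db, term, retmax=20):
    """Build an Entrez Utilities ESearch URL for one Entrez database.

    Hypothetical helper: 'pccompound', 'pcsubstance', and 'pcassay' are the
    Entrez database names for Compound, Substance, and BioAssay.
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax, "retmode": "json"})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


if __name__ == "__main__":
    # CID 2244 is aspirin.
    print(pug_rest_property_url(2244, ["MolecularFormula", "MolecularWeight"]))
    print(esearch_url("pccompound", "aspirin"))
```

Either URL can be pasted into a browser or passed to any HTTP client. Anyone scripting many requests should consult NCBI's usage guidelines first.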

NCBI RefSeq Release 67 Available on FTP

The full RefSeq release 67 is now available on the NCBI FTP site with over 61 million records describing 45,166,402 proteins; 8,163,775 RNAs; and sequences from 41,913 different NCBI TaxIDs. More details about the RefSeq release 67 are included in the release statistics and release notes. In addition, reports indicating the accessions included in the release and the files installed are available.

To subscribe to the ncbi-announce mailing list, visit http://www.ncbi.nlm.nih.gov/mailman/listinfo/ncbi-announce.

NIH Seeks Input on Information Resources for Data-Related Standards Widely Used in Biomedical Science

The National Institutes of Health (NIH) has issued a Request for Information (RFI) seeking comments and ideas to inform the creation of an NIH Standards Information Resource (NSIR) that would collect, organize, and make available to the public trusted, systematically organized, and curated information about data-related standards. This resource would focus on those standards that are widely used in biomedical research and related activities. The main purpose of the NSIR would be to help a variety of biomedical users such as researchers, clinicians, data curators, and informaticians, among others, identify and choose data-related standards that are best suited to their needs.

NIH seeks responses to the RFI from biomedical researchers, librarians and information scientists, bioinformaticians, publishers, and other interested individuals. All responses must be submitted electronically to BD2K_NSIR_RFI@mail.nih.gov by September 30, 2014. Please include the Notice number NOT-CA-14-053 in the subject line. Responses to this RFI Notice are voluntary. The submitted information will be reviewed by the NIH staff. Submitted information will be considered confidential.

NIH Issues Finalized Policy on Genomic Data Sharing

Genomic research advances our understanding of factors that influence health and disease, and sharing genomic data provides opportunities to accelerate that research through the power of combining large and information-rich datasets. To promote sharing of human and non-human genomic data and to provide appropriate protections for research involving human data, the National Institutes of Health (NIH) issued the Genomic Data Sharing (GDS) Policy on August 27, 2014. The GDS Policy takes effect for grant applications with due dates on or after January 25, 2015, for contracts submitted on or after January 25, 2015, and for intramural research projects generating genomic data on or after January 25, 2015. NIH has also issued a press release regarding the GDS Policy. An article describing the use and impact of the NIH database for Genotypes and Phenotypes (dbGaP) data under the Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies, from 2007 through 2013, has been published in Nature Genetics.

Designing Library Data Dashboards with Tableau Software

At this month’s Library Assessment Conference held in Seattle, one panel, Tableau Unleashed: Visualizing Library Data, featured assessment librarians presenting data dashboards they created using Tableau software. The presentation includes views of dashboards from University of British Columbia Library (by presenter Jeremy Buhler), UMass Amherst Libraries (by Rachel Lewellen), and Ohio State Libraries (by Sarah Murphy).

Tableau may be the most popular software for creating dashboards right now and the company offers a free version that has a great deal of functionality. In fact, at least one presenter (Sarah Murphy) included dashboards she created using Tableau Public. However, users must be cautioned that any data entered into Tableau Public become public information. That means anyone can see and download your raw data. So, if you use it, be sure all identifying information about individuals is stripped from your files and that you are comfortable with other people downloading your raw data. The presenters also mentioned tips for dashboard design. For additional design guidance, check out the freely downloadable resource A Guide to Creating Dashboards People Love to Use by Juice Analytics.

Upcoming NCBI Discovery Workshops at UC Davis and UC Berkeley!

NCBI Discovery Workshops, consisting of four 2.5-hour hands-on training sessions emphasizing NCBI resources such as BLAST and Nucleotide, will be presented by NCBI staff at the University of California, Davis, on September 15-16, and at the University of California, Berkeley, on September 17-18:

  • Session 1: Navigating NCBI Molecular Data Through the Integrated Entrez System. 9 am – 11:30 am, 9/15 (UC Davis) & 9/17 (UC Berkeley)
  • Session 2: NCBI Genomes, Assemblies and Annotation Products: Microbes to Human. 1 pm – 3:30 pm, 9/15 (UC Davis) & 9/17 (UC Berkeley)
  • Session 3: Advanced NCBI BLAST. 9 am – 11:30 am, 9/16 (UC Davis) & 9/18 (UC Berkeley)
  • Session 4: Gene Expression Resources at NCBI. 1 pm – 3:30 pm, 9/16 (UC Davis) & 9/18 (UC Berkeley)

For more information or to register:

For Questions:

NLM’s National Center for Biotechnology Information Receives HHSinnovates Award!

A collaborative project between the National Library of Medicine’s National Center for Biotechnology Information (NCBI) and several other federal and state partners, to reduce the time and improve the accuracy of detecting foodborne pathogens by using whole genome sequencing (WGS) techniques, received the HHSinnovates award on July 21, 2014. The HHSinnovates program was initiated in 2010 to recognize new ideas and solutions developed by HHS employees and their collaborators. Six finalist teams were recognized at the awards ceremony. The WGS Food Safety Project, which also involved the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), the US Department of Agriculture (USDA), and state public health laboratories, was one of three projects to be honored as “Secretary’s Picks” by HHS Secretary Sylvia Mathews Burwell. The award went to the specific individuals leading the project in the various agencies; in the case of NCBI, Senior Scientist William Klimke, PhD, was honored for his work in heading NCBI’s part of the project.

WGS provides greater specificity than other techniques, such as the commonly used pulsed-field gel electrophoresis (PFGE), in identifying the DNA fingerprint of bacteria. It also can more rapidly determine whether isolates are related to a foodborne disease outbreak. The demonstration project involves real-time sequencing of Listeria monocytogenes isolates from human DNA as well as the food supply chain. In the project, the whole genomes of isolates are sequenced and the sequencing data are sent to NCBI, which performs assembly, annotation and analysis, and then sends results back to CDC, FDA, USDA and the labs. Collaborative projects using WGS for other pathogens related to food safety are also underway.

NCATS Announces the Toxicology in the 21st Century (Tox21) Data Challenge 2014 Competition

The National Center for Advancing Translational Sciences (NCATS) has announced the Toxicology in the 21st Century (Tox21) Data Challenge 2014 competition. The goal of the challenge is to crowdsource data analysis by independent researchers in order to develop computational models that can better predict chemical toxicity. It is designed to improve current toxicity assessment methods, which are often slow and costly. The model submission deadline is November 14, 2014. NCATS will showcase the winning models in January 2015. Registration for the challenge and more information are available on the web site.

Tox21 scientists are currently testing a library of more than 10,000 chemical compounds in NCATS’s high-throughput robotic screening system. To date, the team has produced nearly 50 million data points from screening the chemical library against cell-based assays. Data generated from twelve of these assays form the basis of the 2014 challenge. For more information on the Tox21 Modeling Challenge and Tox21 Program, contact Anna Rossoshek.

NLM to Host National Digital Stewardship Residency Symposium on April 8, 2014

On April 8, 2014, the inaugural cohort of National Digital Stewardship Residents will present a symposium entitled Emerging Trends in Digital Stewardship at the National Library of Medicine (NLM), on the NIH campus in Bethesda, MD. The symposium will consist of panel presentations on topics including preserving social media and collaborative workspaces, open government and open data, and digital strategies for public and non-profit institutions. It will also feature a demonstration of BitCurator, an environment of digital forensics tools designed to help collecting institutions manage born-digital materials, developed by the School of Information and Library Science at the University of North Carolina, Chapel Hill (SILS), and the Maryland Institute for Technology in the Humanities (MITH). The symposium is free and open to the public. Pre-registration is encouraged.

The National Digital Stewardship Residency (NDSR) is an initiative of the Library of Congress (LC) and Institute of Museum and Library Services (IMLS), to “provide a robust, hands-on learning experience to complement graduate-level training and education.”  The inaugural cohort began their residency at Washington, DC area libraries, museums, and cultural institutions in September 2013. Ten residents are embedded in institutions around the area, each completing a project related to an aspect of digital preservation and stewardship. The NDSR program aims to “serve the American people by developing the next generation of stewards to collect, manage, preserve, and make accessible our digital assets.” NLM serves as a host institution for the National Digital Stewardship Residency, and since September has worked with its NDSR Resident Maureen Harlow to develop a thematic Web archive collection. This project builds on a pilot Web archive collection completed by NLM and featured in The Signal blog of the Library of Congress, in October 2012.

Newly Released Entrez Direct from NCBI!

NCBI has just released Entrez Direct, a new software suite that allows you to use the UNIX command line to directly access NCBI’s data servers, as well as parse and format data to create customized data files. The latest NCBI News story discusses Entrez Direct and gives several examples of how the programs may be used, as well as links to the suite on FTP and documentation. Entrez Direct is available as a simple FTP download and has extensive documentation on the NCBI web site.