NCBI has just released Entrez Direct, a new software suite that lets you access NCBI’s data servers directly from the UNIX command line and parse and format the results to create customized data files. The latest NCBI News story discusses Entrez Direct, gives several examples of how the programs may be used, and links to the suite on FTP and to its documentation. Entrez Direct is available as a simple FTP download and has extensive documentation on the NCBI web site.
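To give a rough sense of what querying NCBI programmatically involves, here is a minimal Python sketch that builds a search request URL for NCBI's E-utilities web service, the interface that command-line tools such as Entrez Direct rely on. It is an illustration under stated assumptions, not part of Entrez Direct itself: the function name `build_esearch_url` and the example query are hypothetical, and the sketch only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for the given database and query term.

    Hypothetical helper for illustration; it does not send the request.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and special characters in the query term.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example query (an assumption, chosen only for illustration).
    print(build_esearch_url("pubmed", "data curation"))
```

Opening the printed URL in a browser returns an XML list of matching record identifiers; Entrez Direct's own programs and options are described in the NCBI documentation mentioned above.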
The NCBI, in partnership with the National Library of Medicine Training Center (NTC), will offer the Librarian’s Guide to NCBI course on the NIH campus in April 2014. This will be the second presentation of the course; it was previously offered in the spring of 2013. After the course, lecture slides and hands-on practical exercises will be posted on the education area of the NCBI FTP site and video tutorials of the course lectures will be available on the NCBI YouTube channel. Materials from the 2013 course are currently available.
A Librarian’s Guide is an intense five-day exploration of modern molecular biology, genetic, and other biomedical data as represented at the NCBI. The course explains how and why these data are generated, their importance in modern biomedical research, and how to access them through the NCBI Web site. It is intended for medical librarians in the United States who currently are offering bioinformatics education and support services to their patrons or are planning to offer such services in the future. More information is available in the newest NCBI Insights blog post.
All applicants for A Librarian’s Guide must have successfully completed the asynchronous online Fundamentals of Bioinformatics and Searching class, a six-week introduction to molecular biology and bioinformatics taught by Diane Rein, Ph.D., MLS, and offered through the NTC. The Fundamentals course is open to any medical librarian in the United States interested in an introduction to bioinformatics and NCBI resources. A winter 2014 Fundamentals class, which runs from February 10 to March 21, 2014, is open for applications. The application process for eligible Fundamentals candidates will be announced in February 2014.
The NIH Big Data to Knowledge (BD2K) initiative has released a Funding Opportunity Announcement (FOA) to support a U24 resource award for Development of an NIH BD2K Data Discovery Index Coordination Consortium. The purpose of this FOA is to create a consortium to begin development of an NIH Data Discovery Index (DDI) to allow discovery, access, and citation of biomedical data. Letters of intent to apply are due by February 6, 2014, and completed applications are due by March 6, 2014. Budgets are limited to $2,000,000 in direct costs per year but must reflect the actual needs of the proposed project. The maximum project period is three years.
As part of the NIH Big Data to Knowledge (BD2K) initiative, the DDI seeks to fulfill the recommendation from the Data and Informatics Working Group (DIWG) Report to the Advisory Council to the Director to “Promote Data Sharing Through Central and Federated Catalogues.” The awardee in response to this FOA will constitute a DDI Coordination Consortium (DDICC, U24) to conduct outreach; fund small pilot projects; manage communication with stakeholders; constitute and coordinate Task Forces to study questions related to access, discoverability, and citation for all biomedical data; and assure community engagement in the development, testing, and validation of an NIH DDI. Part of this effort will be to assemble a user interface (website) through which the results of development and testing of models for an NIH DDI may be communicated.
November 2013 marks 25 years since the National Center for Biotechnology Information (NCBI) began providing access to biomedical and genomic information to advance science and health. Established in 1988 as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), NCBI has grown into a leading source for public biomedical databases, software tools for analyzing molecular and genomic data, and research in computational biology. NCBI’s resources rank among the most heavily used government web sites in the United States, with approximately 3 million users every day.
In recognition of NCBI’s achievements, an awards and recognition program was held November 1 on the NIH campus in Bethesda, Maryland. At that event Tony Hey, PhD, Vice President of Microsoft Research, presented NCBI Director David Lipman, MD, with the Jim Gray eScience Award. Named for Jim Gray, a technical fellow for Microsoft Research and an A.M. Turing Award winner who disappeared at sea in 2007, the annual award recognizes researchers who have made outstanding contributions to the field of data-intensive computing and made “science easier for scientists,” according to Microsoft.
Gray was very familiar with the work of NCBI. He was a member of the NLM Board of Regents in 2006 and met a number of times with Dr. Lipman, NCBI Information Engineering Branch Chief Jim Ostell, PhD, and other staff to discuss issues such as organization of and access to biomedical literature and data. His interest in NCBI’s work is evidenced by his final lecture, in January 2007, in which he highlighted the importance of NCBI/NLM biomedical literature databases like PubMed and PubMed Central, genomic databases such as GenBank, and NCBI’s Entrez system for searching across these and many other databases. An edited version of Gray’s lecture can be read in The Fourth Paradigm, available on Microsoft Research’s web site.
The NCBI awards program also featured a keynote address by Sir Richard Roberts, PhD, chief scientific officer of New England Biolabs, entitled “A personal recollection of GenBank and NCBI.” NLM Director Donald A.B. Lindberg, MD, recounted the planning process that led to the formation of NCBI, and NIH Deputy Director for Intramural Research Michael M. Gottesman, MD, provided introductory remarks for the awards ceremony. Dr. Lipman closed the event by recognizing the dedicated and hard-working staff of NCBI who have enabled the progress made over the last 25 years.
The New England Collaborative Data Management Curriculum (NECDMC) offers openly available materials that librarians can use to teach research data management (RDM) best practices to undergraduate and graduate students in the sciences, health sciences, and engineering. The materials include lecture notes and slide presentations that librarians teaching RDM can customize for their particular audiences. The curriculum also has a database of real-life research cases that can be integrated into lessons to address discipline-specific data management topics. The project has been led by the Lamar Soutter Library at the University of Massachusetts Medical School with funding from the National Network of Libraries of Medicine, New England Region.
The Lamar Soutter Library developed the Frameworks for a Data Management Curriculum with Worcester Polytechnic Institute in 2011. Over the past year the Soutter Library has partnered with librarians from Tufts University, University of Massachusetts Amherst, Northeastern University, and the Marine Biological Laboratory and Woods Hole Oceanographic Institution to fully develop the curriculum’s lecture content, readings, activities, and slide presentations.
Some libraries will be piloting the curriculum at their institutions and evaluating the learning modules with students. If you are teaching or plan to teach RDM, you are invited to pilot the NECDMC. For more information about being a pilot partner, please contact Donna Kafel.
Registration Now Available for the 9th International Digital Curation Conference February 24-27, 2014, in San Francisco
The 9th International Digital Curation Conference (IDCC) will be held from February 24-27, 2014, at the Omni San Francisco Hotel, and registration is now open. This year the IDCC will focus on how data-driven developments are changing the world around us, recognizing that the growing volume and complexity of data provide institutions, researchers, businesses, and communities with a range of exciting opportunities and challenges. The Conference will explore the expanding portfolio of tools and data services, as well as the diverse skills that are essential to explore, manage, use, and benefit from valuable data assets. The program will reflect cultural, technical, and economic perspectives, and will illustrate the progress made in this arena in recent months.
IDCC14 will be organized by the Digital Curation Centre UK, in partnership with the University of California Curation Center (UC3) at the California Digital Library and the Coalition for Networked Information (CNI). The draft program is now available.
The National Academy of Sciences Board on Research Data and Information (BRDI) is holding an open challenge to increase awareness of current issues and opportunities in research data and information. These issues include, but are not limited to, accessibility, integration, discoverability, reuse, sustainability, perceived versus real value, and reproducibility. Opportunities include, but are not limited to, analyzing such data and information in new ways to achieve significant societal benefit. Entrants are expected to describe one or more of the following:
- Novel ideas
There is no restriction on the type of data or information, or the type of innovation that can be described. All data and tools that form the basis of a contestant’s entry must be made freely and openly available. The challenge is held in memory of Lee Dirks, a pioneer in scholarly communication. Anticipated outcomes of the challenge include the potential for original and innovative solutions to societal problems using existing research data and information, national recognition for the successful contestants, and possibly their institutions.
Contestants must be citizens or permanent residents of the United States. A one-page Letter of Intent, including the project title, project outline, and the names, affiliations, emails, and telephone numbers of contestants, is encouraged but not required. This Letter of Intent will not be used to evaluate submissions, but will be used to determine the expertise needed for judging of submissions. The Letter of Intent is due December 1, 2013, and should be sent by email to Cheryl Levey. Final entries, which expand the project outline to no more than 3,000 words, must be submitted to Ms. Levey by May 15, 2014. Submissions will be judged by BRDI members and other relevant experts based upon:
- Originality and creativity
- Potential benefits to society
The First Place and Second Place winners will be announced on the Board on Research Data and Information website in early July 2014. Awardees will be invited to present their projects at the National Academy of Sciences in Washington, D.C., as part of a symposium of the regularly scheduled Board on Research Data and Information meeting in the latter half of 2014.
Registration Now Open for “Teaching Research Data Management with the New England Collaborative Data Management Curriculum”
Registration is now available for the full-day workshop, Teaching Research Data Management with the New England Collaborative Data Management Curriculum, that will be held on Friday, November 8, at the Beechwood Hotel, 367 Plantation St., Worcester, MA. This is a “train the trainer” class, intended for librarians who will be teaching best practices in research data management to science, health science, and/or engineering students and faculty. During the workshop, Elaine Martin, Andrew Creamer, and Donna Kafel will be demonstrating the components of the New England Collaborative Data Management Curriculum and discussing ways that the curriculum materials can be used and customized.
Registrants for the workshop must attend a prerequisite webinar, Best Practices for Teaching Research Data Management and Consulting on Data Management Plans in New England, which will be held on Thursday, October 31, from 9-10 AM PDT. The webinar will be archived so that anyone unable to attend the live session may view it prior to the November 8 class. The number of attendees for the in-person workshop will be limited to 40. Registration for the workshop is on a first-come, first-served basis. The fee for the workshop is $35 (no refunds will be issued). The webinar is free, but registration is required to attend the live session on 10/31.
NLM to Participate with Partners in “An Epidemiology of Information: New Methods for Interpreting Disease and Data”
The National Library of Medicine (NLM) has announced its next initiative as part of its ongoing partnership with the National Endowment for the Humanities (NEH). Working with NEH’s Office of Digital Humanities, the National Science Foundation, and Virginia Polytechnic Institute and State University (Virginia Tech), the NLM will be a part of An Epidemiology of Information: New Methods for Interpreting Disease and Data, an interdisciplinary symposium exploring new methods for large-scale data analysis of epidemic disease.
Scheduled to take place at the Virginia Tech Research Center in Arlington, VA, on October 17, 2013, from 8:30 AM to 5:00 PM, “An Epidemiology of Information” will be a unique public forum through which policy makers, public health experts, and scholars can address pressing questions about how new methods of analyzing large-scale datasets can inform research and policy approaches to epidemic disease. Panelists will consider what these new methods suggest for contemporary infodemiology and epidemic intelligence, as well as the implications of data mining as a disease surveillance mechanism, and how new forms of reporting and public health surveillance affect public health policy. The symposium will also explore how these new methods can inform research on the 1918 influenza pandemic, and help to answer lingering questions about the spread of the disease, its pathogenicity, the unusual mortality rates, or the effectiveness of public health responses.
Featured speakers will include Dr. Jeffery Taubenberger, Chief, Viral Pathogenesis and Evolution Section, National Institute of Allergy and Infectious Diseases (NIAID), and Dr. David Morens, Senior Advisor to the Director, NIAID, whose research in data analysis and historical epidemiology has influenced the approaches being adopted and adapted by digital humanities scholars working in the history of medicine. “An Epidemiology of Information” is made possible in part by support received by Virginia Tech through the international Digging into Data Challenge competition sponsored by NEH. Funding for Virginia Tech’s Canadian partner, the Center for E-Health Initiatives of the University of Toronto, comes from the Social Sciences and Humanities Research Council of Canada. The symposium is free and open to the public, but registration is required.
The National Library of Medicine (NLM) will join with other health data leaders and innovators for the fourth annual Health Datapalooza. The unique event will be held June 3-4, 2013, at the Omni Shoreham Hotel in Washington, DC. Health Datapalooza IV highlights new, innovative, and effective ways health data is being used by companies, startups, academics, government agencies, and individuals. More than 1,500 people are expected to attend. The event is organized by a consortium of private sector, non-profit and government agencies, including the Department of Health and Human Services (HHS). NLM has participated in the event every year.
As the world’s largest medical library, NLM has made its electronic data freely available for decades, so that others can use it to develop new products and services. Additionally, NLM provides application programming interfaces (APIs) so that external products and services, such as electronic health records, can easily access its data. NLM experts will be in the Health Datapalooza exhibit hall (Booth 12) to explain how developers can use the variety of available NLM data, including medical literature, consumer health information, clinical trials, medical terminology, and drugs. NLM will also participate in the “Datalab” breakout session, featuring federal government data experts.