Archive for the ‘E-Science’ Category
November 2013 marks 25 years that the National Center for Biotechnology Information (NCBI) has been providing access to biomedical and genomic information to advance science and health. Established in 1988 as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), NCBI has grown into a leading source for public biomedical databases, software tools for analyzing molecular and genomic data, and research in computational biology. NCBI’s resources rank among the most heavily used government web sites in the United States, with approximately 3 million users every day.
In recognition of NCBI’s achievements, an awards and recognition program was held November 1 on the NIH campus in Bethesda, Maryland. At that event Tony Hey, PhD, Vice President of Microsoft Research, presented NCBI Director David Lipman, MD, with the Jim Gray eScience Award. Named for Jim Gray, a technical fellow for Microsoft Research and an A.M. Turing Award winner who disappeared at sea in 2007, the annual award recognizes researchers who have made outstanding contributions to the field of data-intensive computing and made “science easier for scientists,” according to Microsoft.
Gray was very familiar with the work of NCBI. He was a member of the NLM Board of Regents in 2006 and met a number of times with Dr. Lipman, NCBI Information Engineering Branch Chief Jim Ostell, PhD, and other staff to discuss issues such as organization of and access to biomedical literature and data. His interest in NCBI’s work is evidenced by his final lecture, in January 2007, in which he highlighted the importance of NCBI/NLM biomedical literature databases like PubMed and PubMed Central, genomic databases such as GenBank, and NCBI’s Entrez system for searching across these and many other databases. An edited version of Gray’s lecture can be read in The Fourth Paradigm, available on Microsoft Research’s web site.
The NCBI awards program also featured a keynote address by Sir Richard Roberts, PhD, chief scientific officer of New England Biolabs, entitled “A personal recollection of GenBank and NCBI.” NLM Director Donald A.B. Lindberg, MD, recounted the planning process that led to the formation of NCBI, and NIH Deputy Director for Intramural Research Michael M. Gottesman, MD, provided introductory remarks for the awards ceremony. Dr. Lipman closed the event by recognizing the dedicated and hard-working staff of NCBI who have enabled the progress made over the last 25 years.
The New England Collaborative Data Management Curriculum (NECDMC) offers openly available materials that librarians can use to teach research data management (RDM) best practices to students in the sciences, health sciences, and engineering fields, at the undergraduate and graduate levels. The materials include lecture notes and slide presentations that librarians teaching RDM can customize for their particular audiences. The curriculum also has a database of real-life research cases that can be integrated into the curriculum to address discipline-specific data management topics. The project has been led by the Lamar Soutter Library at the University of Massachusetts Medical School with funding from the National Network of Libraries of Medicine, New England Region.
The Lamar Soutter Library developed the Frameworks for a Data Management Curriculum with Worcester Polytechnic Institute in 2011. Over the past year the Soutter Library has partnered with librarians from Tufts University, University of Massachusetts Amherst, Northeastern University, and the Marine Biological Laboratory and Woods Hole Oceanographic Institution to fully develop the curriculum’s lecture content, readings, activities, and slide presentations.
Some libraries will be piloting the curriculum at their institutions and evaluating the learning modules with students. If you are teaching or plan to teach RDM, you are invited to pilot the NECDMC. For more information about being a pilot partner, please contact Donna Kafel.
The 9th International Digital Curation Conference (IDCC) will be held February 24-27, 2014, at the Omni San Francisco Hotel, and registration is now open. This year the IDCC will focus on how data-driven developments are changing the world around us, recognizing that the growing volume and complexity of data provide institutions, researchers, businesses, and communities with a range of exciting opportunities and challenges. The Conference will explore the expanding portfolio of tools and data services, as well as the diverse skills that are essential to explore, manage, use, and benefit from valuable data assets. The program will reflect cultural, technical, and economic perspectives, and will illustrate the progress made in this arena in recent months.
IDCC14 will be organized by the Digital Curation Centre UK, in partnership with the University of California Curation Center (UC3) at the California Digital Library and the Coalition for Networked Information (CNI). The draft program is now available.
The National Academy of Sciences Board on Research Data and Information (BRDI) is holding an open challenge to increase awareness of current issues and opportunities in research data and information. These issues include, but are not limited to, accessibility, integration, discoverability, reuse, sustainability, perceived versus real value, and reproducibility. Opportunities include, but are not limited to, analyzing such data and information in new ways to achieve significant societal benefit. Entrants are expected to describe one or more of the following:
- Novel ideas
There is no restriction on the type of data or information, or the type of innovation that can be described. All data and tools that form the basis of a contestant’s entry must be made freely and openly available. The challenge is held in memory of Lee Dirks, a pioneer in scholarly communication. Anticipated outcomes of the challenge include the potential for original and innovative solutions to societal problems using existing research data and information, national recognition for the successful contestants, and possibly their institutions.
Contestants must be citizens or permanent residents of the United States. A one-page Letter of Intent, including the project title, project outline, names, affiliations, emails and telephone numbers of contestants, is encouraged but not required. The Letter of Intent will not be used to evaluate submissions, but will be used to determine the expertise needed for judging of submissions. It is due December 1, 2013, and should be sent by email to Cheryl Levey. Final entries must be submitted to Ms. Levey by May 15, 2014, and should expand the project outline to no more than 3000 words. Submissions will be judged by BRDI members and other relevant experts based upon:
- Originality and creativity
- Potential benefits to society
The First Place and Second Place winners will be announced on the Board on Research Data and Information website in early July 2014. Awardees will be invited to present their projects at the National Academy of Sciences in Washington, D.C., as part of a symposium of the regularly scheduled Board on Research Data and Information meeting in the latter half of 2014.
Registration is now available for the full-day workshop, Teaching Research Data Management with the New England Collaborative Data Management Curriculum, that will be held on Friday, November 8, at the Beechwood Hotel, 367 Plantation St., Worcester, MA. This is a “train the trainer” class, intended for librarians who will be teaching best practices in research data management to science, health science, and/or engineering students and faculty. During the workshop, Elaine Martin, Andrew Creamer, and Donna Kafel will be demonstrating the components of the New England Collaborative Data Management Curriculum and discussing ways that the curriculum materials can be used and customized.
Registrants for the workshop must attend a prerequisite webinar, Best Practices for Teaching Research Data Management and Consulting on Data Management Plans in New England, that will be held on Thursday, October 31, from 9-10 AM PDT. The webinar will be archived so that anyone unable to attend the live session may view it prior to the November 8 class. The number of attendees for the in-person workshop will be limited to 40. Registration for the workshop is on a first-come, first-served basis. The fee for the workshop is $35 (no refunds will be issued). The webinar is free, but registration is required to attend the live session on 10/31.
The National Library of Medicine (NLM) has announced its next initiative as part of its ongoing partnership with the National Endowment for the Humanities (NEH). Working with NEH’s Office of Digital Humanities, the National Science Foundation, and Virginia Polytechnic Institute and State University (Virginia Tech), the NLM will be a part of An Epidemiology of Information: New Methods for Interpreting Disease and Data, an interdisciplinary symposium exploring new methods for large-scale data analysis of epidemic disease.
Scheduled to take place at the Virginia Tech Research Center in Arlington, VA, on October 17, 2013, from 8:30 AM to 5:00 PM, “An Epidemiology of Information” will be a unique public forum through which policy makers, public health experts, and scholars can address pressing questions about how new methods of analyzing large-scale datasets can inform research and policy approaches to epidemic disease. Panelists will consider what these new methods suggest for contemporary infodemiology and epidemic intelligence, as well as the implications of data mining as a disease surveillance mechanism, and how new forms of reporting and public health surveillance affect public health policy. The symposium will also explore how these new methods can inform research on the 1918 influenza pandemic, and help to answer lingering questions about the spread of the disease, its pathogenicity, the unusual mortality rates, or the effectiveness of public health responses.
Featured speakers will include Dr. Jeffery Taubenberger, Chief, Viral Pathogenesis and Evolution Section, National Institute of Allergy and Infectious Diseases (NIAID), and Dr. David Morens, Senior Advisor to the Director, NIAID, whose research in data analysis and historical epidemiology has influenced the approaches being adopted and adapted by digital humanities scholars working in the history of medicine. “An Epidemiology of Information” is made possible in part by support received by Virginia Tech through the international Digging into Data Challenge competition sponsored by NEH. Funding for Virginia Tech’s Canadian partner, the Center for E-Health Initiatives of the University of Toronto, comes from the Social Sciences and Humanities Research Council of Canada. The symposium is free and open to the public, but registration is required.
The National Library of Medicine (NLM) will join with other health data leaders and innovators for the fourth annual Health Datapalooza. The unique event will be held June 3-4, 2013, at the Omni Shoreham Hotel in Washington, DC. Health Datapalooza IV highlights new, innovative, and effective ways health data is being used by companies, startups, academics, government agencies, and individuals. More than 1,500 people are expected to attend. The event is organized by a consortium of private sector, non-profit and government agencies, including the Department of Health and Human Services (HHS). NLM has participated in the event every year.
As the world’s largest medical library, NLM has made its electronic data freely available for decades, so that others can use it to develop new products and services. Additionally, NLM provides application programming interfaces (APIs) so that external products and services, such as electronic health records, can easily access its data. NLM experts will be in the Health Datapalooza exhibit hall (Booth 12), to explain how developers can utilize the variety of available NLM data, including medical literature; consumer health information; clinical trials; medical terminology; and drugs. NLM will also participate in the “Datalab” breakout session, featuring federal government data experts.
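As a rough illustration of how a developer might call such an API, the sketch below builds a search request URL for NCBI's E-utilities, the programmatic interface to the Entrez databases. The post does not name a specific API, so the choice of E-utilities, the helper name `build_esearch_url`, and the sample search term are assumptions made for this example only.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities. The post does not name a specific API,
# so using E-utilities here is an assumption made for illustration.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, db: str = "pubmed", retmax: int = 5) -> str:
    """Return an ESearch URL that looks up `term` in the given Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Only the URL is printed; actually fetching it requires network access.
    print(build_esearch_url("health data"))
```

Keeping URL construction separate from the network call makes the request easy to inspect before any data is fetched.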
The National Library of Medicine has announced that Extensible Markup Language (XML) data from the IndexCat™ database is now available for free download. Released with a Document Type Definition (DTD) that allows researchers to validate the data, this new XML release includes the digitized content of more than 3.7 million bibliographic items from the printed, 61-volume Index-Catalogue of the Library of the Surgeon-General’s Office, originally published from 1880 to 1961. The XML describes items spanning five centuries, including millions of journal and newspaper articles, obituaries, and letters; hundreds of thousands of monographs and dissertations; and thousands of portraits. Together, these items cover a wide range of subjects such as the basic sciences, scientific research, civilian and military medicine, public health, and hospital administration.
The NLM release of the Index-Catalogue in XML format opens this key resource in the history of medicine and science to new uses and users. It is one of the monuments of the Library’s longstanding, systematic indexing of the medical literature, an effort which William Henry Welch (1850-1934), the great pathologist and bibliophile, considered to be “America’s greatest contribution to medical knowledge.” This indexing, begun by John Shaw Billings in the nineteenth century at the Library of the Surgeon-General’s Office, United States Army (known today as the NLM), eventually created two distinct products: the Index-Catalogue of the Library of the Surgeon-General’s Office, United States Army, and the Index Medicus, forerunner of MEDLINE®, and now the largest component of PubMed®.
Released alongside the Index-Catalogue XML are an integrated XML file and associated DTD for two collections developed from the electronic database of A Catalogue of Incipits of Mediaeval Scientific Writings in Latin (rev.), by Lynn Thorndike and Pearl Kibre (eTK), and the updated and expanded version of Scientific and Medical Writings in Old and Middle English: An Electronic Reference (eVK2), edited by Linda Ehrsam Voigts and Patricia Deery Kurtz. Also available via the online IndexCat, these resources encompass over 42,000 records of incipits, or the beginning words of a medieval manuscript or early printed book. They cover medical and scientific writings on topics as diverse as astronomy, astrology, geometry, agriculture, household skills, book production, occult science, natural science, and mathematics, as these disciplines and others were largely intermingled in the medieval period of European history. The NLM release of these resources in XML format joins many other freely downloadable resources, including the XML for MEDLINE®/PubMed® data, which includes over 22 million references to biomedical and life sciences journal articles back to 1946, and, for some journals, much earlier.
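For readers who want to work with XML downloads of this size, the following is a minimal sketch of a streaming parse that tallies records by type without loading the whole file into memory. The element name `Record`, the attribute `type`, and the sample document are hypothetical and do not reflect the actual IndexCat DTD. Python's standard `xml.etree.ElementTree` does not validate against a DTD, so checking the data against the released DTD would need a separate validating parser.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the element and attribute names are invented for
# illustration and do not reflect the actual IndexCat DTD.
SAMPLE_XML = """<Records>
  <Record type="article"><Title>Example title A</Title></Record>
  <Record type="monograph"><Title>Example title B</Title></Record>
  <Record type="article"><Title>Example title C</Title></Record>
</Records>"""


def count_record_types(source, record_tag="Record", type_attr="type"):
    """Stream through an XML file and count records by their type attribute."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            counts[elem.get(type_attr, "unknown")] += 1
            elem.clear()  # drop the record's children to limit memory use
    return counts


if __name__ == "__main__":
    counts = count_record_types(io.BytesIO(SAMPLE_XML.encode("utf-8")))
    for record_type, n in sorted(counts.items()):
        print(record_type, n)
```

The same function accepts a file path in place of the in-memory sample, provided the tag and attribute names are changed to match the real data.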
The release also coincides with the NLM’s participation in “Shared Horizons: Data, Biomedicine, and the Digital Humanities,” an interdisciplinary symposium exploring the intersection of digital humanities and biomedicine, being held April 10-12, 2013, in partnership with the National Endowment for the Humanities’ Office of Digital Humanities, Maryland Institute for Technology in the Humanities at the University of Maryland, and Research Councils UK. Shared Horizons will create opportunities for disciplinary cross-fertilization through a mix of formal and informal presentations, combined with breakout sessions designed to promote a rich exchange of ideas about how large-scale quantitative methods can lead to new understandings of human culture. Bringing together researchers from the digital humanities and bioinformatics communities, the symposium will explore ways in which these two communities might fruitfully collaborate on projects that bridge the humanities and medicine around the topics of sequence alignment and network analysis, two modes of analysis that intersect with “big data.” All Shared Horizons sessions will be live-streamed with a monitored back channel for the public to post/tweet comments. Recordings of all talks will also be posted to the Shared Horizons website, with the ability to comment pre- and post-event.
Data dashboards provide a mechanism to use visualization, rather than words, to get a quick overview of progress made towards programmatic goals, and to engage stakeholders in the evaluation process. To use data dashboards effectively, it is important to define the user group(s) involved and to select recognizable metrics from trusted sources. There are a variety of resources available to assist with producing dashboards for web sites, blogs, etc., including Juice Analytics, Tableau Software, and Google Analytics. After registering with Juice Analytics, one resource to consider is a white paper listed in the “Visualization Resources” category, called A Guide to Creating Dashboards People Love to Use. Once established, data dashboards can monitor the progress of a program, communicate progress to stakeholders, and provide early signs of problems that may be arising.
To get an idea of a final product, a good example to view is the Health IT Dashboard showing the implementation of the Regional Extension Center (REC) Cooperative Agreement Program, coordinated by the federal Office of the National Coordinator for Health IT (ONC). The REC program is funded to provide technical assistance for EHR implementation to 100,000 primary care providers, through 62 nationwide sites. The dashboard charts the enrollment of primary care providers in this program, and monitors their efforts to become meaningful users of electronic health records (EHRs). Dashboards could be a colorful, visual way for you to show what you do to benefit the overall institution!
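To make the idea of a progress metric concrete, here is a small sketch of one dashboard row rendered as text: a label, a bar, and the percentage of a goal reached. The function name `progress_bar` and the enrollment figure are hypothetical; only the goal of 100,000 providers comes from the description above. Tools such as those mentioned earlier would normally draw this graphically.

```python
def progress_bar(label: str, current: int, goal: int, width: int = 20) -> str:
    """Render one dashboard row: a label, a text bar, and percent of goal reached."""
    if goal <= 0:
        raise ValueError("goal must be positive")
    fraction = min(current / goal, 1.0)  # cap the bar at 100%
    filled = round(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"{label:<22} [{bar}] {fraction:6.1%}"


if __name__ == "__main__":
    # The goal of 100,000 providers comes from the post; the enrollment
    # figure is a made-up placeholder, not real program data.
    print(progress_bar("Providers enrolled", 50_000, 100_000))
```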
The ENCODE Project was planned as a follow-up to the Human Genome Project. The Human Genome Project sequenced the DNA that makes up the human genome; the ENCODE Project seeks to interpret this sequence. Coinciding with the completion of the Human Genome Project in 2003, the National Human Genome Research Institute (NHGRI) launched the ENCODE Project as a worldwide effort involving more than 30 research groups and 400 scientists. The approximately 20,000 genes that provide instructions for making proteins account for only about 1% of the human genome. Researchers embarked on the ENCODE Project to figure out the purpose of the remaining 99% of the genome. Scientists discovered that more than 80% of this non-gene component of the genome, which was once considered “junk DNA,” actually has a role in regulating the activity of particular genes (gene expression).
Researchers think that changes in the regulation of gene activity may disrupt protein production and cell processes and result in disease. A goal of the ENCODE Project is to link variations in the expression of certain genes to the development of disease. The ENCODE Project has given researchers insight into how the human genome functions. As researchers learn more about the regulation of gene activity and how genes are expressed, the scientific community will be able to better understand how the entire genome can affect human health.
NHGRI recently announced updated results of the ENCODE Project in a press release. Further detailed information about the findings is available from the ENCODE Project portal. Published research findings are also available through the new web site, Nature ENCODE Explorer, which provides public access to scientific information collected from the ENCODE Project.