National Network of Libraries of Medicine

Data Science

New On-Demand Course Available – Cool Creative Communications: Dazzling Data Visualization

SEA Data Science - Wed, 2020-04-01 14:27

Earn MLA CEs while you practice your data visualization skills!

Data Visualization enables us to quickly glean insights and patterns from data and communicate its key aspects intuitively, persuasively, and memorably. In this session, participants will discuss the fundamental principles of effective visual data communication as they critique and evaluate existing visualizations. They will also locate sources for downloadable data, and develop simple interactive visualizations using Tableau Public, a free and popular data visualization tool.

This class is intended as a quick-start guide to creating effective data visualizations and is geared toward a general audience with no prior experience creating visualizations.

This is an online, asynchronous class, offered via Moodle, the NNLM’s learning management system.

To learn more or to register:

Questions? Contact Kiri Burcat

Categories: Data Science

NLM Responds to the Coronavirus Pandemic

PSR Data Science - Tue, 2020-03-24 19:13

The National Library of Medicine is working on multiple fronts to improve researchers’ understanding of SARS-CoV-2 (the novel coronavirus) and aid in the response to COVID-19 (the disease caused by the virus). By enhancing access to relevant data and information, NLM is demonstrating how libraries can contribute in real time to research and response efforts during this crisis.

NLM is using PubMed Central®, its digital archive of peer-reviewed biomedical and life sciences journal literature, to expand access to full-text articles related to coronavirus. These activities build on recent requests from the White House Office of Science and Technology Policy (OSTP) and science policy leaders of other nations calling on the global publishing community to make all COVID-19-related research publications and data immediately available to the public in forms that support automated text-mining.

NLM has stepped up its collaboration with publishers and scholarly societies to increase the number of coronavirus-related journal articles in PMC, along with the available data supporting them. NLM is adapting its standard procedures for depositing articles into PMC to make it easier and faster to submit articles in machine-readable formats. NLM is also engaging with journals and publishers that do not participate in PMC but whose publications are within the scope of the Library’s collection. A growing number of publishers and societies are taking advantage of these flexibilities. Submitted publications are being made available as quickly as possible after publication for discovery in PMC and through the PMC Text Mining Collections for machine analysis, secondary analysis, and other types of reuse.

This enhanced collection of text-minable content enables AI and machine-learning researchers to develop and apply novel text-mining approaches that can help answer some of the many questions about coronavirus. Along these lines, NLM and leaders across the technology sector and academia joined OSTP on Monday, March 16, to announce the COVID-19 Open Research Dataset (CORD-19). Hosted by the Allen Institute for AI, CORD-19 is a free and growing resource that was launched with more than 29,000 scholarly articles about COVID-19 and the coronavirus family of viruses. CORD-19 represents the most extensive machine-readable coronavirus literature collection available for text mining to date. This dataset enables researchers to apply novel AI and machine learning strategies to identify new knowledge to help end the pandemic.

NLM’s other important resources in these efforts include:

  • NLM’s GenBank Sequence Database — NLM created the Severe acute respiratory syndrome coronavirus 2 data hub, where people can search for, retrieve, and analyze sequences of the virus that have been submitted to GenBank.
  • NLM’s Sequence Read Archive (SRA) — NLM’s SRA is the world’s largest publicly available repository of unprocessed sequence data which can be mined for previously unrecognized pathogen sequence. For example, a team from Stanford University recently reported that in a search of certain metagenomic datasets in the SRA, they identified a 2019-nCoV-like coronavirus in pangolins (a long-snouted mammal). This type of genetic sequence research can play an important role in understanding how the virus originated and is spreading.
  • NLM Intramural Research Contributions — NLM has a multidisciplinary group of researchers comprised of molecular biologists, biochemists, computer scientists, mathematicians and others working on a variety of problems, including some that relate to SARS-CoV-2/COVID-19. One such project is LitCovid, a resource that tracks COVID-19 specific literature published since the outbreak.
  • NLM is also providing targeted searches within several of its other information resources to help users find data and information relevant to COVID-19. These searches, available through the NLM home page, include information on clinical studies related to COVID-19 listed in, and articles related to SARS-CoV-2/COVID-19 in PubMed, NLM’s database of citations and abstracts to more than 30 million journal articles and online books.
Categories: Data Science

Starting April 1st: Big Data in Healthcare

SEA Data Science - Tue, 2020-03-24 12:47

Did you miss the February instance of Big Data in Healthcare: Exploring Emerging Roles? Now is your chance to join!

We’re running an April/May session of this 6-week asynchronous course. For more information, or to register, visit

Questions? Contact Kiri Burcat.

Categories: Data Science

DataFlash: Pandemic Response Hackathon, March 27-29

PNR Data Science - Mon, 2020-03-23 19:08

COVID-19 Pandemic Response Virtual Hackathon, March 27-29

Event Description: The Pandemic Response Hackathon is a virtual hackathon aimed at better understanding and mitigating the spread of COVID-19 and future pandemics. The goal is to bring public health professionals alongside the technology community’s talent to contribute to the world’s response to the pandemic. Hackathon projects will be formulated and judged by an interdisciplinary panel of public health, health IT, and policy experts.

Datavant is hosting a Pandemic Response Virtual Hackathon from March 27-29 (Fri-Sun) to spur health innovation solutions in response to COVID-19 challenges facing frontline health workers, public health, and our communities. Clinicians, public health professionals, designers, software developers, health IT experts, patient advocacy groups, and community members are welcome. Join us as we come together as a community to contribute to the world’s response to COVID-19 and future pandemics.

Register for the Pandemic Response Hackathon.

Categories: Data Science

DataFlash: 2020 RDAP

PNR Data Science - Wed, 2020-03-18 18:21

This past March, the 2020 RDAP Summit was hosted in Santa Fe, New Mexico. Unfortunately, I was not able to attend in person, but I was able to catch the conference virtually. For me, the highlight of the Summit was the keynote speaker, Michele Suina, PhD (Cochiti Pueblo), Program Director, Albuquerque Area Southwest Tribal Epidemiology Center (AASTEC). She was incredibly informative and knowledgeable and, of course, very articulate and engaging.

Michele talked about her work with the Global Indigenous Data Alliance (GIDA), which is a great organization that prides itself on “promoting indigenous control of indigenous data” around the world. Its data motto is “Be FAIR and CARE,” which is a play on the popular data acronym FAIR (i.e. findable, accessible, interoperable, and reusable) and GIDA’s acronym for data, CARE.

CARE is an acronym that reminds us that just because data is shared and open doesn’t necessarily mean that it’s tension-free for all people, especially vulnerable populations like indigenous ones. Let’s take a closer look at what CARE means. The “C” in CARE stands for “Collective benefit,” which means that data should be used in ways that empower Indigenous Peoples so that they can derive maximum benefit from the data’s use. The “A” in CARE stands for “Authority to control,” which means we must recognize the rights and interests of Indigenous Peoples in their data; in other words, we must respect their authority to control their own data. The “R” in CARE stands for “Responsibility,” which means that we are responsible and accountable for how the data is used, so that it fosters positive relationships with Indigenous Peoples and they derive the maximum benefit from their data. The “E” in CARE stands for “Ethics,” which means that the wellbeing of Indigenous Peoples should always be at the heart of the data life cycle and across data ecosystems.

As data enthusiasts, we must remember that to be “FAIR,” we must also “CARE,” especially with vulnerable populations like indigenous ones.



Categories: Data Science

DataFlash: Citizen Science Month (April 2020) and the PNR!

PNR Data Science - Thu, 2020-03-12 16:36

I have some very exciting news to share with you all.  The PNR Citizen Science Team has planned a great  PNR Rendezvous webinar entitled “What’s All this Talk About Citizen Science?” on Wednesday, April 29th, 2020 at 1 P.M. PT | 2 P.M. MT | 4 P.M. ET | 10 A.M. Hawaii | 12 P.M. Alaska with three AMAZING guest speakers.

The first guest speaker will be SciStarter founder Darlene Cavalier, who will give us a basic introduction to the world of Citizen Science and suggest new ways that we can get involved with virtual Citizen Science projects, especially in the time of COVID-19. Joining Darlene are two high school science teachers, Cheryl Rice and Pete Recksiek, of The Dalles, Oregon, who will share their classroom Citizen Science experience using two different projects they selected from the SciStarter/NLM microsite. Cheryl will discuss facilitating Debris Tracker and Pete will discuss facilitating Stallcatchers with their high school science students. Each will share why they chose their project, what went well, and what they’d do differently next time. They will also describe the benefits of Citizen Science project participation and offer advice for others, especially library staff, who want to offer citizen science opportunities through programs such as STEM, library nature groups, and science book clubs. The good news is: anyone can be a citizen scientist – all that’s needed is a bit of curiosity!

Please stay tuned to the Dragonfly Blog for more Citizen Science things during the month of April 2020!

Categories: Data Science

DataFlash: Telling the Real Coronavirus Story with Data

PNR Data Science - Tue, 2020-02-18 12:10

The new coronavirus (i.e. COVID-19) has some people in the United States worried. As of February 18th, 2020, there are more than 70,000 confirmed cases in China. The outbreak is serious, but if you’re living in the United States, the odds are that the regular flu is a much more serious risk to your health than the coronavirus. The CDC reported that in the 2017-2018 season there were over 60,000 flu-associated deaths in the United States alone. As of February 18th, 2020, coronavirus fatalities had reached 1,875 in Asia, with one death outside of Asia so far.

Again, according to the CDC, the risk of coronavirus infection to the general public of the United States is considered “low at this time,” as the general American public is unlikely to be exposed to this virus. This risk of infection changes, of course, if you are, say, an American healthcare worker caring for patients with COVID-19 or if you have recently traveled to China. According to Dr. Anthony Fauci, director of the National Institute of Allergy and Infectious Diseases, precautions like washing your hands as frequently as you can and staying away from crowded places where people are coughing and sneezing are more effective than wearing face masks. According to Dr. Fauci, the only people who need masks are those who are already infected, to keep them from exposing others.

A great data visualization/data dashboard on the coronavirus is the one put out by Baltimore’s very own Johns Hopkins University. Unlike some media outlets and social media, this data visualization tells a technically accurate data story of what’s going on with the coronavirus outbreak worldwide. It takes a balanced and factual approach, looking not only at the number of deaths (i.e. 1,875 as of 02/18/20) but also at the remarkable number of people who have recovered (i.e. 13,147 as of 02/18/20) from this viral infection. The map on the dashboard accurately locates and quantifies the number of confirmed coronavirus cases, with China having the most at 72,439 as of February 18th, 2020. Finally, like all good data stories, the Johns Hopkins data visualization/story cites credible data sources like WHO, CDC, ECDC, NHC, and DXY in an attempt to be transparent and trustworthy. All in all, this data visualization/story of the coronavirus makes a good attempt at a truthful depiction of the outbreak that is devoid of exaggeration and of most negative personal biases.

Data Citation:

Johns Hopkins University (2020). Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE.

Categories: Data Science

DataFlash: A Fun but Authoritative Book on Citizen Science – A Book Review

PNR Data Science - Fri, 2020-02-14 16:40

“The Field Guide to Citizen Science: How You Can Contribute to Scientific Research and Make a Difference,” written by SciStarter experts Darlene Cavalier, Catherine Hoffman, and Caren Cooper, is a fantastic read. They do an excellent job of explaining what defines citizen science, its history, and how you can easily become a citizen scientist with an array of citizen science projects that they highlight and recommend in their book.

I learned many interesting things about citizen science from this read. For example, the term “citizen,” at least in the United States, is associated with a contentious immigration debate about who is eligible to participate in civic life, including science and education. As a result, other terms have been used to describe citizen science, like community science, public participation in scientific research, participatory action research, and community-based participatory research. Despite the tensions associated with the term “citizen,” none of the other terms is as complete or widely used as “citizen science.”

One thing that I was always skeptical about with citizen science was how scientists and researchers could trust citizen science data. I learned, though, that there is a rigorous process for collecting and cleaning the data that citizen scientists gather. For example, if a data point stands out from the norm, it will generally undergo expert review. Also, to substantiate and validate data, citizen scientists submit photos of their specimens as part of their data collection. Among other things, extensive training and testing related to quality assurance and quality control is done for citizen science projects. Lastly, I learned that almost one-quarter of citizen science projects compare data from many volunteers and validate data by independent consensus, and that projects sometimes request the same data in several different ways in order to double-check for errors. It is these quality protocols, ingrained into the citizen science project regimen, that ensure citizen science data is trustworthy and valid.
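To make two of those quality checks a little more concrete, here is a minimal Python sketch of what "flag the data point that stands out" and "validate by independent consensus" could look like. This is only an illustration under my own assumptions: the function names, the median-based outlier rule, and the 3.5 and 60% cutoffs are hypothetical choices, not something taken from the book or from any particular citizen science project.

```python
from collections import Counter
from statistics import median


def flag_for_review(values, threshold=3.5):
    """Return the indices of measurements that stand out from the norm.

    Assumption: uses a modified z-score based on the median absolute
    deviation (MAD); the 3.5 cutoff is a common rule of thumb, chosen
    here only for illustration.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median: flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


def consensus_label(labels, min_agreement=0.6):
    """Return the most common label if enough volunteers agree, else None.

    Assumption: the 60% agreement level is hypothetical.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


# Hypothetical submissions: six bird counts from one site,
# and four volunteers identifying the same photo.
counts = [12, 11, 13, 12, 14, 48]
print(flag_for_review(counts))  # indices that would go to expert review
print(consensus_label(["robin", "robin", "sparrow", "robin"]))
```

In this sketch, a flagged index would simply be routed to an expert rather than discarded, and a `None` result from `consensus_label` would mean the photo needs more volunteers or an expert's eye.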

For most of the book, the authors recommend various citizen science projects that are free or very affordable to do on your own or with your community. Most of the citizen science projects can be found in SciStarter’s extensive database of citizen science projects. With Citizen Science Month coming up this April 2020, the NNLM PNR group is planning a PNR Rendezvous webinar on April 29th, 2020 at 1 PM PT, with guest speaker and SciStarter founder Darlene Cavalier. Please stay tuned for more details!

Categories: Data Science

Funding Available in NY, NJ, PA and DE for Health Information Outreach

MAR Data Science - Fri, 2020-02-07 07:00

The National Network of Libraries of Medicine, Middle Atlantic Region (NNLM MAR) invites applications for health information outreach and programming projects.

The mission of the National Network of Libraries of Medicine is to advance the progress of medicine, improve public health by providing U.S. health professionals with equal access to biomedical information, and improve individuals’ access to information to enable them to make informed decisions about their health. Under a Cooperative Agreement with the National Library of Medicine (NLM), the University of Pittsburgh Health Sciences Library System serves as the Regional Medical Library for NNLM MAR.

  • Period of Performance: May 15, 2020 – April 30, 2021
  • Due Date: April 10, 2020 at noon ET
  • Notification of Awards: May 1, 2020

Eligibility: Network member organizations in the Middle Atlantic Region (Delaware, New Jersey, New York and Pennsylvania) are eligible to apply. Membership is free and open to libraries of all kinds, community-based organizations, clinics, public health departments and other organizations that provide or distribute health information. If your institution is not an NNLM Member, submit an application for Membership at least 3 weeks prior to the funding deadline. Membership is not automatic. A Member record is required to submit an application.

Available awards

Health Information Outreach Award: The purpose of the Health Information Outreach Award is to fund education and outreach projects that improve access to biomedical and health information and increase the ability of the public and health professionals to use these resources.

  • Amount: Up to $20,000
  • Awards Available: Minimum of 5 awards

Data Award: The purpose of the Data Award is to support data sharing and open science, and foster the development of information professionals in data science.

  • Amount: Up to $20,000
  • Awards Available: Minimum of 3 awards

Professional Development Award: This award enables individuals at NNLM MAR network member institutions to expand professional knowledge and experience in data science or health information access/delivery.

  • Amount: Up to $5,000
  • Awards Available: Minimum of 5 awards

New this year! Applications will only be accepted via the NNLM Online Applications System. Please allow extra time to familiarize yourself with the system requirements and watch a brief video tutorial about submitting an application.

Resources are available to help potential applicants, including:

Interested applicants are invited to attend these related webinars:

NNLM MAR staff are available for consultation and training on applicable National Library of Medicine resources and potential projects. Complete the NNLM MAR Award Interest Form, and someone will respond within three business days.

Categories: Data Science

DataFlash: Learn More About Citizen Science in 2020

PNR Data Science - Tue, 2020-01-28 13:49

Citizen science is happening all around you! Citizen science is an amazing way to participate in research efforts, and it can often be done from a mobile device, from one’s home, or from a library.

On February 24th, 2020, NNLM will be hosting a class called National Library of Medicine Resources for Citizen Scientists; participants will learn how to support citizen science in their communities and ways that libraries can participate. Participants will learn about citizen science library program models, free National Library of Medicine resources to incorporate into citizen science library programs, and sources of funding to explore for buying testing kits or supporting community research efforts. Citizen science library programs are perfect for all ages, and all types of libraries.

Class Information:

National Library of Medicine Resources for Citizen Scientists

February 24, 2020  @ 11PT/12MT/1CT/2ET

Instructor: Zoe Unno

1 MLA CE Credit

Categories: Data Science

What Will 2020 Bring for NLM?

PSR Data Science - Tue, 2020-01-14 19:53

In a recent blog post, NLM director Dr. Patti Brennan highlighted some of NLM’s accomplishments in 2019. So, what’s on tap for 2020? First, as NLM prepares for major renovations to its Building 38, most of the staff, including Dr. Brennan, will move to other office space on the NIH campus for about two years. That will be enough time to implement a major redesign of the first floor of the 60-year-old, architecturally dramatic but not really fit-for-purpose workspace to make more efficient use of the space, add modern office layouts and meeting spaces, and modernize the HVAC systems. Also, NLM will continue to grow its Intramural Research Program (IRP), which focuses on computational biomedical and health sciences. Two new tenure-track investigators were hired this past year and one or two more are expected to be added in 2020. The IRP brings together two NLM divisions, the National Center for Biotechnology Information, specifically the Computational Biology Branch, and the Lister Hill National Center for Biomedical Communications, which emphasize discovery based on molecular phenomena and clinical information. There will also be greater alignment of training efforts, including an expansion of the public-facing parts of training.

NLM will continue to make biomedical and health information literature available to the public, scientists, and clinicians, with a greater emphasis on public access and open science. The entire PubMed Central (PMC) repository of full-text literature is already freely available to the world, and with the increasing interest in open access to government-supported research findings, this repository is expected to grow. PMC will grow in new ways, too, such as enhancing the discoverability of data sets in support of published results made available with articles as supplementary material or in open repositories, and supporting greater transparency in scientific communication through the archiving of peer review documents. Many NLM resources will be moved to the cloud, continuing to support the National Institutes of Health (NIH) Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative, which seeks to accelerate discovery by harnessing the power of commercial cloud computing. This will not only offer some logistical savings; it will also increase the discoverability of NLM’s resources.

NLM will play a bigger and more vital role in big science as it unfolds at NIH. Intramural researchers are expanding the application of deep learning technologies to clinical, biological, and image data. In collaboration with the NIH Office of Data Science Strategy, NLM will build and release new tools to help researchers leverage the FHIR standard to make clinical data more accessible for research, and to improve phenotype characterization. These initiatives will accelerate data sharing by advancing standard approaches to research data representation. And finally, NLM will advance its impact on and outreach to professional and lay communities around the country. The National Network of Libraries of Medicine has exciting plans to expand its training in research data management and to provide local health information education and support to help health care providers working with American Indian and Alaska Native populations address challenges such as mental health and HPV-related cancer.

Categories: Data Science

Online Course – Big Data in Healthcare: Exploring Emerging Librarian Roles

MCR Data Science - Wed, 2020-01-08 15:49

Join us for the Big Data in Healthcare:  Exploring Emerging Roles course that will help health sciences librarians better understand the issues of big data in clinical outcomes and what roles health sciences librarians can take on in this service area. The 6-week course will be taught via Moodle and includes short readings, videos, and activities. We’ll start with an overview of data science and then learn about big data from a systems perspective, dig into how big data impacts patients and researchers, think about the role of librarians in supporting big data initiatives, and finish with an opportunity for you to develop an action plan based on course content. (6 MLA Continuing Education Credits)

Registration is limited to 50 participants. Register at:

Class Date: Feb 3, 2020 to Mar 17, 2020


Kirsten Burcat, MLIS, Data & Evaluation Coordinator
Derek Johnson, Health Professionals Outreach Specialist
Donna Harp Ziegenfuss, MS, EdD, Data Science Coordinator


Derek Johnson

Categories: Data Science

DataFlash: Library Carpentry Workshops

PNR Data Science - Tue, 2019-12-17 15:02

The NNLM Training Office (NTO) and Southeastern Atlantic Region (SEA) are pleased to host Library Carpentry workshops this spring and provide professional development funds to support travel to these exciting opportunities.

In this two-day interactive, hands-on workshop you will learn core software and data skills, with lessons including:

Participants may apply to attend the workshop series in either:

  • Baltimore, Maryland – March 19-20, 2020 or
  • Salt Lake City, Utah – March 26-27, 2020

To broaden access to this exciting training, we invite applications to cover the costs of travel and attendance, up to $1,500 for Baltimore, and $1,200 for Salt Lake City. Travel costs will be reimbursed after travel occurs.

For more information, please apply here.

Categories: Data Science