National Network of Libraries of Medicine

Data Science

NLM Receives $10 Million Emergency CARES Act Funding Support

PSR Data Science - Tue, 2020-05-19 19:30

Dr. Patti Brennan has announced that NLM has received $10 million as part of the Coronavirus Aid, Relief, and Economic Security (CARES) Act, which provides emergency funding for federal agencies to combat the coronavirus outbreak. The funding is being used to support activities to improve the quality of clinical data for research and care; accelerate research including phenotyping, image analysis, and real-time surveillance; and to enhance access to COVID-19 literature and molecular data resources. The following activities highlight many of the investments that NLM is making with this emergency funding.

The novel coronavirus is driving a need for standardized COVID-19 terminology and data exchange that will allow clinicians and scientists to communicate more effectively and consistently. NLM will use the supplemental funds to support the addition of codes for COVID-19-related laboratory tests within LOINC (Logical Observation Identifiers Names and Codes) and to provide implementation guidelines and training in use of the standards. NLM is also enabling sharing of COVID-19 terminology updates through the Value Set Authority Center (VSAC), which makes available value sets and clinical terminologies. Value sets are codes from standard terminologies around specific concepts or conditions and are used as part of electronic clinical quality measures or to define patient cohorts, classes of interventions, or patient outcomes. This important work will facilitate the analysis of electronic health record data and support effective and interoperable health information exchange.
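To make the idea of a value set concrete, here is a minimal sketch of how a set of codes might be used to define a patient cohort from lab observations. Everything in it is an assumption for illustration: the `Observation` record, the function name, and especially the codes, which are placeholders and not real LOINC identifiers or an actual VSAC value set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    patient_id: str
    code: str    # terminology code recorded for the lab test
    result: str  # simplified result, e.g. "positive" or "negative"


# Hypothetical value set: placeholder codes, NOT real LOINC identifiers.
COVID_LAB_TEST_VALUE_SET = frozenset({"EX-0001", "EX-0002", "EX-0003"})


def cohort_with_positive_result(observations, value_set):
    """Return sorted patient IDs that have at least one positive
    observation whose code belongs to the value set."""
    return sorted({
        obs.patient_id
        for obs in observations
        if obs.code in value_set and obs.result == "positive"
    })


observations = [
    Observation("p1", "EX-0001", "positive"),
    Observation("p2", "EX-0002", "negative"),
    Observation("p3", "EX-9999", "positive"),  # code is outside the value set
    Observation("p4", "EX-0003", "positive"),
]

cohort = cohort_with_positive_result(observations, COVID_LAB_TEST_VALUE_SET)
print(cohort)
```

Because the cohort is defined by membership in the value set rather than by a hard-coded list inside the query, new test codes can be added to the set without changing the cohort logic.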

NLM is updating terminology for coronavirus-related drugs and chemicals through resources such as the Medical Subject Headings (MeSH) used for indexing and cataloging biomedical literature, and ChemIDplus, a dictionary of over 400,000 chemicals (names, synonyms, and structures). This work aligns terminology to facilitate the identification of chemicals and drugs used to treat, detect, and prevent COVID-19 and other coronavirus-related infections, including severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS).

NLM’s intramural research program is using virus genomics, health data, and social media data to identify community spread of COVID-19. Researchers are applying machine learning and artificial intelligence techniques to chest X-rays to differentiate viral pneumonia from bacterial pneumonia – expanding knowledge of the process of the SARS-CoV-2 viral infection and assisting in the identification of best practices for diagnosis and care of COVID-19 patients. NLM research in natural language processing contributed to development of LitCovid, a curated literature hub for tracking scientific publications about the novel coronavirus. It provides centralized access to more than 13,500 relevant articles in PubMed, categorizes them by research topic and geographic location, and is updated daily.

NLM’s extramural research program is focusing on novel informatics and data science methods to rapidly improve the understanding of the infection of SARS-CoV-2 and of COVID-19. In April, NLM issued two Notices of Special Interest (NOT-LM-010 and NOT-LM-011) seeking applications (due in June) in these areas: the mining of clinical data for ‘deep phenotyping’ (gathering details about how a disease presents itself in an individual, fine-grained way) to identify or predict the presence of COVID-19; and public health surveillance methods that mine genomic, viromic, health data, environmental data or data from other pertinent sources such as social media, to identify spread and impact of SARS-CoV-2.

NLM is also improving access to published coronavirus literature via PubMed Central (PMC). In response to a call by science and technology advisors from a dozen countries to have publishers and scholarly societies make their COVID-19 and coronavirus-related publications immediately accessible in PMC, along with the available data supporting them, nearly 50 publishers have deposited more than 46,000 coronavirus-related articles in PMC with licenses that allow re-use and secondary analysis. Articles in the collection have been accessed more than 8 million times since March 18. NLM will use supplemental funds to improve the article-submission system to better accommodate publisher submissions and accelerate release of these critically important articles. On the PubMed side of literature offerings, NLM supplemental funds will support integrating LitCovid metadata. Novel sensors are being developed to leverage LitCovid metadata when directing users to curated COVID-19 content. The new infrastructure will permit PubMed to rapidly add additional disease-specific sensors in the future.

As of May 7, NLM’s GenBank resource has 3,893 SARS-CoV-2 sequences from 42 different countries that are publicly available. NLM created a special site, the “Severe acute respiratory syndrome coronavirus 2 data hub,” where people can search, retrieve, and analyze sequences of the virus that have been submitted to the GenBank database. In late March, NLM joined the CDC-led SPHERES consortium, a national genomics consortium which aims to coordinate U.S. SARS-CoV-2 sequencing efforts and make data publicly available in NLM’s GenBank and Sequence Read Archive (SRA), and other appropriate repositories. Supplemental funds will allow GenBank to further enhance the submission workflow, establish and promote use of metadata sample standards, and develop a fully automated SARS-CoV-2 submission workflow that incorporates quality checks, as well as ‘automated curation’, to provide standardized annotation of the SARS-CoV-2 genomes submitted to GenBank.

SRA is positioned as a ready-made computational environment for public health surveillance pipelines and tool development. SRA metagenomic datasets from both environmental samples and patients diagnosed with COVID-19 can reveal patterns of co-occurring pathogens, newly emerging outbreaks, and viral evolution. NLM supplemental funds are being used to prototype SRA cloud-based analysis tools to search the entirety of the SRA database. These tools can provide efficient search for SARS-CoV-2, identify genetic patterns, and monitor newly submitted data for specific viral patterns.

NLM supplemental funding also supports the identification and selection of web and social media content documenting COVID-19 as part of NLM’s Global Health Events web archive collection. This content documents life in quarantine, prevention measures, the experiences of health care workers, patients, and more. NLM is also participating as an institutional contributor to a broader International Internet Preservation Consortium (IIPC) Novel Coronavirus outbreak web archive collection.

Categories: Data Science

Coming up soon! MCR monthly webinar on Data Literacy Education

MCR Data Science - Tue, 2020-05-05 18:10
What: Data Literacy: What It Is, Why It’s Important, and How to Bring It into The Classroom

Shannon Sheridan, Data Management Librarian at the University of Wyoming

May 20, 2020 at 1PM – 2PM (Pacific) | 2PM – 3PM (Mountain) | 3PM – 4PM (Central) | 4PM – 5PM (Eastern)

Data touches all of our lives, yet data literacy is a vital skill that is often left out of formal education. So how can we make sure students are prepared to deal with the data they inevitably will encounter in their work and lives? This presentation aims to answer that question. The presentation will provide an overview of what data literacy is and explain why it is an important component of students’ education. We also will discuss how instructors can bring data literacy, in both small and large ways, into their classrooms, and provide some concrete examples.

For more information and to register for this webinar go to the Webinar Session Webpage.

Contact for additional information.

Hope to see you online!

Categories: Data Science

Member Highlight: Stony Brook University, Stony Brook, NY

MAR Data Science - Thu, 2020-04-30 14:26

Jessica Koos, Health Sciences Librarian at Stony Brook University

The most recent Research Data Access & Preservation (RDAP) Summit was held from March 11-13 at the Santa Fe Convention Center in the beautiful city of Santa Fe, New Mexico.

Fortunately, the conference took place immediately before quarantining was implemented due to the recent COVID-19 outbreak. Several of the speakers and attendees cancelled their in-person attendance due to the outbreak; however, the conference organizers diligently made the in-person presentations available to those attending remotely using Zoom. Several speakers also provided remote presentations.

The keynote speaker on the first day was Dr. Michele Suina, Program Director at the Albuquerque Area Southwest Tribal Epidemiology Center. Dr. Suina is a member of Cochiti Pueblo, and she related the cultural considerations of her people regarding data collection. She also spoke about Indigenous Data Sovereignty and its role in ensuring that indigenous populations have control over their own data.

The keynote speaker on the second day was Dr. Amber Budden, Director for Community Engagement and Outreach at DataONE, which is a large initiative made up of multiple collaborators designed to facilitate the sharing of environmental data. Dr. Budden’s presentation detailed the successes and future goals of DataONE.

The majority of the conference consisted of various panel presentations organized into themes, including Partnerships, Data Visualization, Consortia, and Data Privacy. Of particular interest was a panel presentation from Dr. David Fearon, Senior Data Management Consultant at Johns Hopkins University, entitled “Screening for Human Subject Disclosure Risk During Data Curation and RDM Service Connections.” The presentation described the efforts being undertaken by the university in providing various types of support to researchers in the health sciences in order to facilitate the sharing of health-related data.

Other conference activities included lightning rounds and the RDAP Business Meeting. Additionally, a poster session allowed for discussions among speakers and attendees. Opportunities for networking abounded over delicious conference-provided meals with a distinctly New Mexican flair.

Overall, this summit covered numerous aspects of data management as it relates to libraries and librarians. There was something for everyone, from the novice data management librarian to the data management specialist. I would encourage anyone with an interest in this topic to attend future events.


Jessica Koos at RDAP

If you’re interested in learning more about RDAP and the annual RDAP summit, please visit

Written by Jessica Koos, Health Sciences Librarian at Stony Brook University.

Categories: Data Science

Stephen Sherry, PhD, Named Acting Director of NLM’s National Center for Biotechnology Information

PSR Data Science - Wed, 2020-04-29 17:13

Stephen Sherry, PhD

National Library of Medicine Director Patti Brennan, RN, PhD, has named Stephen Sherry, PhD, Acting Director of the National Center for Biotechnology Information (NCBI) at the National Library of Medicine effective March 31, 2020. As Acting Director of NCBI, Dr. Sherry oversees a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. He is also responsible for developing and operating all NCBI production services, with program areas spanning literature, sequences, chemistry, clinical research, and medical genetics.

Dr. Sherry also leads an NLM program to migrate NCBI’s largest resource, the Sequence Read Archive, into the cloud with the transfer and management of petabyte-scale sequence data on two commercial cloud platforms. He conducts research on the architecture of population genetic information to ensure human genetic information systems are both useful to researchers and respectful to the privacy of study participants.

Dr. Sherry earned his Ph.D. in Anthropology at the Pennsylvania State University in 1996 and completed postdoctoral work at the Louisiana State University Medical Center prior to joining NLM in 1998.

Categories: Data Science

DataFlash: New and Free Virtual Science Conference in June 2020

PNR Data Science - Thu, 2020-04-23 10:13

We are pleased to announce that New England Science Bootcamp for Librarians will host a FREE virtual conference on June 11, 2020, from 6 A.M. – 1 P.M. PT | 9 A.M. – 4 P.M. ET.

Topics will probably include, depending on speaker availability:

  • Vaccine research & manufacture
  • Virology
  • Making Health Devices in non-industrial settings
  • IRB and human subjects research in the shifting landscape

Tentative Schedule

  • 9:00-10:00 AM: Session A
  • 10:15-10:45 AM: Interstitial – Front Lines Stories
  • 11:00 AM-12:00 PM: Session B
  • 12:00-1:00 PM: Lunch
  • 1:00-2:00 PM: Session C
  • 2:15-2:45 PM: Interstitial – Front Lines Stories
  • 3:00-4:00 PM: Session D

The schedule of topics will be finalized and sent to all registrants soon.

This conference will run throughout the day, with access via a single link for the whole day that attendees will receive after registering. You may tune in as you have time or for the topics that interest you the most. More information and a detailed schedule will be available closer to the date of the event. Please register to get the most updated information sent to you.

Registration is available at

NER looks forward to seeing you online for a fun and informative conference! While NER had to cancel their in-person conference this year, they hope you will take advantage of this free professional development opportunity. Although this conference is being organized in New England, they welcome attendance from anyone, anywhere.

Categories: Data Science

DataFlash: New Webinar on Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories

PNR Data Science - Tue, 2020-04-21 12:47

The National Library of Medicine (NLM) at the National Institutes of Health is hosting a free webinar for researchers to learn how to share, discover, and cite COVID-19 data and code in generalist repositories on Friday, April 24 from 11:00 a.m. – 12:45 p.m. PT.

The biomedical research community’s understanding of the novel coronavirus and the associated coronavirus disease (COVID-19) is rapidly evolving. Open science and the timely sharing of research data have played a critical role in advancing our understanding of COVID-19 and accelerating the pace of discovery.

Researchers will have an opportunity to hear from multiple generalist repositories about the ways each repository is supporting discoverability and reusability of COVID-19 data and associated code. The NLM will also provide an overview of available COVID-19 literature.

The webinar will be available via NIH VideoCast.

Instructions on submitting questions will be made available closer to the webinar. Interested participants are encouraged to bookmark this page for the latest updates and follow #NIHdata on Twitter.  The webinar will be recorded and available a week after the live event.

See the agenda on the ODSS website.

Categories: Data Science

Upcoming Webinar on Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories

SEA Data Science - Tue, 2020-04-21 12:11

The National Library of Medicine (NLM) at the National Institutes of Health is hosting a free webinar for researchers to learn how to share, discover, and cite COVID-19 data and code in generalist repositories on Friday, April 24 from 2:00-3:45 p.m. ET.

The biomedical research community’s understanding of the novel coronavirus and the associated coronavirus disease (COVID-19) is rapidly evolving. Open science and the timely sharing of research data have played a critical role in advancing our understanding of COVID-19 and accelerating the pace of discovery.

Researchers will have an opportunity to hear from multiple generalist repositories about the ways each repository is supporting discoverability and reusability of COVID-19 data and associated code. The NLM will also provide an overview of available COVID-19 literature.

The webinar will be available via NIH VideoCast.

Instructions on submitting questions will be made available closer to the webinar. Interested participants are encouraged to bookmark this page for the latest updates and follow #NIHdata on Twitter.  The webinar will be recorded and available a week after the live event.

See the agenda on the ODSS website.


Categories: Data Science

NIST and OSTP Launch Effort to Improve Search Engines for COVID-19 Research

PSR Data Science - Mon, 2020-04-20 15:56

The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) and the White House Office of Science and Technology Policy (OSTP) have just launched a joint effort to support the development of search engines for research that will help in the fight against COVID-19. The project was developed in response to the March 16 White House Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset.

In this effort, NIST will work initially with the Allen Institute for Artificial Intelligence, the National Library of Medicine, Oregon Health & Science University (OHSU), and the University of Texas Health Science Center at Houston (UT Health). The team will apply the successful, long-running program of expert engagement and technology assessment called the Text Retrieval Conference, or TREC, to the COVID-19 Open Research Dataset (CORD-19), a resource of more than 44,000 research articles and related data about COVID-19 and the coronavirus family of viruses. The TREC-COVID program goals include creating datasets and using an independent assessment process that will help search engine developers to evaluate and optimize their systems in meeting the needs of the research and health-care communities.

The team will first release a series of sample queries for the biomedical research community, developed by team members at the National Library of Medicine, OHSU and UT Health. Registered participants in TREC-COVID will use their information retrieval and search systems to run the queries against the CORD-19 document set and return their results to NIST. Biomedical experts will then review test results, including document relevance rankings, to assess the overall performance of the retrieval systems.

Using proven TREC protocols, NIST will score the submissions and post the scores, the retrieval results themselves, and the lists of key reference documents to the TREC-COVID website. These “test collections” can then be used by information retrieval researchers to evaluate and enhance the performance of their own search engines. This effort is intended to help researchers understand how search systems could best support medical researchers when available information is developing quickly, as in the current pandemic.
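As a rough illustration of how a “test collection” lets developers score a search system, the sketch below computes precision at k, one common retrieval measure, against a set of relevance judgments. This is a simplified assumption about the workflow and not the official TREC-COVID scoring: the topic IDs, document IDs, and function names are all hypothetical, and TREC evaluations typically report several measures.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the top-k ranked documents that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    return hits / k


def mean_precision_at_k(run, judgments, k):
    """Average precision@k over every topic (query) in a system's run."""
    scores = [
        precision_at_k(ranking, judgments.get(topic, set()), k)
        for topic, ranking in run.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical relevance judgments: topic -> set of relevant document IDs.
judgments = {
    "topic-1": {"d1", "d3"},
    "topic-2": {"d7"},
}

# Hypothetical system output: topic -> ranked list of document IDs.
run = {
    "topic-1": ["d1", "d2", "d3", "d4"],
    "topic-2": ["d5", "d6", "d7", "d8"],
}

print(mean_precision_at_k(run, judgments, k=2))
print(mean_precision_at_k(run, judgments, k=4))
```

Because the judgments are fixed and public, any developer can rerun the same comparison after changing a ranking algorithm and see whether the score improves.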

The Allen Institute for Artificial Intelligence has been releasing an expanded CORD-19 document set each Friday to capture the most recent articles on COVID-19 and related coronaviruses. Later rounds of TREC-COVID will use the larger releases of CORD-19 and expanded query sets. Participants will have one week to submit their search results, and within about a week NIST will post results, with an expected spacing of about two weeks between each new dataset round being released. The team initially anticipates conducting five consecutive rounds of search system assessments. Interested organizations are invited to register to participate in the TREC-COVID program on the NIST website.

Categories: Data Science

Flu Near You: “Spread the word.  Not the flu”

PNR Data Science - Mon, 2020-04-13 11:03

The flu is a worrisome illness that kills thousands in the United States annually.  Have you ever wondered how badly the flu or another influenza-like illness is spreading in your community?  Well, the citizen science project “Flu Near You” helps to answer that question, and participating is a step you can take to help prevent the next pandemic.

The idea behind Flu Near You was conceived by epidemiologists from both Harvard University and Boston Children’s Hospital.  Flu Near You leverages the power of the crowd to provide real-time information about flu-like illness where you live or work.  Basically, anybody can participate in tracking flu-like illnesses in their area.

This is how Flu Near You works.  All you need is internet access and a smartphone or computer.  On the SciStarter project page or on Flu Near You’s own project page, you are asked to complete a quick weekly survey (30 seconds at most) to share whether you feel healthy or sick.  Participation is voluntary and your individual report is treated confidentially.

At the end of your data collection, you are given a health dashboard of your community that shows how well your community and neighboring ones are doing flu-wise.  The following health dashboard shows the health of my community in Seattle as of Friday, April 10th, 2020, noon PT.

So, what happens with your data?  Scientists will use your data, along with thousands of other reports, to generate local and national maps of influenza-like illness. This important crowd-sourced information provides public health officials and researchers with real-time, anonymous information that could help prevent the next pandemic.
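To give a feel for how many individual reports can become a community-level picture, here is a minimal sketch that rolls weekly survey responses up into the share of participants reporting symptoms in each region. This is an assumption about the general approach and not Flu Near You’s actual method; the region labels, the report format, and the function name are hypothetical.

```python
from collections import defaultdict


def symptomatic_share_by_region(reports):
    """Given (region, feels_sick) pairs, return for each region the
    fraction of reports in which the participant said they felt sick."""
    totals = defaultdict(int)
    sick = defaultdict(int)
    for region, feels_sick in reports:
        totals[region] += 1
        if feels_sick:
            sick[region] += 1
    return {region: sick[region] / totals[region] for region in totals}


# Hypothetical anonymous weekly reports: (region label, felt sick this week?)
reports = [
    ("region-A", True),
    ("region-A", False),
    ("region-A", False),
    ("region-B", False),
]

shares = symptomatic_share_by_region(reports)
print(shares)
```

Only the region and a yes/no answer are needed for this kind of summary, which is one reason individual reports can stay confidential while still feeding a map.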

Participating in Flu Near You can help keep our communities healthy and strong!

Categories: Data Science

New On-Demand Course Available – Cool Creative Communications: Dazzling Data Visualization

SEA Data Science - Wed, 2020-04-01 14:27

Earn MLA CEs while you practice your data visualization skills!

Data Visualization enables us to quickly glean insights and patterns from data and communicate its key aspects intuitively, persuasively, and memorably. In this session, participants will discuss the fundamental principles of effective visual data communication as they critique and evaluate existing visualizations. They will also locate sources for downloadable data, and develop simple interactive visualizations using Tableau Public, a free and popular data visualization tool.

This class is intended as a quick-start guide to creating effective data visualizations and is geared toward a general audience with no prior experience creating visualizations.

This is an online, asynchronous class, offered via Moodle, the NNLM’s learning management system.

To learn more or to register:

Questions? Contact Kiri Burcat

Categories: Data Science

NLM Responds to the Coronavirus Pandemic

PSR Data Science - Tue, 2020-03-24 19:13

The National Library of Medicine is working on multiple fronts to improve researchers’ understanding of SARS-CoV-2 (the novel coronavirus) and aid in the response to COVID-19 (the disease caused by the novel coronavirus). By enhancing access to relevant data and information, NLM is demonstrating how libraries can contribute in real time to research and response efforts during this crisis.

NLM is using PubMed Central®, its digital archive of peer-reviewed biomedical and life sciences journal literature, to expand access to full-text articles related to coronavirus. These activities build on recent requests from the White House Office of Science and Technology Policy (OSTP) and science policy leaders of other nations calling on the global publishing community to make all COVID-19-related research publications and data immediately available to the public in forms that support automated text-mining.

NLM has stepped up its collaboration with publishers and scholarly societies to increase the number of coronavirus-related journal articles in PMC, along with the available data supporting them. NLM is adapting its standard procedures for depositing articles into PMC to make it easier and faster to submit articles in machine-readable formats. NLM is also engaging with journals and publishers that do not participate in PMC but whose publications are within the scope of the Library’s collection. A growing number of publishers and societies are taking advantage of these flexibilities. Submitted publications are being made available as quickly as possible after publication for discovery in PMC and through the PMC Text Mining Collections for machine analysis, secondary analysis, and other types of reuse.

This enhanced collection of text-minable content enables AI and machine-learning researchers to develop and apply novel text-mining approaches that can help answer some of the many questions about coronavirus. Along these lines, NLM and leaders across the technology sector and academia joined OSTP on Monday, March 16, to announce the COVID-19 Open Research Dataset (CORD-19). Hosted by the Allen Institute for AI, CORD-19 is a free and growing resource that was launched with more than 29,000 scholarly articles about COVID-19 and the coronavirus family of viruses. CORD-19 represents the most extensive machine-readable coronavirus literature collection available for text mining to date. This dataset enables researchers to apply novel AI and machine learning strategies to identify new knowledge to help end the pandemic.

NLM’s other important resources in these efforts include:

  • NLM’s GenBank Sequence Database — NLM created the Severe acute respiratory syndrome coronavirus 2 data hub, where people can search for, retrieve, and analyze sequences of the virus that have been submitted to GenBank.
  • NLM’s Sequence Read Archive (SRA) — NLM’s SRA is the world’s largest publicly available repository of unprocessed sequence data which can be mined for previously unrecognized pathogen sequence. For example, a team from Stanford University recently reported that in a search of certain metagenomic datasets in the SRA, they identified a 2019-nCoV-like coronavirus in pangolins (a long-snouted mammal). This type of genetic sequence research can play an important role in understanding how the virus originated and is spreading.
  • NLM Intramural Research Contributions — NLM has a multidisciplinary group of researchers comprised of molecular biologists, biochemists, computer scientists, mathematicians and others working on a variety of problems, including some that relate to SARS-CoV-2/COVID-19. One such project is LitCovid, a resource that tracks COVID-19 specific literature published since the outbreak.
  • NLM is also providing targeted searches within several of its other information resources to help users find data and information relevant to COVID-19. These searches, available through the NLM home page, include information on clinical studies related to COVID-19 listed in, and articles related to the SARS-CoV-2/COVID-19 in PubMed, NLM’s database of citations and abstracts to more than 30 million journal articles and online books.
Categories: Data Science

Starting April 1st: Big Data in Healthcare

SEA Data Science - Tue, 2020-03-24 12:47

Did you miss the February instance of Big Data in Healthcare: Exploring Emerging Roles? Now is your chance to join!

We’re running an April/May session of this 6-week asynchronous course. For more information, or to register, visit

Questions? Contact Kiri Burcat.

Categories: Data Science

DataFlash: Pandemic Response Hackathon, March 27-29

PNR Data Science - Mon, 2020-03-23 19:08

COVID-19 Pandemic Response Virtual Hackathon, March 27-29

Event Description: The Pandemic Response Hackathon is a virtual hackathon aimed at better understanding and mitigating the spread of COVID-19 and future pandemics. The goal is to bring public health professionals alongside the technology community’s talent to contribute to the world’s response to the pandemic. Hackathon projects will be formulated and judged by an interdisciplinary panel of public health, health IT, and policy experts.

Datavant is hosting a Pandemic Response Virtual Hackathon from March 27-29 (Fri-Sun) to spur health innovation solutions in response to COVID-19 challenges facing frontline health workers, public health, and our communities. Clinicians, public health professionals, designers, software developers, health IT experts, patient advocacy groups, and community members are welcome. Join us as we come together as a community to contribute to the world’s response to COVID-19 and future pandemics.

Register for the Pandemic Response Hackathon.

Categories: Data Science

DataFlash: 2020 RDAP

PNR Data Science - Wed, 2020-03-18 18:21

The 2020 RDAP Summit was hosted this past March in Santa Fe, New Mexico.  Unfortunately, I was not able to attend in person, but I was able to catch the conference virtually.  For me, the highlight of the Summit was the keynote speaker, Michele Suina, PhD (Cochiti Pueblo), Program Director, Albuquerque Area Southwest Tribal Epidemiology Center (AASTEC), who was informative, knowledgeable, articulate, and engaging.

Michele talked about her work with the Global Indigenous Data Alliance (GIDA), which is a great organization that prides itself on “promoting indigenous control of indigenous data” around the world.  Their data motto is “Be FAIR and CARE,” which is a play on the popular data acronym FAIR (i.e. findable, accessible, interoperable, and reusable) and GIDA’s acronym for data CARE.

CARE is an acronym that reminds us that just because data is shared and open doesn’t necessarily mean that it’s tension-free for all people, especially vulnerable populations like indigenous ones.  Let’s take a closer look at what CARE means.  The “C” in CARE stands for “Collective benefit,” which means that data should be used in ways that empower Indigenous Peoples so that they can derive maximum benefit from the data’s use.  The “A” in CARE stands for “Authority to control,” which means we must recognize the rights and interests of Indigenous Peoples in their data; in other words, we must respect their authority to control their own data. The “R” in CARE stands for “Responsibility,” which means that we are accountable for using the data in ways that foster positive relationships with Indigenous Peoples and ensure that they derive the maximum benefit from their data. The “E” in CARE stands for “Ethics,” which means that the wellbeing of Indigenous Peoples should always be at the heart of the data life cycle and across data ecosystems.

As data enthusiasts we must remember that to be “FAIR”, we must also “CARE” especially with vulnerable populations like indigenous ones.



Categories: Data Science

DataFlash: Citizen Science Month (April 2020) and the PNR!

PNR Data Science - Thu, 2020-03-12 16:36

I have some very exciting news to share with you all.  The PNR Citizen Science Team has planned a great  PNR Rendezvous webinar entitled “What’s All this Talk About Citizen Science?” on Wednesday, April 29th, 2020 at 1 P.M. PT | 2 P.M. MT | 4 P.M. ET | 10 A.M. Hawaii | 12 P.M. Alaska with three AMAZING guest speakers.

The first guest speaker will be SciStarter founder Darlene Cavalier, who will give us a basic introduction to the world of Citizen Science and suggest new ways that we can get involved with virtual Citizen Science projects, especially in the time of COVID-19.  Joining Darlene are two high school science teachers, Cheryl Rice and Pete Recksiek, of The Dalles, Oregon, who will share their classroom Citizen Science experience using two different projects they selected from the SciStarter/NLM microsite. Cheryl will discuss facilitating Debris Tracker and Pete will discuss facilitating Stallcatchers with their high school science students. Each will share why they chose their project, what went well, and what they’d do differently next time. They will also describe the benefits of Citizen Science project participation and offer advice for others, especially library staff, who want to offer citizen science opportunities through programs such as STEM, library nature groups, and science book clubs. The good news is: anyone can be a citizen scientist – all that’s needed is a bit of curiosity!

Please stay tuned to the Dragonfly Blog for more Citizen Science things during the month of April 2020!

Categories: Data Science

DataFlash: Telling the Real Coronavirus Story with Data

PNR Data Science - Tue, 2020-02-18 12:10

The new coronavirus (the cause of COVID-19) has some people in the United States worried.  As of February 18th, 2020, there are more than 70,000 confirmed cases in China.  The outbreak is serious, but if you’re living in the United States, the odds are that the regular flu is a much more serious risk to your health than the coronavirus.  The CDC reported that in the 2017-2018 flu season, there were over 60,000 influenza-associated deaths in the United States alone.  As of February 18th, 2020, coronavirus fatalities had reached 1,875 in Asia, with one death outside of Asia so far.

Again, according to the CDC, the risk of coronavirus infection to the general public of the United States is considered “low at this time,” as the general American public is unlikely to be exposed to this virus.  This risk of infection changes, of course, if you are an American healthcare worker caring for patients with COVID-19 or if you have recently traveled to China.  According to Dr. Anthony Fauci, director of the National Institute of Allergy and Infectious Diseases, precautions like washing your hands as frequently as you can and staying away from crowded places where people are coughing and sneezing are more effective than wearing face masks.  According to Dr. Fauci, the only people who need masks are those who are already infected, to keep them from exposing others.

A great data visualization/data dashboard on the coronavirus is the one put out by Baltimore’s very own Johns Hopkins University.  Unlike some media outlets and social media, this data visualization tells a technically accurate data story of what’s going on with the coronavirus outbreak worldwide.  It takes a balanced and factual approach, looking not only at the number of deaths (i.e. 1,875 as of 02/18/20) but also at the remarkable number of people who have recovered (i.e. 13,147 as of 02/18/20) from this viral infection.  The map on the dashboard locates and quantifies the confirmed coronavirus cases, with China having the most at 72,439 as of February 18th, 2020.  Finally, like all good data stories, the Johns Hopkins data visualization/story cites credible data sources like WHO, CDC, ECDC, NHC, and DXY in an attempt to be transparent and trustworthy.  All in all, this data visualization/story of the coronavirus makes a good attempt at a truthful depiction of the outbreak that is devoid of exaggeration and of most negative personal biases.

Data Citation:

Johns Hopkins University (2020). Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE.

Categories: Data Science

DataFlash: A Fun but Authoritative Book on Citizen Science – A Book Review

PNR Data Science - Fri, 2020-02-14 16:40

“The Field Guide to Citizen Science: How You Can Contribute to Scientific Research and Make a Difference,” written by SciStarter experts Darlene Cavalier, Catherine Hoffman, and Caren Cooper, is a fantastic read.  They do an excellent job of explaining what defines citizen science, its history, and how you can easily become a citizen scientist through the array of citizen science projects that they highlight and recommend in their book.

I learned many interesting things about citizen science from this read.  For example, the term “citizen,” at least in the United States, is associated with a contentious immigration debate about who is eligible to participate in civic life, including science and education.  As a result, other terms have been used to describe citizen science, like community science, public participation in scientific research, participatory action research, and community-based participatory research.  Despite the tensions associated with the term “citizen,” none of the other terms is as complete or widely used as “citizen science”.

One thing that I was always skeptical about with citizen science was how scientists and researchers could trust citizen science data.  I learned, though, that there is a rigorous process for collecting and cleaning data from citizen scientists so that it is accurate.  For example, if a data point stands out from the norm, it will generally undergo expert review. Also, to substantiate and validate data, citizen scientists submit photos of their specimens as part of their data collection.  Among other things, extensive training and testing related to quality assurance and quality control is done for citizen science projects.  Lastly, I learned that almost one-quarter of citizen science projects compare data from many volunteers and validate data by independent consensus, and that projects sometimes request the same data in several different ways in order to double-check for errors.  It is these quality protocols, ingrained into the citizen science project regimen, that ensure citizen science data is trustworthy and valid.

For most of the book, the authors recommend various citizen science projects that are free or very affordable to do on your own or with your community.  Most of these projects can be found in SciStarter’s extensive database of citizen science projects.  With Citizen Science Month coming up this April 2020, the NNLM PNR group is planning a PNR Rendezvous webinar on April 29th, 2020 at 1 PM PT, with guest speaker and SciStarter founder Darlene Cavalier.  Please stay tuned for more details!

Categories: Data Science

Funding Available in NY, NJ, PA and DE for Health Information Outreach

MAR Data Science - Fri, 2020-02-07 07:00

The National Network of Libraries of Medicine, Middle Atlantic Region (NNLM MAR) invites applications for health information outreach and programming projects.

The mission of the National Network of Libraries of Medicine is to advance the progress of medicine, improve public health by providing U.S. health professionals with equal access to biomedical information, and improve individuals’ access to information to enable them to make informed decisions about their health. Under a Cooperative Agreement with the National Library of Medicine (NLM), the University of Pittsburgh Health Sciences Library System serves as the Regional Medical Library for NNLM MAR.

  • Period of Performance: May 15, 2020 – April 30, 2021
  • Due Date: April 10, 2020 at noon ET
  • Notification of Awards: May 1, 2020

Eligibility: Network member organizations in the Middle Atlantic Region (Delaware, New Jersey, New York and Pennsylvania) are eligible to apply. Membership is free and open to libraries of all kinds, community-based organizations, clinics, public health departments and other organizations that provide or distribute health information. If your institution is not an NNLM Member, submit an application for Membership at least 3 weeks prior to the funding deadline. Membership is not automatic. A Member record is required to submit an application.

Available awards

Health Information Outreach Award: The purpose of the Health Information Outreach Award is to fund education and outreach projects that improve access to biomedical and health information and increase the ability of the public and health professionals to use these resources.

  • Amount: Up to $20,000
  • Awards Available: Minimum of 5 awards

Data Award: The purpose of the Data Award is to support data sharing and open science, and foster the development of information professionals in data science.

  • Amount: Up to $20,000
  • Awards Available: Minimum of 3 awards

Professional Development Award: This award enables individuals at NNLM MAR network member institutions to expand professional knowledge and experience in data science or health information access/delivery.

  • Amount: Up to $5,000
  • Awards Available: Minimum of 5 awards

New this year! Applications will only be accepted via the NNLM Online Applications System. Please allow extra time to familiarize yourself with the system requirements and watch a brief video tutorial about submitting an application.

Resources are available to help potential applicants, and interested applicants are invited to attend related webinars.

NNLM MAR staff are available for consultation and training on applicable National Library of Medicine resources and potential projects. Complete the NNLM MAR Award Interest Form, and someone will respond within three business days.

Categories: Data Science