Network of the National Library of Medicine

Data Science

Midday at the Oasis Webinar Change

PSR Data Science - Wed, 2020-07-08 13:21

As you may have noted, we have not hosted a Midday at the Oasis webinar in recent months. We’ve been participating in planning coordinated webinar series with our colleagues across the network. The coordinated webinar series will include information that is relevant to the broad community of librarians in the network membership.

We will still host Midday at the Oasis, on an as-needed basis when we identify great content to share that may be specific to our region or not fit into the series created with our colleagues. We welcome suggestions for content that you would like to see or present.

We invite you to plan to attend the webinars in the series listed below:

More information will follow. We will continue to send announcements via PSR-News when webinars are scheduled so that you can attend and earn MLA CE credits, as available.

Categories: Data Science

Apply to Host a Library Carpentry Workshop for Your Organization!

SEA Data Science - Mon, 2020-06-29 10:15

The National Network of Libraries of Medicine (NNLM), Southeastern Atlantic region (SEA) is pleased to offer Library Carpentry workshops for up to ten SEA member institutions to support the development of data science and computational skills.

Library Carpentry focuses on building software and data skills within library and information-related communities. Their goal is to empower people in these roles to use software and data in their own work and to become advocates for and train others in efficient, effective and reproducible data and software practices.

Note: Library Carpentry workshops are traditionally offered face-to-face, but they’ve been adapted to an online format. Due to COVID-19, the NNLM SEA strongly recommends organizations host remote sessions.

Logistics

Workshops are approximately 16 hours long. For remote workshops, the Carpentries organization recommends four 4-hour sessions. Workshops can accommodate up to 20 learners. We encourage workshop hosts to invite information professionals from neighboring institutions to fill the 20 spots if your organization is unable to fill all spots. The Carpentries organization requests two months of planning time for each workshop.

If you are selected, the Carpentries organization will provide for remote workshops:

  • Four instructors to lead lessons
  • Planning, scheduling, and registration support
  • An informational webpage for your workshop participants
  • Pre- and post-workshop evaluation

You will be responsible for:

  • Providing your own video conferencing platform (Zoom, WebEx, etc.) if possible (accommodations can be made if you do not have access to a video conferencing platform through your organization)
  • Finding two volunteers who are familiar with the subject matter in the lesson plans, to attend the workshop as helpers
  • Advertising your workshop to potential participants
  • Completing an Activity Report for NNLM SEA after the event

If you are interested in hosting an in-person workshop before April 30, 2021, please discuss additional requirements and considerations with the Carpentries organization if awarded.

More Information

The target audience is learners who have little to no prior computational experience. The instructors put a priority on creating a friendly environment to empower researchers and enable data-driven discovery. Even those with some experience will benefit, as the goal is to teach not only how to do analyses, but how to manage the process to make it as automated and reproducible as possible. Biomedical and health sciences librarians and LIS students are encouraged to participate.

In this interactive, hands-on workshop you will learn core software and data skills, with lessons including:

Eligibility

Your organization must be an NNLM Network Member. If your organization is not a Network Member, it can join for free!

All participants must be prepared to observe The Carpentries Code of Conduct in workshops.

Apply Now

Applications are open now! The deadline to apply is Friday, July 3, 2020.

For questions, please contact Kiri Burcat and Tony Nguyen.

Categories: Data Science

Join the RD3 Content Advisory Board

SEA Data Science - Tue, 2020-06-23 10:31
The Network of the National Library of Medicine (NNLM) Resources for Data Driven Discovery (RD3) web portal fosters learning and collaboration in data science and data management.

The Research Data Management Workgroup of the NNLM is recruiting Advisory Board members to be part of a committee that reviews and suggests resources for the RD3 web portal. If you are interested in being part of the RD3 Content Advisory Board, send your name to Mary Piorun at mary.piorun@umassmed.edu by July 1st with a brief narrative (fewer than 300 words) explaining your interest.

Meetings will be monthly until all current resources have been reviewed, and quarterly thereafter.

Categories: Data Science

DataFlash: LitCovid vs. COVID-19 Portfolio Tool

PNR Data Science - Mon, 2020-06-08 14:37

For those of you who are curious about the differences between two new COVID-19 tools, LitCovid and COVID-19 Portfolio Tool, here are some answers!

As you already know, LitCovid is a curated literature hub for tracking up-to-date scientific information about the 2019 novel coronavirus. It is limited to articles in PubMed, includes research on other coronaviruses such as MERS, divides the articles into categories (e.g., Mechanism, Transmission, Diagnosis, and Treatment), and shows the countries of origin on a world map.

The “iSearch” COVID-19 Portfolio Tool, like LitCovid, is a comprehensive, expert-curated source for publications related to COVID-19. However, the COVID-19 Portfolio Tool has the following features that distinguish it from LitCovid:

  1. includes both publications and preprints (from medRxiv, SSRN, arXiv, bioRxiv, Research Square, and ChemRxiv);
  2. is curated by subject matter experts to focus coverage on SARS-CoV-2/COVID-19;
  3. allows searching of full text and/or supplemental data in addition to titles and abstracts;
  4. leverages the cutting-edge analytics available in the iSearch tool, including powerful search functionality and faceting;
  5. includes interactive visualizations that allow users to select topics within their search results for download or further queries; and
  6. makes it easy to download results at any point as a CSV or Excel file.

Categories: Data Science

DataFlash: LitCovid

PNR Data Science - Thu, 2020-05-28 10:44

A new literature search hub for the 2019 novel coronavirus is now available. It was developed as a collaboration between the journal Nature and the US National Institutes of Health’s intramural research program.

LitCovid is a curated literature hub for tracking up-to-date scientific information about COVID-19. Right now, it is the most thorough resource on the subject, providing access to a growing number of relevant articles in PubMed (about 17,000). Compared with conventional keyword searches for “COVID-19” or “nCOV”, LitCovid’s sophisticated search function identifies 35% more pertinent articles. The articles are also organized by topic and by geographic location.

Categories: Data Science

NLM Receives $10 Million Emergency CARES Act Funding Support

PSR Data Science - Tue, 2020-05-19 19:30

Dr. Patti Brennan has announced that NLM has received $10 million as part of the Coronavirus Aid, Relief, and Economic Security (CARES) Act, which provides emergency funding for federal agencies to combat the coronavirus outbreak. The funding is being used to support activities to improve the quality of clinical data for research and care; accelerate research including phenotyping, image analysis, and real-time surveillance; and to enhance access to COVID-19 literature and molecular data resources. The following activities highlight many of the investments that NLM is making with this emergency funding.

The novel coronavirus is driving a need for standardized COVID-19 terminology and data exchange that will allow clinicians and scientists to communicate more effectively and consistently. NLM will use the supplemental funds to support the addition of codes for COVID-19-related laboratory tests within LOINC (Logical Observation Identifiers Names and Codes) and to provide implementation guidelines and training in use of the standards. NLM is also enabling sharing of COVID-19 terminology updates through the Value Set Authority Center (VSAC), which makes available value sets and clinical terminologies. Value sets are codes from standard terminologies around specific concepts or conditions and are used as part of electronic clinical quality measures or to define patient cohorts, classes of interventions, or patient outcomes. This important work will facilitate the analysis of electronic health record data and support effective and interoperable health information exchange.
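As a rough illustration of how a value set can define a patient cohort, the sketch below filters laboratory results by membership in a set of codes. Everything in it is an assumption made for illustration: the codes are placeholder strings rather than real LOINC codes, and the record layout and function name are hypothetical, not part of VSAC or any NLM tool.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical value set: placeholder strings, NOT real LOINC codes.
COVID_LAB_TEST_VALUE_SET: Set[str] = {"EXAMPLE-0001", "EXAMPLE-0002"}


def patients_in_cohort(
    lab_results: Iterable[Dict[str, str]], value_set: Set[str]
) -> List[str]:
    """Return sorted, unique patient IDs that have at least one lab result
    whose code belongs to the value set."""
    return sorted(
        {result["patient_id"] for result in lab_results if result["code"] in value_set}
    )


# Made-up lab results, for illustration only.
lab_results = [
    {"patient_id": "p1", "code": "EXAMPLE-0001"},
    {"patient_id": "p2", "code": "EXAMPLE-9999"},  # code not in the value set
    {"patient_id": "p3", "code": "EXAMPLE-0002"},
    {"patient_id": "p1", "code": "EXAMPLE-0002"},  # same patient, second test
]

cohort = patients_in_cohort(lab_results, COVID_LAB_TEST_VALUE_SET)
print(cohort)
```

Because the cohort is defined by set membership, updating the value set (for example, when new test codes are added) changes the cohort without changing the selection logic.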

NLM is updating terminology for coronavirus-related drugs and chemicals through resources such as the Medical Subject Headings (MeSH) used for indexing and cataloging biomedical literature, and ChemIDplus, a dictionary of over 400,000 chemicals (names, synonyms, and structures). This work aligns terminology to facilitate the identification of chemicals and drugs used to treat, detect, and prevent COVID-19 and other coronavirus-related infections, including severe acute respiratory syndrome (SARS), and Middle East Respiratory Syndrome (MERS).

NLM’s intramural research program is using virus genomics, health data, and social media data to identify community spread of COVID-19. Researchers are applying machine learning and artificial intelligence techniques to chest X-rays to differentiate viral pneumonia from bacterial pneumonia – expanding knowledge of the process of the SARS-CoV-2 viral infection and assisting in the identification of best practices for diagnosis and care of COVID-19 patients. NLM research in natural language processing contributed to development of LitCovid, a curated literature hub for tracking scientific publications about the novel coronavirus. It provides centralized access to more than 13,500 relevant articles in PubMed, categorizes them by research topic and geographic location, and is updated daily.

NLM’s extramural research program is focusing on novel informatics and data science methods to rapidly improve the understanding of the infection of SARS-CoV-2 and of COVID-19. In April, NLM issued two Notices of Special Interest (NOT-LM-010 and NOT-LM-011) seeking applications (due in June) in these areas: the mining of clinical data for ‘deep phenotyping’ (gathering details about how a disease presents itself in an individual, fine-grained way) to identify or predict the presence of COVID-19; and public health surveillance methods that mine genomic, viromic, health data, environmental data or data from other pertinent sources such as social media, to identify spread and impact of SARS-CoV-2.

NLM is also improving access to published coronavirus literature via PubMed Central (PMC). In response to a call by science and technology advisors from a dozen countries to have publishers and scholarly societies make their COVID-19 and coronavirus-related publications immediately accessible in PMC, along with the available data supporting them, nearly 50 publishers have deposited more than 46,000 coronavirus-related articles in PMC with licenses that allow re-use and secondary analysis. Articles in the collection have been accessed more than 8 million times since March 18. NLM will use supplemental funds to improve the article-submission system to better accommodate publisher submissions and accelerate release of these critically important articles. On the PubMed side of literature offerings, NLM supplemental funds will support integrating LitCovid metadata. Novel sensors are being developed to leverage LitCovid metadata when directing users to curated COVID-19 content. The new infrastructure will permit PubMed to rapidly add additional disease-specific sensors in the future.

As of May 7, NLM’s GenBank resource has 3,893 SARS-CoV-2 sequences from 42 different countries that are publicly available. NLM created a special site, the “Severe acute respiratory syndrome coronavirus 2 data hub,” where people can search, retrieve, and analyze sequences of the virus that have been submitted to the GenBank database. In late March, NLM joined the CDC-led SPHERES consortium, a national genomics consortium which aims to coordinate U.S. SARS-CoV-2 sequencing efforts and make data publicly available in NLM’s GenBank and Sequence Read Archive (SRA), and other appropriate repositories. Supplemental funds will allow GenBank to further enhance the submission workflow, establish and promote use of metadata sample standards, and develop a fully automated SARS-CoV-2 submission workflow that incorporates quality checks, as well as ‘automated curation’, to provide standardized annotation of the SARS-CoV-2 genomes submitted to GenBank.

SRA is positioned as a ready-made computational environment for public health surveillance pipelines and tool development. SRA metagenomic datasets from both environmental samples and patients diagnosed with COVID-19 can reveal patterns of co-occurring pathogens, newly emerging outbreaks, and viral evolution. NLM supplemental funds are being used to prototype SRA cloud-based analysis tools to search the entirety of the SRA database. These tools can provide efficient search for SARS-CoV-2, identify genetic patterns, and monitor newly submitted data for specific viral patterns.

NLM supplemental funding also supports the identification and selection of web and social media content documenting COVID-19 as part of NLM’s Global Health Events web archive collection. This content documents life in quarantine, prevention measures, the experiences of health care workers, patients, and more. NLM is also participating as an institutional contributor to a broader International Internet Preservation Consortium (IIPC) Novel Coronavirus outbreak web archive collection.

Categories: Data Science

Coming up soon! MCR monthly webinar on Data Literacy Education

MCR Data Science - Tue, 2020-05-05 18:10
What: Data Literacy: What It Is, Why It’s Important, and How to Bring It into The Classroom

Presenter:
Shannon Sheridan, Data Management Librarian at the University of Wyoming

When:
May 20, 2020, at 1 PM – 2 PM (Pacific) | 2 PM – 3 PM (Mountain) | 3 PM – 4 PM (Central) | 4 PM – 5 PM (Eastern)

Data touches all of our lives, yet data literacy is a vital skill that is often left out of formal education. So how can we make sure students are prepared to deal with the data they inevitably will encounter in their work and lives? This presentation aims to answer that question. The presentation will provide an overview of what data literacy is and explain why it is an important component of students’ education. We will also discuss how instructors can bring data literacy, in both small and large ways, into their classrooms, and provide some concrete examples.

For more information and to register for this webinar, go to the Webinar Session Webpage.

Contact donna.ziegenfuss@utah.edu for additional information.

Hope to see you online!

Categories: Data Science

Member Highlight: Stony Brook University, Stony Brook, NY

MAR Data Science - Thu, 2020-04-30 14:26

Jessica Koos, Health Sciences Librarian at Stony Brook University

The most recent Research Data Access & Preservation (RDAP) Summit was held from March 11-13 at the Santa Fe Convention Center in the beautiful city of Santa Fe, New Mexico.

Fortunately, the conference took place immediately before quarantining was implemented due to the recent COVID-19 outbreak. Several of the speakers and attendees cancelled their in-person attendance due to the outbreak; however, the conference organizers diligently made the in-person presentations available to those attending remotely using Zoom. Several speakers also provided remote presentations.

The keynote speaker on the first day was Dr. Michele Suina, Program Director at the Albuquerque Area Southwest Tribal Epidemiology Center. Dr. Suina is a member of Cochiti Pueblo, and she related the cultural considerations of her people regarding data collection. She also spoke about Indigenous Data Sovereignty and its role in ensuring that indigenous populations have control over their own data.

The keynote speaker on the second day was Dr. Amber Budden, Director for Community Engagement and Outreach at DataONE, which is a large initiative made up of multiple collaborators designed to facilitate the sharing of environmental data. Dr. Budden’s presentation detailed the successes and future goals of DataONE.

The majority of the conference consisted of various panel presentations organized into themes, including Partnerships, Data Visualization, Consortia, and Data Privacy. Of particular interest was a panel presentation from Dr. David Fearon, Senior Data Management Consultant at Johns Hopkins University, entitled “Screening for Human Subject Disclosure Risk During Data Curation and RDM Service Connections.” The presentation described the efforts being undertaken by the university in providing various types of support to researchers in the health sciences in order to facilitate the sharing of health-related data.

Other conference activities included lightning rounds and the RDAP Business Meeting. Additionally, a poster session allowed for discussions among speakers and attendees. Opportunities for networking abounded over delicious conference-provided meals with a distinctly New Mexican flair.

Overall, this summit covered numerous aspects of data management as it relates to libraries and librarians. There was something for everyone, from the novice data management librarian to the data management specialist. I would encourage anyone with an interest in this topic to attend future events.

If you’re interested in learning more about RDAP and the annual RDAP summit, please visit rdapassociation.org.

Written by Jessica Koos, Health Sciences Librarian at Stony Brook University.

Categories: Data Science

Stephen Sherry, PhD, Named Acting Director of NLM’s National Center for Biotechnology Information

PSR Data Science - Wed, 2020-04-29 17:13

Stephen Sherry, PhD

National Library of Medicine Director Patti Brennan, RN, PhD, has named Stephen Sherry, PhD, Acting Director of the National Center for Biotechnology Information (NCBI) at the National Library of Medicine effective March 31, 2020. As Acting Director of NCBI, Dr. Sherry oversees a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. He is also responsible for developing and operating all NCBI production services, with program areas spanning literature, sequences, chemistry, clinical research, and medical genetics.

Dr. Sherry also leads an NLM program to migrate NCBI’s largest resource, the Sequence Read Archive, into the cloud with the transfer and management of petabyte-scale sequence data on two commercial cloud platforms. He conducts research on the architecture of population genetic information to ensure human genetic information systems are both useful to researchers and respectful to the privacy of study participants.

Dr. Sherry earned his Ph.D. in Anthropology at the Pennsylvania State University in 1996 and completed postdoctoral training at the Louisiana State University Medical Center prior to joining NLM in 1998.

Categories: Data Science

DataFlash: New and Free Virtual Science Conference in June 2020

PNR Data Science - Thu, 2020-04-23 10:13

We are pleased to announce that New England Science Bootcamp for Librarians will host a FREE virtual conference on June 11, 2020, from 6 a.m. – 1 p.m. PT | 9 a.m. – 4 p.m. ET.

Topics will probably include, depending on speaker availability:

  • Vaccine research & manufacture
  • Virology
  • Making health devices in non-industrial settings
  • IRB and human subjects research in the shifting landscape

Tentative Schedule (US Eastern Time)

  • 9-10am: Session A
  • 10:15-10:45: Interstitial – Front Lines Stories
  • 11-12: Session B
  • 12-1: Lunch
  • 1-2: Session C
  • 2:15-2:45: Interstitial – Front Lines Stories
  • 3-4: Session D

The schedule of topics will be finalized and sent to all registrants soon.

This conference will run throughout the day, with access via a single link for the whole day that attendees will receive after registering. You may tune in as you have time or for the topics that interest you the most. More information and a detailed schedule will be available closer to the date of the event. Please register to get the most updated information sent to you.

Registration is available at https://forms.gle/whfu1H9nY2e8X3gS9

NER looks forward to seeing you online for a fun and informative conference! While NER had to cancel its in-person conference this year, the organizers hope you will take advantage of this free professional development opportunity. Although this conference is being organized in New England, attendees from anywhere are welcome.

Categories: Data Science

DataFlash: New Webinar on Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories

PNR Data Science - Tue, 2020-04-21 12:47

The National Library of Medicine (NLM) at the National Institutes of Health is hosting a free webinar for researchers to learn how to share, discover, and cite COVID-19 data and code in generalist repositories on Friday, April 24 from 11:00 a.m. – 12:45 p.m. PT.

The biomedical research community’s understanding of the novel coronavirus and the associated coronavirus disease (COVID-19) is rapidly evolving. Open science and the timely sharing of research data have played a critical role in advancing our understanding of COVID-19 and accelerating the pace of discovery.

Researchers will have an opportunity to hear from multiple generalist repositories about the ways each repository is supporting discoverability and reusability of COVID-19 data and associated code. The NLM will also provide an overview of available COVID-19 literature.

The webinar will be available via NIH VideoCast.

Instructions on submitting questions will be made available closer to the webinar. Interested participants are encouraged to bookmark this page for the latest updates and follow #NIHdata on Twitter.  The webinar will be recorded and available a week after the live event.

See the agenda on the ODSS website.

Categories: Data Science

Upcoming Webinar on Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories

SEA Data Science - Tue, 2020-04-21 12:11

The National Library of Medicine (NLM) at the National Institutes of Health is hosting a free webinar for researchers to learn how to share, discover, and cite COVID-19 data and code in generalist repositories on Friday, April 24 from 2:00-3:45 p.m. ET.

The biomedical research community’s understanding of the novel coronavirus and the associated coronavirus disease (COVID-19) is rapidly evolving. Open science and the timely sharing of research data have played a critical role in advancing our understanding of COVID-19 and accelerating the pace of discovery.

Researchers will have an opportunity to hear from multiple generalist repositories about the ways each repository is supporting discoverability and reusability of COVID-19 data and associated code. The NLM will also provide an overview of available COVID-19 literature.

The webinar will be available via NIH VideoCast.

Instructions on submitting questions will be made available closer to the webinar. Interested participants are encouraged to bookmark this page for the latest updates and follow #NIHdata on Twitter.  The webinar will be recorded and available a week after the live event.

See the agenda on the ODSS website.

Categories: Data Science

NIST and OSTP Launch Effort to Improve Search Engines for COVID-19 Research

PSR Data Science - Mon, 2020-04-20 15:56

The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) and the White House Office of Science and Technology Policy (OSTP) have just launched a joint effort to support the development of search engines for research that will help in the fight against COVID-19. The project was developed in response to the March 16 White House Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset.

In this effort, NIST will work initially with the Allen Institute for Artificial Intelligence, the National Library of Medicine, Oregon Health & Science University (OHSU), and the University of Texas Health Science Center at Houston (UT Health). The team will apply the successful, long-running program of expert engagement and technology assessment called the Text Retrieval Conference, or TREC, to the COVID-19 Open Research Dataset (CORD-19), a resource of more than 44,000 research articles and related data about COVID-19 and the coronavirus family of viruses. The TREC-COVID program goals include creating datasets and using an independent assessment process that will help search engine developers to evaluate and optimize their systems in meeting the needs of the research and health-care communities.

The team will first release a series of sample queries for the biomedical research community, developed by team members at the National Library of Medicine, OHSU and UT Health. Registered participants in TREC-COVID will use their information retrieval and search systems to run the queries against the CORD-19 document set and return their results to NIST. Biomedical experts will then review test results, including document relevance rankings, to assess the overall performance of the retrieval systems.

Using proven TREC protocols, NIST will score the submissions and post the scores, the retrieval results themselves, and the lists of key reference documents to the TREC-COVID website. These “test collections” can then be used by information retrieval researchers to evaluate and enhance the performance of their own search engines. This effort is intended to help researchers understand how search systems could best support medical researchers when available information is developing quickly, as in the current pandemic.
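To make the idea of scoring search results against expert relevance judgments more concrete, here is a minimal sketch of one common retrieval measure, precision at k. It is an illustration only, not NIST's actual scoring procedure: the topic and document identifiers are made up, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(ranked_docs: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that assessors judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_docs[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(
    run: Dict[str, List[str]], judgments: Dict[str, Set[str]], k: int
) -> float:
    """Average precision-at-k over every judged topic.

    A topic that is missing from the run scores 0 for that topic.
    """
    if not judgments:
        return 0.0
    scores = [
        precision_at_k(run.get(topic, []), relevant, k)
        for topic, relevant in judgments.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical relevance judgments: topic -> documents judged relevant.
judgments = {
    "topic-1": {"doc-a", "doc-b"},
    "topic-2": {"doc-x"},
}

# Hypothetical system output: topic -> documents in ranked order.
run = {
    "topic-1": ["doc-a", "doc-b", "doc-c"],
    "topic-2": ["doc-y", "doc-x", "doc-z"],
}

print(mean_precision_at_k(run, judgments, k=1))
print(mean_precision_at_k(run, judgments, k=2))
```

A reusable test collection works the same way in spirit: once the judgments exist, any new search system can be run on the same queries and scored without asking the experts again.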

The Allen Institute for Artificial Intelligence has been releasing an expanded CORD-19 document set each Friday to capture the most recent articles on COVID-19 and related coronaviruses. Later rounds of TREC-COVID will use the larger releases of CORD-19 and expanded query sets. Participants will have one week to submit their search results, and NIST will post results within about a week after that, so new dataset rounds are expected to be released about two weeks apart. The team initially anticipates conducting five consecutive rounds of search system assessments. Interested organizations are invited to register to participate in the TREC-COVID program on the NIST website.

Categories: Data Science