National Network of Libraries of Medicine

Data Science

DataFlash: NIH Requests Public Comment on a Draft Policy for Data Management and Sharing and Supplemental Draft Guidance

PNR Data Science - Fri, 2019-11-08 15:08

On November 6th, 2019, NIH released a Draft NIH Policy for Data Management and Sharing and supplemental draft guidance for public comment. The purpose of this draft policy and supplemental draft guidance is to promote effective and efficient data management and sharing that furthers NIH’s commitment to making the results and accomplishments of the research it funds and conducts available to the public. Complete information about the draft Policy and draft supplemental guidance can be found on the NIH OSP website.

Stakeholder feedback is essential to ensure that any future policy maximizes responsible data sharing, minimizes burden on researchers, and protects the privacy of research participants.  Stakeholders are invited to comment on any aspect of the draft policy, the supplemental draft guidance, or any other considerations relevant to NIH’s data management and sharing policy efforts that NIH should consider.

To facilitate commenting, NIH has established a web portal that can be accessed here. To ensure consideration, comments must be received no later than January 10, 2020.

For additional details about NIH’s thinking on this issue, please see Dr. Carrie Wolinetz’s latest Under the Poliscope blog post:

NIH’s DRAFT Data Management and Sharing Policy: We Need to Hear From You!

NIH will also be hosting a webinar on the draft policy in the near future. Please stay tuned for details.

Questions may be sent to SciencePolicy@mail.nih.gov.

Categories: Data Science

UC Libraries and IT@UC Partner to Bring Renowned NLM Data Scientist to Campus

GMR Data Science - Tue, 2019-11-05 16:18

By Melissa Previtera and Don Jason

On September 17th and 18th, Dr. Lisa Federer, Data Science and Open Science Librarian for the National Library of Medicine (NLM), visited the University of Cincinnati as part of the Data and Computational Science Series (DCS2).

During her visit, Dr. Federer shared her expertise in the field of biomedical research data and data visualization through a lecture, a hands-on workshop, and meetings with various data and informatics leaders.

Dr. Federer’s lecture, “If You Share It, Will They Come? Quantifying and Characterizing Reuse of Biomedical Research Data,” encouraged attendees to think not only about how they share and reuse data but also about how patterns of reuse can influence curation and preservation. She presented her talk in the Stanley J. Lucas, MD Board Room of the Henry R. Winkler Center for the History of the Health Professions. Dr. Federer hosted a luncheon at the same venue. During the luncheon, she answered questions about her lecture and had in-depth conversations with UC faculty and researchers.

Lisa Federer engaged with participant learning

After the luncheon, Dr. Federer taught a hands-on workshop titled “Endless Forms Most Beautiful: Creating Customized Data Visualization with ggplot2 in R.” The workshop was held in the Donald C. Harrison Health Sciences Library’s Dr. Stanley B. Troup Learning Space. The workshop addressed the importance of clear communication, effective visualizations, and accessibility for colorblind individuals. Dr. Richard Johansen, Data Visualization Specialist, and Mark Chalmers, Science Librarian, served as teaching assistants for the workshop.

During Dr. Federer’s time in Cincinnati, she served as the keynote speaker for the Cincinnati Area Health Sciences Library Association’s (CAHSLA) annual meeting and served as a guest speaker at the UC Libraries Faculty Meeting. She presented a talk titled “Beyond the Data Management Plan: Expanding Roles for Librarians” to both audiences. This talk gave a synopsis of emerging data science competencies for the library workforce. The talk provided a roadmap of trainings, webinars and classes librarians could complete in order to gain these professional skills.

Dr. Federer’s visit was a huge success – bringing together attendees from a variety of academic disciplines and research interests. The DCS2 planning committee hopes Dr. Federer’s visit starts conversations, expands professional networks and is the catalyst for future collaborations.

Don Jason, Health Informationist, served as site coordinator for Dr. Federer’s visit. He received logistical support from Melissa Previtera, HSL/Winkler Center Term Librarian; Assami Semde, HSL Circulation Desk Coordinator; and Lori Harris, Interim Director of the Health Sciences Library.

The DCS2 planning committee would like to thank Dr. Federer for sharing her extensive knowledge and skills with the UC community. The committee would also like to thank UC Libraries’ Research & Data Services, the UC Digital Scholarship Center, and the UC Institute for Interdisciplinary Data Science for meeting with Dr. Federer during her visit. Finally, we would like to extend our sincerest gratitude to the Office of the Provost for funding the DCS2.

The DCS2 is a collaboration between UC Libraries and IT@UC. The series provides the UC research community with innovative workshops and distinguished speakers on advanced research data topics. Please visit the DCS2 Website to register for upcoming lectures and training sessions.
___________________________________________________________________________________________________________________
The NNLM is grateful for the outreach and engagement work of our NNLM members. If you have a program or project to share, please email us at gmr-lib@uiowa.edu.

Categories: Data Science

National Medical Librarians Month: Sara Mannheimer

PNR Data Science - Mon, 2019-10-07 08:05

In honor of National Medical Librarians Month in October, we are featuring librarians in the PNR region who are medical/health sciences librarians, as well as those who provide health information to their communities. For the week of October 7th, 2019, we are featuring Sara Mannheimer, Data Librarian at Montana State University. Welcome, Sara, to the PNR Dragonfly blog!

BioSketch:

  • Name: Sara Mannheimer
  • Position: Data Librarian
  • Working organization: Montana State University
  • Education history
    • BA in Literature from Bard College
    • MS in Information Science from University of North Carolina at Chapel Hill
  • Personal Background
    • Sara takes ballet and modern dance classes, and she performed in a local dance showcase last month. Sara also plays piano and guitar (but she only performs for her partner and her cat!). Sara was born and raised in Anchorage, Alaska. In her 20s she worked as a sea kayak guide in Alaska and the US Virgin Islands, and she still loves being outside: bike commuting, backpacking, camping, and cross-country skiing. Sara is also an enthusiastic extrovert and a believer in the power of community, so spending time with friends is one of her biggest sources of joy.

Interview:

Q1: It’s an honor to have you with us on the Dragonfly Blog. Welcome, Sara! My first question relates to medical librarianship, since October is National Medical Librarians Month. So, what inspired you to work with medical data?

Thank you! It’s a pleasure to be featured! My work with data began in graduate school at UNC-Chapel Hill, where I studied archives and records management. I got into the world of data archiving through an independent study developing a digital preservation policy for Dryad Digital Repository. During the project, I had invaluable mentorship from Ayoung Yoon (who is now on the iSchool faculty at IUPUI) and Jane Greenberg (now on the iSchool faculty at Drexel). Ayoung was a PhD student at the time, and she collaborated with me on a poster that we presented at the ASIS&T annual meeting. Jane instilled in me a love for metadata and encouraged me to apply to be the Senior Curator at Dryad after I finished my master’s degree. Jane and Ayoung also mentored me by co-authoring a paper describing our digital preservation policy development process. Building on the work I did at Dryad, I decided to move to a tenure track faculty position as Data Librarian at Montana State University (MSU). At MSU, I help with data management planning, coordinate data science workshops, build data-related tools, and conduct research exploring data curation and data ethics.

Working with NNLM-PNR has been a great entrance into medical data. For example, NNLM-PNR just funded a project that will allow me and my colleagues Jason Clark and Jim Espeland to work with a research center on campus to make their restricted health sciences data available to community partners.

Q2:  Tell me, how did you get into data science?

I’m still getting into it! I began my learning process through a couple of Data Carpentry workshops, one at the Research Data Access and Preservation (RDAP) Summit in 2015 and one at the National Data Integrity Conference in 2017, and then I trained to be a certified Carpentries instructor last year. But most of the data science instruction in the library is the result of collaborations across campus. I’m partnering with Allison Theobold, a graduate student in the statistics program who teaches workshops as part of her dissertation project. She and her advisor, Stacey Hancock, have helped create a thriving R workshop series in the library that includes introductory and intermediate R concepts, as well as sessions on data wrangling and data visualization. This year, we’ve extended the partnership to include graduate students from MSU’s Statistical Consulting and Research Services in order to sustain the workshops. These statistics graduate students have strong coding skills, and they are amazing teachers for their peers.

In addition to teaching practical coding skills, I have an interest in big data ethics, and I have done some writing and thinking about the ramifications of data science using social media data. And I have also begun to pursue projects that support “collections as data”—that is, computational analysis for digital collections. This work includes initiatives like making the text of our digital archival collections available for download, and mentoring students to create digital scholarship projects using archival collections. This interactive map created by former MSU student Dillon Monday is a good example of a collections as data project.

Q3:  In your time as Montana State University’s Data Librarian, what has been your favorite project to date?

I think my favorite project is actually the first grant I was awarded from NNLM-PNR in 2017! The project took an evidence-based approach to creating a data management planning toolkit aimed at health sciences researchers. After identifying a need to improve the data management planning resources that the library provides to the campus community, I proposed a grant to analyze data management plans from grant proposals at MSU, and then to interview principal investigators about their data management practices.

The research I conducted (with fantastic student research assistant Wangmo Tenzing) showed that most investigators practice internal data management in order to prevent data loss, to facilitate sharing within the research team, and to seamlessly continue their research during personnel turnover. However, it also showed that investigators still have room to grow in understanding specialized concepts like metadata and policies for reuse. I used the research results to inform a data management planning toolkit that includes guidance on facilitating findable, accessible, interoperable, and reusable (FAIR) data: for example, using metadata standards, assigning licenses to their data, and publishing in data repositories. If you want to read more, I’ve published a talk and a paper about the project.

Q4:  Are you working on anything new and exciting that you would like to share with us?

I’m getting my PhD right now from Humboldt University in Berlin (with advisor Vivien Petras), and my dissertation is a comparative study of qualitative secondary analysis and social media research. I’m still early in the process, but I’m loving the opportunity to take a deep dive into the topic of qualitative and social media data sharing.

Q5:  To date, what is your favorite data tool?

I’m really enjoying becoming more literate in R. We use RStudio Cloud in our workshops, and it simplifies the setup process for learners. I’m also keeping an eye on the development of Annotation for Transparent Inquiry (ATI), an annotation tool for qualitative research that’s being developed at the Qualitative Data Repository.

Q6:  If you could give one piece of advice/words of wisdom to anyone interested in medical librarianship/data science what would that be?

Collaborate. Our library and academic communities are vibrant and varied, and I’ve done my most impactful work when partnering with colleagues and students. Data librarianship overlaps and connects with many fields, and it’s impossible to have expertise in everything. Working with collaborators allows me to extend my own knowledge, develop better ideas, and provide stronger data services on campus.

Categories: Data Science

Register Today! Finding Clinically Relevant Genetic Information

MCR Data Science - Fri, 2019-09-13 15:21

NNLM Resource Picks: Finding Clinically Relevant Genetic Information

Date: Wednesday, September 25, 2019
Time: 3:00 pm Eastern Time

Register

Did you know genetic testing is available for over 2,000 rare and common conditions, and that over 500 laboratories conduct genetic testing? Clinicians using genetic diagnostics rely on variant classifications to arrive at a diagnosis, decide on interventions, and evaluate care. But how do you locate this information?

Peter Cooper, NCBI Staff Scientist, will join us live to introduce three resources for finding clinically relevant genetic information. In this session, he will:

  1. Explain the validity of clinical variation information in the ClinVar database.
  2. Locate information about a genetic condition related to a specific list of symptoms using MedGen.
  3. Locate tests for a clinical feature, gene or disease using the Genetic Testing Registry.

MLA CE credit is available for those who register and complete the evaluation form!

NNLM Resource Picks is a collaborative, bimonthly webcast series that features National Library of Medicine resources. It aims to increase awareness of these resources and to encourage libraries and other organizations to integrate them so they can more fully serve their colleagues and communities. All sessions are recorded with closed captioning and archived. Click here for upcoming and past NNLM Resource Picks sessions.

Contact: Dana Abbey at dana.abbey@cuanschutz.edu.

Categories: Data Science

NNLM’s RD3 Website: RDM 101 Course Material Available for Non-Course Registrants

PNR Data Science - Thu, 2019-09-12 11:57

The NNLM’s research data management (RDM) course, “RDM 101,” kicked off this past Monday, September 9th, 2019, with a full class; interest in the course was so high that it even gave rise to a waitlist!

RDM 101 is a comprehensive course on RDM basics. It covers topics relevant to librarians who support RDM and help researchers manage and organize their data. More specifically, it covers these key data science topics:

  • Data organization
    (e.g. data collection; data documentation, such as file naming; data types; metadata formats and standards for metadata content, such as controlled vocabularies; and data management plan (DMP) design);
  • Data storage and security
    (e.g. short-term backup and long-term storage options, encryption, and password protection);
  • Data access, sharing, and reuse
    (e.g. copyright and intellectual property issues, data use agreements, funder requirements for data sharing, and licenses for data usage); and
  • Data preservation
    (e.g. various data repositories, whether subject-specific, general, or institutional, and data journals).

For the busy librarian who may not have the time required to participate in the RDM 101 course, or for the librarian who couldn’t get into the Moodle course, there is hope! All of the RDM 101 course material, except active links to the course readings and the assignment, pretest, and posttest material, is up and running on the NNLM’s RD3 website.

The NNLM’s RD3 website is a comprehensive resource for data science questions. Under “Training,” it includes a page with the RDM training material from the RDM 101 course, organized by week, with 5 weeks in total.

Something to look forward to in the coming weeks: RDM 102’s course material will be posted on the NNLM’s RD3 website too! So stay tuned!

Categories: Data Science

Upcoming Research Data Management Webinar: If You Share It, Will They Come? Quantifying and Characterizing Reuse of Biomedical Research Data

SEA Data Science - Wed, 2019-09-04 10:04

Date: Wednesday, October 2nd

Time: 2:00PM – 3:00PM ET

Presenter: Lisa Federer, PhD, MLIS is the Data Science and Open Science Librarian at the National Library of Medicine (NLM), focusing on developing efforts to support workforce development and enhance capacity in the biomedical research and library communities for data science and open science. Prior to joining NLM, Lisa spent five years as the Research Data Informationist at the National Institutes of Health Library, where she developed and ran the Library’s Data Services Program. She holds a PhD in information studies from the University of Maryland and an MLIS from the University of California-Los Angeles, as well as graduate certificates in data science and data visualization. Her research focuses on quantifying and characterizing biomedical data reuse and development of meaningful scholarly metrics for shared data.

Description: Since the mid-2000s, new data sharing mandates have led to an increase in the amount of research data available for reuse. Reuse of data benefits the scientific community and the public by potentially speeding scientific discovery and increasing the return on investment of publicly funded research. However, despite the potential benefits of reuse and the increasing availability of data, research on the impact of data reuse is so far sparse. This talk will provide a deeper understanding of the impacts of shared biomedical research data by answering the question “what happens with datasets once they are shared?”

Specifically, this talk will demonstrate that data are often reused in contexts very different from those for which they were originally collected, and will explore how patterns of reuse differ between dataset types. This talk also considers patterns of data reuse over time and the topics of the most highly reused datasets to determine whether it is possible to predict which datasets will go on to be highly reused over time. Finally, career stage and geographic location of data reusers provide an understanding of who benefits from shared research data. These findings have implications for several stakeholders, including researchers who share data and those who reuse it, funders and institutions developing policies to reward and incentivize data sharing, and repositories and data curators who must make choices about which datasets to curate and preserve.

Registration: Registration is free and can be accessed through the NNLM class instance.

For additional information, please contact Kiri Burcat.

Categories: Data Science

DataFlash: Stephen Few’s “The Data Loom” – A Book Review

PNR Data Science - Tue, 2019-08-27 16:26

Stephen Few is no amateur when it comes to data analysis and data visualization; as the author of more than half a dozen books on data analysis and data visualization, this Pacific Northwest resident has become a trusted expert on the topic.

In his newest book, “The Data Loom”, released in May 2019, Few does not disappoint his growing base of data fans. At a time when dressing up data stories with cheap tricks (i.e. useless and misleading data visualizations that suit your own objectives) has become popular, Few reminds us of the importance of truthful data storytelling and truthful data presentation. Few teaches us how to think critically and scientifically about our data and how we present it. In fact, Few asserts that we don’t really live in the “Information Age” but rather in the “Data Age”, where data becomes valuable to us only after we make sense of it, i.e. through data sensemaking.

In Chapter 3, entitled “Think Scientifically”, Few reflects on the greater purpose of data sensemaking (p. 63):

“Too often, data sensemaking focuses solely on collecting and reporting facts. However, facts are only useful if they lead to an understanding that enables decisions and actions that produce a better world.  Not every question involves causal relationships, but the most important questions do.”

Through being able to think critically and scientifically, we are in a better position to really understand and use data in a truthful and valuable way that will ultimately affect our ability to make good decisions.  Few’s knowledge of critical and scientific thinking comes shining through with many of his inspirational quotes and book references from great thinkers.  Masterfully, Stephen Few succinctly sums up a huge body of essential statistical, philosophical, and scientific works into a matter of 122 pages.  “The Data Loom” by Stephen Few is an amazingly concise work on thinking about data and a very worthwhile read!!!

Additional Reading by Stephen Few:

Show Me the Numbers

Information Dashboard Design: Displaying Data for At-a-Glance Monitoring

Categories: Data Science

NIH All of Us Research Program Plans Genome Sequencing and Genetic Counseling for Participants

PSR Data Science - Wed, 2019-08-21 18:03

The NIH All of Us Research Program has awarded $4.6 million in initial funding to Color, a health technology company in Burlingame, CA, to establish the program’s nationwide genetic counseling resource. With the goal of speeding up health research breakthroughs, All of Us plans to sequence the genomes of 1 million participants from diverse communities across the United States. Through this funding, Color’s network of genetic counselors will help participants understand what the genomic testing results mean for their health and their families. As one of the most ambitious research programs in history, the All of Us Research Program aims to create the largest and most diverse health research resource of its kind. Participants from all parts of the country share health information over time through surveys, electronic health records and more. Some participants also are invited to contribute blood and urine samples for analysis. Researchers will be able to use this data to learn more about how biology, behavior, and environment influence health and disease, which may lead to discoveries on how to further individualize health care in the future.

Over time, the program anticipates providing several kinds of information to participants, including: information on ancestry and traits, drug-gene interactions (pharmacogenomics) and genetic findings connected with high risk of certain diseases. Genomic results from All of Us, although produced at a high quality in specially certified labs, should be confirmed by a health care provider before a participant makes any changes to their care. The pharmacogenomic information may help participants work with their health care teams more effectively to make choices about certain prescription drugs. Genetic findings tied to 59 genes associated with risk of specific diseases, like breast cancer or heart disease, for which there are established medical guidelines for treatment or prevention will also be returned to participants. To ensure that the program uses the most current knowledge in the fast-moving field of clinical genetics, All of Us is following guidance from professional organizations such as the American College of Medical Genetics and Genomics and the Clinical Pharmacogenetics Implementation Consortium.

As health-related information is made available, all participants will have access to genetic counseling services from Color. A small percentage of people will have DNA results, such as a variation in the breast cancer gene BRCA1, that may be important for treatment or screening. This information can also be valuable to their immediate family members who may share the same genetic variant. For All of Us, that could amount to tens of thousands of participants out of its eventual 1 million. Color will deliver the results to these participants in genetic counseling sessions, highlighting any important findings they may want to discuss with a health care provider.

Color will offer educational materials and telecounseling in multiple languages, as well as access to in-house licensed clinical pharmacists who can help participants have more effective conversations with their health care providers. Genetic counselors will also be able to help connect participants to health care providers who can address their particular health risks. To help guide its genetic counseling services, Color has a steering committee led by Amy Sturm, M.S., CGC, LGC, president of the National Society of Genetic Counselors. The committee also includes leadership of the American Board of Genetic Counseling. The steering committee will help ensure that Color delivers top-quality genetic counseling and serves as a platform for training future generations of genetic counselors. Color has built software and digital tools that remove traditional barriers to genetic counseling and clinical genetic testing. It has conducted more than 15,000 genetic counseling sessions to help people across the country understand their DNA information. For an overview of the outputs that Color will provide, watch this 90-second YouTube video featuring Eric Dishman, Director of the NIH All of Us Research Program.

Categories: Data Science