PNR Data Science
Applications Open for RDM 102: Beyond Research Data Management for Biomedical and Health Sciences Librarians (Spring 2020)
Biomedical and health sciences librarians are invited to participate in a rigorous online training course, sponsored by the National Network of Libraries of Medicine Training Office (NTO), that goes beyond the basics of research data management. This course expands on concepts covered in RDM 101: Biomedical and Health Research Data Management Training for Librarians. The librarian’s role in research reproducibility and research integrity is threaded throughout, and the course includes practice using Jupyter Notebooks. Course topics include an overview of data science and open science, data literacy, data wrangling, data visualization, and data storytelling.
The program spans 9 weeks from February 24 – April 24, including 5 modules of asynchronous content, a catch-up week, and a synchronous online session during the week of April 20. The format includes video lectures, readings, case studies, hands-on exercises, and peer discussions. Under the guidance of a project instructor, participants will complete a Final Project to demonstrate improved skills, knowledge, and ability to support data science services at their institution. Expect to spend about 6 hours each week on coursework and the project.
Applications are due January 10, 2020.
For questions and concerns, please contact the NTO at email@example.com
DataFlash: NIH Requests Public Comment on a Draft Policy for Data Management and Sharing and Supplemental Draft Guidance
On November 6th, 2019, NIH released a Draft NIH Policy for Data Management and Sharing and supplemental draft guidance for public comment. The purpose of this draft policy and supplemental draft guidance is to promote effective and efficient data management and sharing that furthers NIH’s commitment to making the results and accomplishments of the research it funds and conducts available to the public. Complete information about the draft Policy and draft supplemental guidance can be found on the NIH OSP website.
Stakeholder feedback is essential to ensure that any future policy maximizes responsible data sharing, minimizes burden on researchers, and protects the privacy of research participants. Stakeholders are invited to comment on any aspect of the draft policy, the supplemental draft guidance, or any other considerations relevant to NIH’s data management and sharing policy efforts.
To facilitate commenting, NIH has established a web portal for submitting comments. To ensure consideration, comments must be received no later than January 10, 2020.
For additional details about NIH’s thinking on this issue, please see Dr. Carrie Wolinetz’s latest Under the Poliscope blog post.
NIH will also be hosting a webinar on the draft policy in the near future. Please stay tuned for details.
Questions may be sent to SciencePolicy@mail.nih.gov.
In honor of National Medical Librarians Month in October, we are featuring librarians in the PNR region who are medical/health sciences librarians, as well as those who provide health information to their communities. This week of October 7th, 2019, we are featuring Sara Mannheimer, Data Librarian at Montana State University. Welcome to the PNR Dragonfly blog, Sara!
- Name: Sara Mannheimer
- Position: Data Librarian
- Working organization: Montana State University
- Education history
  - BA in Literature from Bard College
  - MS in Information Science from University of North Carolina at Chapel Hill
- Personal background
  - Sara takes ballet and modern dance classes, and she performed in a local dance showcase last month. She also plays piano and guitar (but she only performs for her partner and her cat!). Sara was born and raised in Anchorage, Alaska. In her 20s, she worked as a sea kayak guide in Alaska and the US Virgin Islands, and she still loves being outside: bike commuting, backpacking, camping, and cross-country skiing. Sara is also an enthusiastic extrovert and a believer in the power of community, so spending time with friends is one of her biggest sources of joy.
Q1: It’s an honor to have you with us on the Dragonfly Blog. Welcome, Sara! My first question is related to the theme of medical librarianship, as October is National Medical Librarians Month. So, what inspired you to work with medical data?
Thank you! It’s a pleasure to be featured! My work with data began in graduate school at UNC-Chapel Hill, where I studied archives and records management. I got into the world of data archiving through an independent study developing a digital preservation policy for Dryad Digital Repository. During the project, I had invaluable mentorship from Ayoung Yoon (who is now on the iSchool faculty at IUPUI) and Jane Greenberg (now on the iSchool faculty at Drexel). Ayoung was a PhD student at the time, and she collaborated with me on a poster that we presented at the ASIS&T annual meeting. Jane instilled in me a love for metadata and encouraged me to apply to be the Senior Curator at Dryad after I finished my master’s degree. Jane and Ayoung also mentored me by co-authoring a paper describing our digital preservation policy development process. Building on the work I did at Dryad, I decided to move to a tenure-track faculty position as Data Librarian at Montana State University (MSU). At MSU, I help with data management planning, coordinate data science workshops, build data-related tools, and conduct research exploring data curation and data ethics.
Working with NNLM-PNR has been a great entrance into medical data. For example, NNLM-PNR just funded a project that will allow me and my colleagues Jason Clark and Jim Espeland to work with a research center on campus to make their restricted health sciences data available to community partners.
Q2: Tell me, how did you get into data science?
I’m still getting into it! I began my learning process through a couple of Data Carpentry workshops, one at the Research Data Access and Preservation (RDAP) Summit in 2015 and one at the National Data Integrity Conference in 2017, and then I trained to become a certified Carpentries instructor last year. But most of the data science instruction in the library is the result of collaborations across campus. I’m partnering with Allison Theobold, a graduate student in the statistics program who teaches workshops as part of her dissertation project. She and her advisor, Stacey Hancock, have helped create a thriving R workshop series in the library that covers introductory and intermediate R concepts, as well as data wrangling and data visualization. This year, we’ve extended the partnership to include graduate students from MSU’s Statistical Consulting and Research Services to help sustain the workshops. These statistics graduate students have strong coding skills, and they are amazing teachers for their peers.
In addition to teaching practical coding skills, I’m interested in big data ethics, and I have done some writing and thinking about the ramifications of data science research that uses social media data. I have also begun to pursue projects that support “collections as data,” that is, computational analysis of digital collections. This work includes initiatives like making the text of our digital archival collections available for download and mentoring students to create digital scholarship projects using archival collections. An interactive map created by former MSU student Dillon Monday is a good example of a collections as data project.
Q3: In your time as Montana State University’s Data Librarian, what has been your favorite project to date?
I think my favorite project is actually the first grant I was awarded from NNLM-PNR, in 2017! The project took an evidence-based approach to creating a data management planning toolkit aimed at health sciences researchers. After identifying a need to improve the data management planning resources that the library provides to the campus community, I proposed a grant project to analyze data management plans from MSU grant proposals and then to interview principal investigators about their data management practices.
The research I conducted (with fantastic student research assistant Wangmo Tenzing) showed that most investigators practice internal data management in order to prevent data loss, to facilitate sharing within the research team, and to seamlessly continue their research during personnel turnover. However, it also showed that investigators still have room to grow in understanding specialized concepts like metadata and policies for reuse. I used the research results to inform a data management planning toolkit that includes guidance on making data findable, accessible, interoperable, and reusable (for example, using metadata standards, assigning licenses to data, and publishing in data repositories). If you want to read more, I’ve published a talk and a paper about the project.
Q4: Are you working on anything new and exciting that you would like to share with us?
I’m currently pursuing a PhD at Humboldt University of Berlin (with advisor Vivien Petras), and my dissertation is a comparative study of qualitative secondary analysis and social media research. I’m still early in the process, but I’m loving the opportunity to take a deep dive into the topic of qualitative and social media data sharing.
Q5: To date, what is your favorite data tool?
I’m really enjoying becoming more literate in R. We use RStudio Cloud in our workshops, and it simplifies the setup process for learners. I’m also keeping an eye on the development of Annotation for Transparent Inquiry (ATI), an annotation tool for qualitative research that’s being developed at the Qualitative Data Repository.
Q6: If you could give one piece of advice or words of wisdom to anyone interested in medical librarianship or data science, what would that be?
Collaborate. Our library and academic communities are vibrant and varied, and I’ve done my most impactful work when partnering with colleagues and students. Data librarianship overlaps and connects with many fields, and it’s impossible to have expertise in everything. Working with collaborators allows me to extend my own knowledge, develop better ideas, and provide stronger data services on campus.
The NNLM’s research data management (RDM) course, “RDM 101,” kicked off this past Monday, September 9th, 2019, with a full class; interest in this RDM course was so high that it even had a waitlist!
RDM 101 is an excellent and comprehensive course on RDM basics. It covers topics relevant to librarians who support researchers in managing and organizing their data. More specifically, it covers these key topics:
- Data organization (e.g., data collection, data documentation such as file naming, data types, metadata formats and standards for metadata content such as controlled vocabularies, and data management plan (DMP) design);
- Data storage and security (e.g., short-term backup and long-term storage options, encryption, and password protection);
- Data access, sharing, and reuse (e.g., copyright and intellectual property issues, data use agreements, funder data sharing requirements, and licenses for data usage); and
- Data preservation (e.g., data repositories, whether subject-specific, general, or institutional, and data journals).
For the busy librarian who may not have time to take the RDM 101 course, or for the librarian who couldn’t get into the Moodle course, there is hope! All of the RDM 101 course material, except for active links to the course readings and the assignments, pretest, and posttest, is up and running on the NNLM’s RD3 website.
The NNLM’s RD3 website is a great place to take your data science questions. It is an excellent and comprehensive website about data science, and under “Training” it includes a page of RDM training material from the RDM 101 course, organized by week across all 5 weeks.
Something to look forward to in the coming weeks: RDM 102’s course material will be posted on the NNLM’s RD3 website too! So, stay tuned!
Stephen Few is no amateur when it comes to data analysis and data visualization; as the author of more than half a dozen books on data analysis and data visualization, this Pacific Northwest resident has become a trusted expert on the topic.
Few’s newest book, “The Data Loom,” released in May 2019, does not disappoint his growing base of data fans. At a time when dressing up data stories with cheap tricks (i.e., useless and misleading data visualizations that suit one’s own objectives) has become popular, Few reminds us of the importance of truthful data storytelling and truthful data presentation. Few teaches us how to think critically and scientifically about our data and how we present them. In fact, Few asserts that we don’t really live in the “Information Age” so much as the “Data Age,” in which data are valuable to us only after we make sense of them, i.e., through data sensemaking.
In Chapter 3, “Think Scientifically,” Few reflects on the greater purpose of data sensemaking (p. 63):
“Too often, data sensemaking focuses solely on collecting and reporting facts. However, facts are only useful if they lead to an understanding that enables decisions and actions that produce a better world. Not every question involves causal relationships, but the most important questions do.”
By thinking critically and scientifically, we are in a better position to understand and use data in a truthful and valuable way, which ultimately affects our ability to make good decisions. Few’s knowledge of critical and scientific thinking shines through in his many inspirational quotes from, and references to, the works of great thinkers. Stephen Few masterfully sums up a huge body of essential statistical, philosophical, and scientific work in just 122 pages. “The Data Loom” is an amazingly concise work on thinking about data and a very worthwhile read!
Additional reading by Stephen Few:
- Show Me the Numbers
- Information Dashboard Design: Displaying Data for At-a-Glance Monitoring