National Network of Libraries of Medicine

MCR Data Science

Serving the biomedical information needs of Colorado, Kansas, Missouri, Nebraska, Utah, and Wyoming.

Upcoming webinar: Planning, Developing, and Evaluating R Curriculum at the NIH Library

Wed, 2018-09-26 16:32

Join NNLM for the next iteration of the Research Data Management webinar series: Planning, Developing, and Evaluating R Curriculum at the NIH Library October 12, from 2-3 pm ET. To register for this free webinar, visit: Can’t make it on the 12th? Don’t worry, the webinar will be recorded!

This webinar will describe a pilot project to evaluate current R training at the NIH Library and, based on an evaluation of the data, to revise the library’s R training curriculum. It will include a discussion of the development of a training plan, weekly R check-in sessions, managing documents using Open Science Framework (OSF), and an evaluation of the pilot.

Learning Objectives:
By the end of this webinar participants should have a better understanding of:
1. R curriculum before the pilot project
2. Our evaluation of data-related training before the pilot project
3. The components of the pilot project
4. The development of our training plan
5. How OSF was used for project management
6. Format and frequency of classes during the pilot project
7. Post-pilot evaluation

Instructor Bios:
Doug Joubert joined the National Institutes of Health (NIH) Library in 2004. He is a customer-oriented practitioner with extensive experience in providing comprehensive research and information services support to researchers working in the areas of public health and health care policy. In this role, Doug provides his clients with services that support the missions of the NIH and select HHS staff divisions. As part of his duties at the NIH Library, he identifies and provides guidance on the effective use of emerging technologies and recommends strategies to capitalize on them. Practice areas include data analytics, data visualization, GIS, and teaching.

Candace Norton joined the National Institutes of Health (NIH) Library as a National Library of Medicine (NLM) second-year Associate Fellow in 2017. Prior to joining the NLM Associate Fellowship Program, Candace managed a small corporate library for a pharmaceutical and life sciences consulting company in Bethesda, MD. During her fellowship appointment, she has pursued projects and training in areas related to pharmacovigilance monitoring, systematic reviews, bibliometric analysis, and data visualization.


Categories: Data Science

Creating our own pathway

Tue, 2018-09-25 16:33

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Kristin Whitehair, Director of Library Services, St. Luke’s Health System

During the rise of evidence-based medicine, there was a clear link to the health sciences library.  With evidence, usually as published in the literature, creating the foundation of practice, the library was a natural partner for clinicians, researchers, administrators, and students.

Now, with the growth of big data and data science, we are seeing a similar transition.  Organizations are devoting significant resources and energy to data science initiatives.  The potential of data science appears huge and largely untapped.  So far, attention has mostly focused on health research.  However, data science can also look internally within the organization, especially for larger health systems.  For example, retail corporations study online customer behavior, product offerings, and facility design.  This same potential holds true for the health sector.  Libraries can support data science both in health research and in organizations seeking to optimize their internal business operations.

While there was a clear path to the library with evidence-based medicine, with data science librarians must build their own pathway.  Part of this lies in how we define ourselves.  Libraries can be a collection of literature, physical space, research expertise, and so much more.  In general, libraries avoid limited definitions of their function.  My library is a digital library, and I stress that we are a service with 24/7 access.  This is an attempt to combat the stereotype of libraries as a room with books.  By thoughtfully identifying our function and mission we can position libraries to take advantage of new opportunities such as supporting the organization’s data science initiatives, and whatever else may come next.

Additionally, libraries can provide resources to support data science initiatives.  Some ideas that come to mind are coordinating coding boot camps and organizing regular interest group meetings.  Throughout my career I’ve witnessed how the library can bring people with similar interests from different disciplines together.  Public health researchers may be encountering the same technical problems as biostatisticians.  The library can provide a forum for them to connect.  Each of these activities works because the library connects people with similar interests.

Moreover, library staff can also develop knowledge and skills in the data science field.  Broadly, there are two types of knowledge.  First, there is definitional knowledge, to have an understanding of the meaning of terms.  This is similar to a librarian having a broad understanding of cardiac terminology to better help cardiovascular researchers find information.  Secondly, there is functional knowledge needed to perform data science tasks.  This can focus on hands-on experience with data sets and popular data analysis programming languages.  Over the course of the “Big Data In Healthcare” class we’ve seen several examples of both types of knowledge.

Building strong relationships throughout the organization is the key to creating services and developing skills that meet the organization’s needs.  In general, library services are not “one size fits all.”  It only makes sense that library services supporting data science would also not be.  Strong organizational relationships are important to knowing what the key challenges and opportunities are for your organization, and are the key to ensuring that the library is best serving stakeholders.

In library efforts with data science, it is important to acknowledge where a library may not be a good fit.  This depends on individual staff skills and attitudes.  Much data science work is done using command line programming, which can be challenging to some.  Personally, I have a strong grasp of descriptive statistics, but my knowledge of calculus is lacking.  This creates a notable knowledge gap in supporting data science.  I need to know my limits in interpreting models.  This is not a unique situation for libraries, as it is similar to when a library staff member is asked for medical or legal advice.  We can provide information, but if we lack the appropriate qualifications, we should be careful when we offer an interpretation.

Overall, the growth in data science is an opportunity for health care in general, and for health sciences libraries in particular.  We can each create our own path for supporting these initiatives, one that is the best fit for our individual organizations.

Categories: Data Science

Librarians and Research Data Management Services: Branching Out Into Big Data

Tue, 2018-09-18 10:47

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Rose Fredrick, Digital Repository Librarian, Health Sciences Library, Creighton University

Big data has a different nature than traditional research data. It is more immediate and ephemeral, which creates large, eclectic datasets that are not easily categorized or managed with traditional data science tools.  It is changing the way research is done, and the health sciences in particular are discovering new possibilities for studies by aggregating multiple sources of patient data, like wearable health trackers and electronic health records. These transformative studies also give health science librarians an opportunity to support data scientists by building upon existing research data management services.  The librarian’s role in research data management is well-established, and this creates a natural launching point for librarians to expand into big data research services.

Many libraries already provide a full array of data services, such as advising on data management plans, metadata and organization, public access mandates, data security, and the preservation and archival of data sets.  Although big data has different needs when it comes to storage and analysis, many of the same services apply.  Librarians have expertise in the ethical implications of data privacy, publisher and funder requirements, and in curating, organizing and preserving data.  All of these skills and services can benefit big data researchers, but librarians do need to be aware of the challenges of big data.

While the knowledge base of librarianship and research data management can clearly be used advantageously for big data services, there can be barriers to librarians implementing these new services.  Perhaps the biggest barrier is training. Depending on the services being offered, at a minimum librarians will need to become familiar with the nature of big data and how that shapes the research process, the correct terminology, and what resources are available to researchers.  Furthermore, to offer the most robust services, librarians may need data science training or advanced technical training to assist with data processing. Not all institutions are prepared to train librarians so extensively, nor will they all experience enough demand to require a full-time data science librarian.

Librarians can offer more basic services without intensive data science and technical training, however.  A first step could be to become familiar with the terminology, issues, and processes of using big data and be ready to refer researchers with questions to useful resources.  Another option that requires a bit more investment is to offer instruction on crafting data management plans, understanding funder/publisher requirements for data, or choosing a data preservation platform.  Librarians with more time could offer researchers one-on-one advisory sessions on the data management plans for their research projects.  Librarians without a data science background could also take advantage of training geared towards them, like the Data and Visualization Institute for Librarians or the Data Sciences in Libraries Project.

Additionally, as a digital repository librarian, I wanted to determine whether my library would be able to offer services for archiving big data.  Currently, our institutional repository would not be able to house such large sets of data, so while we can advise researchers on preparing for preservation and selecting a platform, we will not be able to archive the data sets in-house.  In the future, it may be possible to collaborate with our information technology department and create an archival system using Apache Hadoop. Some libraries with enough technical resources may already be able to take that step. In the meantime, I think libraries can offer counseling on choosing from the available platforms and perhaps offer data preparation advice based on their experience from archiving smaller sets of research data. In summary, health sciences librarians have relevant expertise and services to offer to big data research and they should consider what combination of services will be the best fit for their institutions.

Categories: Data Science