National Network of Libraries of Medicine

Data Science

All of Us

SCR Data Science - Thu, 2018-05-17 11:28

“All of Us Participants” via, April 2018, Public Domain.

Over the past couple of weeks, you may have heard about a new research program launched by the NIH. All of Us endeavors to enroll 1 million or more adults in a research study. Data from this study will be used to further advance the practice of precision medicine.

Precision medicine allows doctors to identify treatments that are likely to be more effective for individuals, taking into account their genetics. Although this approach to medicine is not new, advances in research have significantly accelerated its progress.

A set of core values is guiding the development and implementation of the All of Us Research Program:

  • Participation is open to all.
  • Participants reflect the rich diversity of the U.S.
  • Participants are partners.
  • Participants have access to their information.
  • Data will be accessed broadly for research purposes.
  • Security and privacy will be of highest importance.
  • The program will be a catalyst for positive change in research.

To learn more about All of Us, view this video or visit the website.


Categories: Data Science

Big Data in Healthcare – Opportunities for Librarians

SEA Data Science - Wed, 2018-05-09 14:12

by Douglas J. Joubert, Informationist, NIH Library, Washington, DC

Over the last seven weeks, in the Big Data in Healthcare – Opportunities for Librarians course, we learned about big data and data science within the context of five distinct disciplines. This essay will provide an overview of big data and data science within each of the five disciplines, with a focus on how librarians can support researchers working in these fields.

Although not focused exclusively on Big Data, a recent report has strongly advocated for an increased role for librarians in the field of data science (Burton, Lyon, Erdmann, & Tijerina, 2018). This report outlines a multi-faceted framework for understanding the internal (within the discipline) and external (within the broader science disciplines) drivers that are changing the way in which we think about data.

Data science is one of those terms that can take on different meanings depending on the practice area. One of the more popular representations of data science is Drew Conway's, which depicts data science as the intersection of three primary domains [Figure 1]. It is not vital that librarians be experts in each of the three domains that make up this Venn diagram, nor is it even possible. What is important, and what serves as the primary thesis of this essay, is that librarians be grounded in how researchers in each of these areas produce, organize, and analyze data.

Figure 1: The Data Science Venn diagram[1].

This course introduced us to a number of different perspectives on the topic of big data. The first view was provided by a data informationist (Lisa Federer) who works for a large biomedical research center. She defined big data as having a number of distinct qualities. The first of these qualities is the amount of data being produced, commonly referred to as its volume (Federer, 2017). The second is the variety of the data, specifically, data pulled from many different sources in many different formats (Federer, 2017). The third feature of big data is the rate at which the data is being produced, or its velocity (Federer, 2017). Last is data veracity, which refers to how much trust we place in the source of the data and in the data's quality (Federer, 2017). Additional definitions were provided by two social scientists, a practicing clinician, and a nursing researcher.

The nursing perspective provided some additional insights that are worth exploring. First is the unique role that nurses play in the delivery of health care, and how this role influences big data research (Brennan, 2015). Second, Dr. Brennan emphasized that terms like Big Data to Knowledge (BD2K), big data, and precision medicine mean different things to different people (Brennan, 2015). The role nursing can play is making these terms meaningful to patients and their families. Last, she emphasized that these tools need to be understood from the nursing experience, which takes a more humanistic approach than the traditional medical model of health care delivery. Nurses are focused on getting the goals of precision medicine into the “hands of the people” (Brennan, 2015). All of these perspectives are needed to fully understand the role of big data and how it is changing the way we conduct research, deliver health care, and make informed decisions.

Using three elements from Martin’s User-Centered Data Management Framework for Librarians, I will advocate for the increased role of librarians in both data science and big data initiatives. These elements are: (1) Service, (2) Best Practices for Data Management, and (3) Literacy (Martin, 2016).

Libraries have a long and rich history of providing services to different user groups. Adding data services to more traditional library services allows libraries to respond to increased demand for specialized support in data science. Potential roles for librarians could fall into the following categories: (1) data extraction, (2) data wrangling, (3) data analysis, or (4) data visualization (Hamalainen, 2016). Some of these skills, like data extraction or data analysis, can be performed without much additional training. Data wrangling and data visualization are not out of reach for most librarians, given supplemental training. These four areas also require the least overhead when compared with, for example, hosting a data repository.

Also, many data service questions are very similar to the types of reference questions that librarians have traditionally answered. For example:

  • Knowing where to find authoritative and curated datasets
  • Knowing the best methods for searching datasets
  • Knowing how to choose the best software solutions
  • Knowing about current metadata schemas for data

Each week in this class presented us with a different challenge for managing data, along with innovative solutions for dealing with it. We also learned that these challenges are being addressed by local and national initiatives. At the federal level, a 2013 report was released by the Office of Science and Technology Policy that outlined a number of important policy principles (Holdren, 2013). Many of these principles align with the work of libraries and present us with numerous opportunities. The first is helping researchers comply with changing grant requirements. The second is working with researchers to maximize transparency and accountability in collecting and storing data. The last is connecting researchers with tools like the Open Science Framework to support data sharing and increase reproducibility.

As someone who has spent a great deal of my professional life teaching library users, I find that the third element, literacy, resonates most with me. I also feel that librarians make some of the best teachers. Teaching about data literacy, data analysis, and data management offers incredible potential for librarians. In my experience, starting small is the best entry point into teaching these topics: for example, working with a colleague to develop a data literacy class, or volunteering to serve as a teaching assistant or back-up for a more seasoned teacher. Teaching a class in R or Python is an admirable goal; however, it might not be the best place to start, nor is it necessarily the right solution for your library. Finally, look for both formal and informal professional development opportunities. This MOOC (Big Data in Healthcare[2]) and Best Practices for Biomedical Research Data Management[3] are just two recent examples of librarian-led data management classes. Meetup groups[4] and connections developed through social media are also wonderful ways to learn and network.


Brennan, P. (2015). Big Data in Nursing. Bethesda: NINR Big Data Bootcamp.
Burton, M., Lyon, L., Erdmann, C., & Tijerina, B. (2018). Shifting to Data Savvy: The Future of Data Science in Libraries.
Federer, L. (2017). Data Science 101. NNLM Beyond the SEA Webinar Series.
Hamalainen, H. W. (2016). Geoscience Librarianship 101: Making Sense out of “GeoReference.” Baltimore.
Holdren, J. P. (2013). Increasing Access to the Results of Federally Funded Scientific Research. Retrieved from mo_2013.pdf
Martin, E. (2016). The Role of Librarians in Data Science: A Call to Action. Journal of eScience Librarianship, e1092.




A Unique Data Experience: Reflections on ESFCOM’s Inaugural Hackathon

PNR Data Science - Mon, 2018-05-07 05:00

Today’s blog is by Nancy Shin, Sewell Memorial Fund Librarian Fellow at Washington State University’s Elson S. Floyd College of Medicine. Welcome, Nancy!

The most extraordinary thing happened at Washington State University’s Elson S. Floyd College of Medicine (ESFCOM) the weekend of April 13-15, 2018: ESFCOM hosted its inaugural Hackathon, organized by the College Technology Incubator Officer, Andrew Richards. It was well attended by people from all walks of life and areas of expertise, including students and healthcare providers. So, the big question: what exactly is a hackathon, and why all the hype?

A hackathon is a social event focused on building small, innovative, new technology projects. It brings together teams of people to work on a common project within an overarching theme; at the end of the event, teams formally present their projects for judging. A hackathon can last anywhere from 4 hours to 1 week (sleep is optional) and can involve large cash prizes. Typically, projects are technological and can result in the development of a new app or website feature in response to the theme; in the case of ESFCOM’s Hackathon, the theme was “challenges in rural healthcare.” A common misconception about hackathons is that they are strictly designed for computer programmers, engineers, and software developers, i.e., anyone who codes! However, other skills like research, design, project management, data management, and leadership are also important to the dynamic of an ideal hackathon team.

Arguably, the first hackathon was hosted by the OpenBSD operating system project in 1999, when ten developers came together to work on various software problems over the span of a week (Davis, 2016). Since then, hackathons have more famously been hosted by companies like Facebook and Yahoo, in 2005 and 2006 respectively, to catalyze new innovations in a relatively “risk-free” and “creative” environment (Davis, 2016). In general, hackathons are organized by one of the following communities: open source software companies, tech companies, sponsored competitions, and community institutions (Davis, 2016).

PTme, the winning team

In the health field, a major community hackathon organizer is the National Institutes of Health (NIH), which often hosts hackathons with a bioinformatics theme. Although ESFCOM’s Hackathon was heavily inspired by MIT’s “Grand Hack & Hacking Medicine,” what sets the ESFCOM community hackathon apart from other health hackathons is that it encourages a diverse range of skills to tackle healthcare problems. For example, the winning team, PTme, included developers, medical students, business leaders, and engineering students, while my own hackathon team was made up of a mathematician, a bioengineer, a computer engineer, a designer, and a health/data librarian. Another unique feature of ESFCOM’s Hackathon was the involvement of health librarians in the Spokane area, who created a “Research Station” that provided active research and data management support to the participating teams. The volunteer librarians were able to assist directly with each team’s research and data management needs. It is these two qualities, skill diversity and library support, that make ESFCOM’s Hackathon one of a kind and a successful model for other health communities and organizations to follow for their future hackathons!


Davis, R. C. (2016). Hackathons for libraries and librarians. Behavioral & Social Sciences Librarian, 35(2), 87-91. doi:10.1080/01639269.2016.1208561


NNLM Research Data Management Webinar Series: Research Data Management Services: Beyond Analysis and Coding – June 14, 2:00 PM ET

SEA Data Science - Fri, 2018-05-04 10:10

Date/Time: Thursday, June 14, 2018, 2:00 PM ET/11:00 AM PT

Presenter: Margaret Henderson, Health Sciences Librarian, San Diego State University Library, San Diego, CA

Contact: For additional information or questions, please contact Tony Nguyen.

Abstract: There is more to research data management (RDM) services than the technical skills necessary for data management. Soft skills and non-technical skills are very important when setting up RDM services, and they remain important to the sustainability of those services. Reference skills, relationship building, negotiation, listening, facilitating access to decentralized resources, and policy knowledge and assessment are all important to the success of a service. Margaret Henderson will discuss these skills and show you how to start RDM services, even if you don’t feel confident about your statistical skills or knowledge of R.

Presenter Bio: Margaret Henderson was recently appointed Health Sciences Librarian at San Diego State University Library. She is liaison to the College of Health and Human Services and is also working with other librarians at SDSU to set up RDM services. Previously, she spent three and a half years setting up RDM services at Virginia Commonwealth University Libraries. Margaret has been a biomedical librarian for over 30 years and is a Distinguished Member of the Academy of Healthcare Information Professionals. She has presented and written on many library topics over the years, and wrote the book Data Management: A Practical Guide for Librarians (2016, Rowman & Littlefield).

Registration: Please visit our class page to sign up!


Reflections on Big Data in Healthcare: Exploring Emerging Roles

SEA Data Science - Thu, 2018-05-03 08:15

Written by: Paul Levett, Reference and Instructional Librarian, Himmelfarb Health Sciences Library, George Washington University, Washington, DC

Do you think health sciences librarians should get involved with big data in healthcare?

Of the four V’s described by Cognitive Class (n.d.), namely velocity, volume, variety, and value, it is value where medical librarians enter the discussion about big data, because we add value to unstructured data: we bring order to chaos! Traditionally, librarians have done this by creating metadata about learning objects, e.g., cataloging, finding aids, and infographics. However, data mining, cleaning, analysis, and visualization require computer programming, mathematics, and statistics skills that are not part of library school MLS programs.

Burton & Lyon (2017) point to a technical skills gap that prevents librarians from contributing to big data initiatives. They promote the NCSU Data Science and Visualization Institute and Library Carpentry workshops as ways to gain knowledge and the opportunity to practice. But the NCSU Data Science and Visualization Institute lasts just one week, nowhere near enough time to develop and practice programming, math, and statistics skills. Library Carpentry workshops are typically one-off instructional sessions that offer even less time, although I appreciate that the course material is available online at

On the question of whether librarians should be doing data science, one can argue that data science skills touch on all the domains identified by Drummond et al. (2015, Fig. 3, p. 15) in the national librarian education needs assessment. Were I invited to suggest a program for developing the skills needed to work in Big Data in Health Care Information Systems, I would suggest something like the MSc Data Analytics program in the University of Sheffield Department of Computer Science, which provides opportunities to study R and Python programming and statistical analysis and to work on a real-world project applying those skills over a one-year timeframe. Students in this program apply advanced mathematics skills, which is why the program requires an undergraduate degree in mathematics, economics, accounting, physics, chemistry, or engineering.

This suggests a need for a specialized data scientist role, but I am not convinced the library is actually the best home for that role. Recently, Simmons College (2017) surveyed 1,117 graduates of its MLS program about core librarian professional skills and knowledge. None of them rated data science as a core or specialized skill; 14 mentioned statistics/working with data, and only 6 mentioned data science/curation/management. As recently as November 2017, at the IMLS (2017) meeting on positioning MLS programs for the 21st century, there was much discussion about increasing the diversity of the profession but only one mention of data curation.

Tsakalos (2017) described data wrangling as “the process of importing, cleaning, and transforming raw data into actionable information for analysis. It is a time-consuming process that is estimated to take about 60-80% of analysts’ time.” I feel the current push for librarians to develop data wrangling skills comes perilously close to an admission from data analysts that they want to offload what appears to be an onerous burden. This role would better fit someone working in a university department of computer science, mathematics, statistics, or epidemiology and biostatistics. It’s critical for librarians to make clear that the library is not a raw data processing warehouse but a knowledge repository.
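To make the three stages in Tsakalos's definition (importing, cleaning, and transforming) more concrete, here is a minimal sketch in Python using only the standard library. It assumes a small, hypothetical CSV export of blood-pressure readings; the column names, the duplicate and missing values, and the 130 mmHg flag threshold are all invented for illustration and are not drawn from any of the sources cited here.

```python
import csv
import io

# Hypothetical raw export (illustrative only): messy headers, stray
# whitespace, one duplicate row, and one row with a missing reading.
RAW_CSV = """Patient ID, Visit Date ,Systolic BP
001,2018-04-02, 128
002,2018-04-03,
001,2018-04-02, 128
003,2018-04-05,141
"""


def import_rows(text):
    """Import: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: normalize headers, trim values, drop duplicates and rows missing BP."""
    cleaned, seen = [], set()
    for row in rows:
        norm = {k.strip().lower().replace(" ", "_"): (v or "").strip()
                for k, v in row.items()}
        key = tuple(sorted(norm.items()))
        if key in seen or not norm["systolic_bp"]:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


def transform_rows(rows):
    """Transform: convert types and add a derived field ready for analysis."""
    return [
        {
            "patient_id": r["patient_id"],
            "visit_date": r["visit_date"],
            "systolic_bp": int(r["systolic_bp"]),
            # Hypothetical threshold, chosen only to show a derived column.
            "elevated": int(r["systolic_bp"]) >= 130,
        }
        for r in rows
    ]


tidy = transform_rows(clean_rows(import_rows(RAW_CSV)))
for record in tidy:
    print(record)
```

Even in this toy case, most of the code goes to cleaning rather than analysis, which is the burden the paragraph above describes.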

Where should librarians get involved?

There may be a role for librarians in passing information to hospital IT departments about updates and changes to important biomarkers, where these need to be set manually as parameters by programmers building clinical decision support on top of EHR systems. However, as this enters the realm of medico-legal responsibility, the onus should be on EHR software developers to perform this necessary ongoing maintenance.

Krumholz (2014) described how observational, non-experimental studies generate data to support causal inferences, and he pointed to comparative effectiveness studies as a potentially useful application of cluster analysis on large clinical data sets. A systematic review should be a prerequisite for any health policy comparative effectiveness study, and this is where I, as a librarian, could best employ my literature search skills.

Librarians could be trained and certified to deliver REDCap training; the data capture form design issues are similar to those in Microsoft Access. Librarians would benefit by developing a deeper understanding of study design issues such as the timing of follow-up, patient data protection principles, and setting automated reminder parameters, while the enterprise would benefit from additional trainers to further spread the use of the REDCap clinical trial data collection tool.


Burton, M., & Lyon, L. (2017). Data Science in Libraries. Research Data and Preservation (RDAP) Review. Bulletin of the Association for Information Science and Technology, 43(4), 33-35.

Cognitive Class/Fireside Analytics (n.d.). Big Data 101. Retrieved from

Drummond, C., Clareson, T., Gemmill Arp, L., and Skinner, K. (2015). Libraries, Archives, and Museums (LAM) Education Needs Assessments: Bridging the Gaps. Retrieved from

U.S. Institute of Museum and Library Services (IMLS) (2017). Positioning MLS programs for the 21st century. Retrieved from

Krumholz, H. M. (2014). Big data and new knowledge in medicine. Health Affairs, 33(7), 1163-1170.

Simmons College (2017). Librarian professional skills and knowledge survey April 2017. Retrieved from

Tsakalos, V. (2017). Data wrangling. Retrieved from


Learning, Networking, and Sharing: Report on the April 10-11 NNLM Research Data Management Course Capstone Summit

PSR Data Science - Wed, 2018-05-02 19:43

by Andrea Lynch, MLIS
Scholarly Communications Librarian
Lee Graff Medical & Scientific Library
City of Hope
Duarte, CA

As part of the culmination of the NNLM Biomedical and Health Research Data Management for Librarians spring 2018 course (NNLM RDM course), a two-day Capstone Summit was held April 10-11, 2018, at the NIH campus in Bethesda, Md. Over 40 medical and health sciences librarians attended this impactful event, along with representatives from the National Library of Medicine (NLM) and various team members from the National Network of Libraries of Medicine (NNLM) regional network offices. It was a great opportunity to meet fellow cohort participants in person, as well as to get to know our NLM and NNLM colleagues while getting feedback on our Capstone Project plans.

Research Data Management Capstone Summit Attendees

The first day began with a meet & greet and a welcome from the NLM and NNLM representatives. We then had an opportunity to meet our mentors as well as fellow mentees supported by our assigned mentor. Then came the part of the event I was most anticipating: a presentation by NLM Director Dr. Patricia Flatley Brennan. She highlighted the NLM Strategic Plan and addressed a myriad of questions. We then presented our Capstone Projects in small groups and received feedback from our peers and other course mentors. After a delicious lunch, we went back to work, participating in roundtable discussions on topics such as scalability and the tools & technology supporting research data management programs and services. We were then fortunate enough to hear from a panel of experts at NLM and NIH, including Dr. Dina Demner-Fushman from NLM, Dr. Ben Busby of NCBI, and Lisa Federer of the NIH Library. We ended the day with an activity in which we each wrote down our best idea for research data management program success, and then collectively and anonymously rated each idea to arrive at a handful of the best ideas from the group.

The second day began with a group activity aimed at sharing our Capstone Project plans and getting high-level feedback. We then gathered aggregated feedback about the RDM course within small groups. Next up, Regina Raboin, Associate Director of the Lamar Soutter Library and Editor-in-Chief of the Journal of eScience Librarianship (JeSLIB), presented an overview of the journal and recent changes to it. She encouraged the course participants to submit manuscripts detailing their Capstone Projects once completed. The final presentation was by Kevin Read and Alisa Surkis of NYU, with case study highlights from the academic medical libraries that participated in an NNLM Middle Atlantic Region Pilot Project on research data management. The concluding remarks from Amanda Wilson of NLM’s National Network Coordinating Office, as well as Ann Glusker & Ann Madhavan from the NNLM Pacific Northwest Region, did a great job of synthesizing the event’s outcomes and inspiring us to forge ahead on our Capstone Projects!

The Capstone Projects are due at the end of August, so be on the lookout for updates from NNLM and/or the respective course cohort participants. If you are going to the Medical Library Association annual meeting this month, please attend Sheila Green’s Lightning Talk detailing her experience participating in the NNLM RDM course, scheduled for the afternoon of May 22, 2018 (Sheila is a speaker during the Lightning Talk #5 session from 3:00 to 4:25 p.m.). Also, visit NNLM’s RD3 website for interesting research data management developments and RDM-related news, updates, and initiatives. The NNLM Research Data Management Working Group is very active and will update the site regularly. Lastly, keep your eyes peeled for the JeSLIB special issue on research data management and for a database of Capstone project reports on the NNLM RD3 site.


Save the date! Mountain West Data Librarian Symposium August 13-14, 2018

MCR Data Science - Wed, 2018-05-02 13:23

The Mountain West Data Librarian Symposium is a low-cost professional development opportunity for librarians and research data specialists. The inaugural symposium will be held August 13-14 at the University of Colorado Boulder!

Registration will be under $30 and open in June. The symposium will consist of morning workshops and afternoon unconference sessions. They are currently seeking workshop proposals for the symposium. Proposals are due Tuesday, May 22.

Workshops may run 55, 90 or 120 minutes. They are seeking hands-on training sessions that focus on active attendee participation. Proposals should focus on how to fulfill the job duties of research data specialists. Potential topics include, but are not limited to:

  • teaching research data management
  • data sharing and re-use
  • data curation and preservation
  • data literacy
  • research reproducibility
  • communication and outreach

Proposals of up to 250 words should be submitted to: Please view the full call for proposals prior to submitting.

Stay tuned for more announcements, follow @MountainWestDLS on Twitter, or check out the Mountain West Data Librarian Symposium Website. Please send any questions to /da


Big Data for Hospital Librarians – Are We There, Yet?

MAR Data Science - Tue, 2018-05-01 08:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by Elanor Pickens, Medical Librarian, Portsmouth Regional Hospital, Portsmouth, NH

“Big data” is a term that has recently been appearing in the literature of academic librarianship. However, as a hospital librarian, I wonder whether this is justification enough to explore its applicability to my current position. The definition of big data is multi-faceted and covered elsewhere, so I will refer readers to Gandomi and Haider (2015). Here I simply seek to understand my potential role in this field.

Martin (2016) proposes a framework of five basic categories, within which librarians may find at least one opportunity to support data science activities. One role that the hospital librarian may take on can be found within the “Literacy” domain: teaching. This may include staying informed about the various programming languages and software (R, Python, SAS, etc.) and their strengths and weaknesses, so that we can assist in the research process by educating researchers about data management options. Additionally, we might lead researchers to resources that can help them visualize their data, both to enhance their own comprehension of relationships and to better present their findings to others. If we also understand the form of the data they are collecting (structured, semi-structured, or unstructured), we may be able to help them discover studies that have used similar data, so that they can anticipate any organizational and analytical barriers they might encounter.

An example of a concern that researchers may have with respect to big data analysis is that a question needs to be relatively clear before any data are examined (Brennan, 2015). Although slight revisions may be necessary, or new questions worth further examination might arise, there should not be a significant change in the direction of the original question. There are too many data and methods of analysis to proceed in big data research without a clear understanding of where it leads. Librarians are proficient at refining questions to get at the core of a research query, as part of their reference interview skillset. We also have at least some basic knowledge of the different types of data analysis methods described in the literature, although much of this exposure does not include research conducted with big data. Iwashyna and Liu (2014) state that, in contrast to traditional epidemiology, where one hypothesis is formulated and the data needed to support or refute it are selected in advance and carefully collected, the multitude and variety of big data can be custom-fit to epidemiological studies to identify patterns. Similarly, the authors note that a number of analytic methods can be adapted and combined when using big data, whereas traditional epidemiology is usually restricted to one analytic method per study. For hospital librarians, an awareness of research question modifications and an understanding of data analysis methods may not necessarily yield additional support for experienced researchers, but it may still help guide individuals who are subject specialists in their field but new to the research process.

Groeneveld and Rumsfeld (2016) note that big data certainly has predictive power in clinical decision-making; however, big data cannot determine which associations are due to random events, nor can it identify causal associations. In addition, the authors point to a lack of large-scale analytical methodology for scientific comparison studies, comparable to the level currently available in big data analysis. It is necessary for researchers, especially those seeking publication, to consider the reproducibility of their studies when they are using such highly adaptive and dynamic models of analysis. In this case, there may be little more for hospital librarians to do than to continue helping researchers discover current best practices for big data publication with respect to transparency.

For the solo hospital librarian, who is often juggling multiple tasks, big data may simply not be a sphere in which to operate. Despite having gained a useful surface understanding of big data concepts, I am still unable to determine how I might apply this knowledge in my current position, or whether I would even have the time to do so. And without a clearly defined path, management support for professional development opportunities is essentially non-existent (Burton & Lyon, 2017). One area that has piqued my curiosity is how big data may inform my organization’s operations through the revision of protocols, thereby improving clinical practice. I have never really had much opportunity to consider how data collected by our EHR truly inform patient care, and especially how they might affect revenue. For example, patients who come in for vaccinations may also receive additional preventive care (Kaelber, 2016). But to actually delve into the big data arena, it may be up to each individual librarian either to maintain a basic awareness or to seek out opportunities that may or may not be supported at the organizational level.


Brennan, P. (2015). Big Data in Nursing Research. NINR Big Data Boot Camp Part 4: Big Data in Nursing Research.

Burton, M., & Lyon, L. (2017). Data science in libraries. Bulletin of the Association for Information Science and Technology, 43(4), 33-35.

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Groeneveld, P. W., & Rumsfeld, J. S. (2016). Can big data fulfill its promise? Circulation: Cardiovascular Quality and Outcomes, 9(6), 679-682. PMCID: PMC5396388.

Iwashyna, T. J., & Liu, V. (2014). What’s so different about big data? A primer for clinicians trained to think epidemiologically. Annals of the American Thoracic Society, 11(7), 1130-1135. PMCID: PMC4214055.

Kaelber, D. (2016). Using Clinical Data to Improve Clinical Patient Outcomes. NNLM Forum (online).

Martin, E. R. (2016). The Role of Librarians in Data Science: A Call to Action. Journal of eScience Librarianship, 4(2), 7.

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

GMR Data Science - Mon, 2018-04-30 09:22

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a "should/should not" approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Patricia L. Smith, Impact and Dissemination Librarian at Galter Health Sciences Library, Northwestern University, Chicago, IL

Big data in healthcare is a booming area with many facets and ample opportunities for library involvement. The question is not should librarians get involved, but how can librarians get involved? Librarians are natural stewards for big data—we have unique skills that we can leverage to assist researchers, particularly in citing data, data management, information ethics, and data visualization.

The most natural, and perhaps easiest, segue into big data for librarians is in the area of data citation. Researchers are expected to cite their sources, but what about data sets? Data sets inform practice and are integral parts of the research process, yet citing data is not standard practice. Because of this gap, it is very difficult to trace the use of data, which hinders the overall research process. Librarians are already embedded in citation support: we teach classes on EndNote, RefWorks, and other bibliographic management software, and we answer questions about citation styles and bibliographies. We are therefore well poised to start conversations about the importance of citing data. Librarians can take the initiative to create guides, classes, and other promotional material about how to cite data and why it is important. Furthermore, promoting data citation would help us track metrics and provide invaluable information about the impact, resonance, and reach of our researchers' work. This is also an opportunity to promote depositing data sets in institutional repositories when appropriate. Finally, we also have relationships with vendors and publishers, which could open up additional conversations about indexing data sets in various databases.

Another area in which librarians are increasingly getting involved is in the area of research data management. Metadata librarians, electronic resources librarians, and data librarians are uniquely positioned to collect and appraise data, manage data collections and add appropriate metadata, and preserve data. We can help researchers with best practices for data structure, vocabularies, formats, and more.

Big data is not without controversy when it comes to privacy and ethics. Librarians have a history of exhibiting passion in the area of information ethics, so this seems like a natural partnership! Librarians can take the initiative to start conversations with the public about big data—what it is, what it is not, and why it could raise the proverbial ethical eyebrows. On the flip side, librarians can also have conversations with researchers about the public’s concerns surrounding big data. Researchers probably have the best intentions when it comes to using big data, but they need to be aware of why people might have concerns with privacy. Some hold the belief that “patients have a moral obligation to contribute to the common purpose of improving the quality and value of clinical care in the system.”[1] While I concur that participation in healthcare is crucial to moving the science forward, the phrase “moral obligation” might not be the best choice of words, especially from the perspective of skeptical patients, patients concerned with privacy, or patients from racial or ethnic groups that have historically been mistreated by the medical community. Librarians might be able to liaise between the public and researchers to help strengthen these partnerships, and help researchers communicate in the most effective ways.

Another way librarians can get involved in big data is by learning more about data visualization. Not all librarians have to learn R, Python, or JavaScript, but having a basic knowledge of programming and speaking the language of data scientists will only strengthen our position. There are many free resources for data visualization, e.g., Sci2, Tableau Public, and VOSviewer. Presenting data visually is a valued skill, and librarians can learn some basic skills to get a seat at the table.

Overall, there are many ways librarians can and should get involved in big data in healthcare. We must be confident about the skills we already possess and how they can translate to big data, and we must be proactive in marketing our knowledge.


  1. Longhurst CA, Harrington RA, Shah NH. A ‘green button’ for using aggregate patient data at the point of care. Health Aff [Internet]. 2014;33(7):1229-35.
Categories: Data Science

Do You Think Health Sciences Librarians Should Get Involved with Big Data in Healthcare?

PNR Data Science - Fri, 2018-04-27 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a "should/should not" approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Sara Pimental, Senior Consultant, Kaiser Permanente, San Francisco, CA

My answer to this question is a qualified yes. However, librarians don't HAVE to get involved to be successful. I think people panic at the thought that if they don't get involved in every new trend in librarianship, they will become obsolete. There are many ways to evolve; big data is just one of them.

Since I am involved in one aspect of utilizing Big Data, I would have to say yes: librarians who have the interest should get their hands dirty. I can see skills that all librarians possess being useful in all aspects of Big Data. Those who are more technically inclined should go all the way and become data scientists. Many of us have learned programming languages and similar skills and could do very well in this area.

For those of us who have no desire to become so technical but have a curious fondness for metadata, there are many niches. This is where I have landed. I assist not just with taxonomy and metadata for my website but also with linking structured data from the EHR to clinical information available on the website and, soon, to subscribed third-party resources. I could envision a librarian's talents also being useful with unstructured data, such as the notes in the EHR.

In conclusion, there are a myriad of ways a librarian can get involved with Big Data. In this class we have learned about quite a few of them. I remember when I attended the opening reception at NLM’s Biomedical Informatics Course at Woods Hole, Dr. Lindberg told us we were change agents. I hope some of the participants of this class become just as inspired.

Categories: Data Science

Funding Awarded to UMN for Research Data Management Education

GMR Data Science - Thu, 2018-04-26 09:33

The GMR is excited to announce that the Health Sciences Libraries at the University of Minnesota have been awarded a Research Data Management (RDM) Award to support research data management services! The project will expand RDM education not only within their institution but across the GMR as well!

Project Description

This project has two goals:

  1. Enable health science librarians at institutions throughout the GMR to build research data management knowledge and skills and develop actionable next steps to provide data services at their libraries
  2. Enable health science faculty and graduate/professional students at the University of Minnesota Duluth (UMD) to better understand data management best practices, be better positioned to prepare more competitive grant proposals, and learn how to prepare datasets for preservation, sharing, and re-use

To address Goal 1, the University of Minnesota will fund up to twelve travel stipends for librarians across the GMR to travel to Minneapolis and attend a special MLA CE-approved Data Management Course. Librarians will be selected through a competitive application process.

To accomplish Goal 2, a data management workshop will be hosted on the University of Minnesota Duluth campus for up to 40 faculty and students. In-person consultations will also be offered after the workshop to provide more personalized training.

Congrats to UMN and be on the lookout in the coming months for information about applying to attend the Data Management Course in Minnesota!

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

MAR Data Science - Tue, 2018-04-24 08:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a "should/should not" approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Darlene Kelly, Library Director, Charles R. Drew University of Medicine and Science, Los Angeles, CA

Over the past few years, health sciences librarians (HSL) have become engaged in the discovery of Big Data in Healthcare. HSL have a history of being early adopters of technology, and we continue to demonstrate our ability to be transformative in the field of information science. I believe that Dr. Patricia Brennan's focus on data science initiatives as Director of the National Library of Medicine (NLM) has further propelled librarians to contribute to NLM's strategic initiatives on Big Data, and we are making strides. I strongly believe HSL are taking the necessary steps to become expert partners in the field of data science. I assert that four components are essential in this process: NLM training and support; Medical Library Association (MLA) education; the creation of collaborations; and self-discovery through life-long learning.

The National Library of Medicine continues to be a catalyst in the discovery of Big Data initiatives. Epstein (2017) reported on NLM's "increased focus on data science" (p. 308). Accordingly, NLM has created funding opportunities such as the NLM Administrative Supplements for Informationist Services in NIH-funded Research Projects (Admin Supp), which help librarians identify collaborations with NIH-funded researchers and encourage them to participate in data science. In addition, the National Network of Libraries of Medicine (NNLM) has implemented training courses on data science, including this course on Big Data in Healthcare.

The Medical Library Association (MLA) is addressing data science initiatives by modifying its competencies to include skills related to "information management and the curation of clinical and health information data" (Epstein, 2017, p. 309). MLA offers continuing education opportunities for librarians in the form of classes, webinars, annual meeting presentations, and, most recently, the MLA Research Training Institute. Thus, librarians have multiple opportunities to learn more about the field of data science.

Additionally, collaboration is key to becoming involved in data science. Through this class, I have learned about many of the opportunities available in the field of data science, and I have become more confident in identifying possible areas of collaboration. I was especially interested in the work by Read et al. (2015) that has been implemented at the New York University Health Sciences Library.

Librarians are curious informationists, and I think our involvement in data science is a natural progression. It is often through self-discovery that we become involved or are able to identify where we can contribute. One example of self-discovery is continuing to learn about data science through other courses, such as those offered on Coursera. In addition, involvement can start with a conversation with select stakeholders to identify potential interest in data science.

In conclusion, librarians can become experts in the field of data science. Several mechanisms are already in place to help us learn these skill sets. Both NLM and MLA offer a wide range of opportunities to learn more about data science and to participate in data science initiatives. As we create partnerships and collaborations, we will become more knowledgeable about the needs of our researchers. Finally, librarians must remember that we are life-long learners who welcome a challenge.


Epstein, B. A. (2017). Health sciences libraries in the United States: New directions. Health Information & Libraries Journal, 34(4), 307-311.

Read, K. B., Surkis, A., Larson, C., McCrillis, A., Graff, A., Nicholson, J., & Xu, J. (2015). Starting the data conversation: Informing data services at an academic health sciences library. Journal of the Medical Library Association: JMLA, 103(3), 131.


Categories: Data Science

Should Health Sciences Librarians be Involved with Big Data in Healthcare?

GMR Data Science - Mon, 2018-04-23 09:03

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a "should/should not" approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Mary Wittenbreer, Head Medical Librarian, Regions Hospital, St. Paul, MN

I would like to give an enthusiastic YES to the question "Should health sciences librarians get involved with big data in healthcare?" I believe that librarians have the skill sets to provide assistance and collaborate with most professions. However, the size of my yes gets bigger or smaller when I step back and look at my current situation.

I am a hospital librarian in a regional integrated healthcare system with a large research and education institution. As hospital librarians, our first priority is assisting clinicians and providing them with knowledge-based resources for patient care. I don't want to make this into an issue about not having adequate staff and time, but it does come into play. The librarians in the Read article spent a substantial amount of time reading the literature, choosing and creating questions, selecting the study participants, conducting the interviews, and then analyzing the results to determine how they could assist the researchers. In my institution, I would need a champion who had already half-convinced those doing research that it would be worth their time to speak with a librarian. I am not saying that this is impossible, but the challenge is there. Or do I need to get over this and accept that not being adequately staffed is the new norm?

Hospital librarians are very capable of training researchers in how best to store and archive data and how to make it findable for future users. We are also capable of writing instructions for standardizing these processes. Our skill set allows us to step in at the beginning of a project to help organize it and identify any special services that might be needed. I particularly liked Martin's view that at the center of all this big data collection is the user, not the data. Her division of users' needs into different buckets was helpful because it put into perspective, one piece at a time, what a librarian's role would be in each category, thus breaking big data into smaller pieces.

But I have to admit my eyes glaze over at the mention of R, Python, Tableau, LOCKSS, and CLOCKSS. This class, I feel, did an excellent job of introducing me to data science and its language. I felt that I could read the articles without having to look up too many definitions. Looking back nine weeks, I realize how little I really understood about Big Data. Now I realize that I probably know just enough to confuse myself and others. I am definitely caught in a training gap, and it is decision time. Do I continue to educate myself and encourage my co-workers to do the same, or do I stop because nothing will ever come of any additional training?

Then my inner librarian voice speaks up and says, "Keep going!" There are many opportunities for librarian involvement in data management within my organization. Researchers have been extracting patient population data from the EMR for a number of years. They may have systems in place for storing, archiving, and sharing, but I won't know until I ask. Holding informational interviews might very well be possible for my co-workers and me. Find that champion. Take more courses.

I realize that my situation may be unique in the hospital library world; not all hospitals have an established research arm. But a librarian's job is to organize information, and data is information. Librarians will need to know how to search data sets and interpret their meaning, just as we do with different databases and journal article types. Not being involved in Big Data, or not training future librarians in data science, is not forward thinking. In the "From the Director" section of the 2017-2027 NLM Strategic Plan, Dr. Brennan states, "The very nature of libraries is changing." I say a big YES.


Categories: Data Science

Big Data, Healthcare, and the Evolution of the Health Science Librarian

PNR Data Science - Mon, 2018-04-23 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a "should/should not" approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Lisa Mastin, Medical Librarian, WellStar Atlanta Medical Center, Atlanta, GA

Data is part of life, and the amount of data being created, captured, stored, and analyzed is expanding exponentially. In the healthcare sector, Big Data is rapidly changing the landscape. Health science librarians should get involved with big data in healthcare, at least at a basic level, because if they do not, they risk losing the ability to engage with the user (that is, the researcher, clinician, or patron) in a user-centered environment. I see health science librarians working in several areas of data science. The most basic, and possibly the most essential, would be acquiring an understanding of the language used in data science.

Although I do not believe all librarians should become data scientists or even work with big data (several postings in this course expressed a similar opinion), I do believe that all health science librarians need to know the terminology. In an online discussion based on the National Institute of Nursing Research (NINR) module on Big Data, one participant reflecting on Dr. Brennan's video said she liked Dr. Brennan's comment that "data science is a team sport," and agreed that as librarians, we should be able to speak the language and "at least know who to turn to or ask." This relates to the second area in which I feel health science librarians should get involved with big data in healthcare: knowing whom to go to with questions. In a reply to her reflection, another participant remarked, "librarians connect our users to articles, books, databases, and web resources," so "what's to stop us from connecting our users to experts on campus?" I agree that librarians can learn who the data science experts are at their institution and then pass that information along to their users. In doing this, health science librarians establish contacts and form relationships across their campus or institution, and creating connections is something else librarians are skilled at doing.

Training is also a skill at which librarians excel, and it is the next area where I see health science librarians becoming involved with big data. As Jeff Durham noted in a reflection on medical research, librarians "have advanced skills in information and pedagogy," so they are well suited to train researchers. Other class members shared this idea, and I believe that most librarians feel confident when it comes to training and teaching. Health science librarians could, for example, train researchers on how to use data science-related technology tools or how to find specific information in their electronic health records (EHR). If health science librarians gain access to the EHR at their institutions, this opens the door to other areas in which they could assist with big data. I see librarians creating metadata and/or controlled vocabularies for the natural language portion of patient notes entered into the EHR by clinicians. We discussed this in the module five online discussion session, and several participants expressed interest in assisting in these areas, as well as working with an EHR in other capacities (e.g., adding links to the library website or related databases, adding information for physicians).

In addition to the areas I have mentioned, I feel that data visualization, population health, and data management would also be areas in which health science librarians could work with big data. Traditional librarian skills, such as information searching, research methods, database management, archival work, and digital preservation, combined with newer skill sets (data literacy, informatics, visual analytics), will allow health science librarians to compete for these roles. Where and how health science librarians get involved with big data in healthcare will certainly vary by individual librarian, but what is most important is that they do become involved. I reviewed an article about an ongoing big data research project on cardiovascular care in China, and the article made no mention of librarians assisting with the project. One of the course instructors wisely wondered whether people working on the project were performing research data management functions; if someone was performing these roles, they were not trained as librarians. I now think that there are probably many research projects where people are doing the data science work we have discussed in this course, but librarians are not the ones doing it. Health science librarians can bring their unique skills to big data research projects if they possess the skills and researchers know that librarians are capable of providing big data support.

Categories: Data Science

Data, Data Everywhere and Not a Drop to Drink

PNR Data Science - Fri, 2018-04-20 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a "should/should not" approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Jeff Durham, Medical Librarian, Desert Regional Medical Center, Palm Springs, CA

We swim in a sea of information; more often than not, we are drowning in it. When we are presented with a smorgasbord of data, how do we determine what we should eat? This is the current situation with regard to big data and healthcare: what data should be used, and how? It is at this data-centric meal that the data-savvy health science librarian should be most at home: as critic, guide, and chef.

As health science librarians, we have a responsibility not only to provide the communities we serve with access to up-to-date and accurate information, but also to enable and facilitate the informational needs of researchers in our communities. With the tremendous amount of big data generated every day, health science librarians have a duty to become involved and to help all of their patrons, both lay and professional, access, extract, and manage the data (both big and small) that they need.

There are barriers to making a librarian into a data-savvy librarian who can tackle big data problems with ease. One barrier is that many graduate schools of library and information science have not been keen to teach data science as part of the general curriculum, preferring to treat it as a sub-specialty. Ironically, this occurs in iSchools as well. While there is a growing trend to correct this educational oversight, it is not yet the dominant paradigm. Another barrier is opportunity. All too often, the librarian simply does not have the time, or the employer does not provide the means (e.g., time off, reimbursement), to refresh their skill set. Until library managers and directors see the value of continuing education for their librarians in data science and working with big data, health sciences librarians will continue to fall behind.

There are also opportunities to be found. In hospitals and health science libraries, with residents and medical students, there are many in-roads for librarians to make. Given the exponential growth of big data from biomedical devices, and the prevalence of smart devices that constantly generate both passive and active data, there is a great deal of big data to use. This data has the potential to be used in research projects by students, residents, nurses, and doctors on staff. There is a significant gap between the abilities of these medical professionals and the demands of data science, and the role of the data-savvy librarian is to bridge that gap. The data-savvy librarian can help patrons identify the datasets they need and demonstrate how to wrangle, clean, and visualize their data. In doing so, the librarian plays an essential role in the medical field. By managing big data and helping researchers work with it and discern patterns and trends, the librarian enables the student, nurse, or clinician to make evidence-based decisions grounded in the data. The librarian thus meets not only the informational needs of researchers but also has a very real impact on patient care.

Categories: Data Science

NNLM “All of Us” National Program Launches May 6: You Can Get Involved!

PSR Data Science - Wed, 2018-04-18 19:41

The National Network of Libraries of Medicine (NNLM) is excited to announce the official launch of the NIH All of Us Research Program on Sunday, May 6, 2018! This national event will be held in seven local communities throughout the United States and will be broadcast via this website and on Facebook Live.

The All of Us Research Program is a historic effort to gather data from one million or more people living in the United States to accelerate research and improve health. The program's goal is to develop more effective ways to treat disease by considering individual differences in lifestyle, environment, and biology. This initiative is a key element of the Precision Medicine Initiative.

Additional information about this program is available through the Precision Medicine – All of Us Research Program website. Program information is available for download in English and Spanish. NNLM Network Members can learn about involvement opportunities at a one-hour webinar on April 30th at 11:00 a.m. PDT.

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

PNR Data Science - Mon, 2018-04-16 17:55

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a "should/should not" approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Kathleen Carlson, Education Librarian, College of Medicine Phoenix, University of Arizona, Phoenix, AZ

It is essential for the future of medical librarians to get involved in Big Data. Much of our future work will come from big data research projects, especially for librarians who work in hospitals and health care systems. Because librarians were early adopters of technology, we were able to move from print indexes to searching indexes on CD-ROMs that were eventually moved to the Web. Having moved from the card catalogue to integrated automated library systems, librarians understand how important it is to move forward with Big Data. Many older, experienced librarians may not have expertise or training in math, computation, statistics, or the relevant domain, but we know that our profession should be part of our institution's Big Data team and at least have a seat at the table.

Being an Assistant Professor of Practice in the Department of Biomedical Informatics (BMI) at my academic institution has allowed me to understand and speak the language of Big Data. Clinicians come to me for resources and journal articles, and I have learned a lot by attending monthly journal club meetings on different subjects in Biomedical Informatics and Big Data. BMI fellows, Chief Medical Information Officers (CMIOs) and Chief Nursing Officers (CNOs) of area hospitals, and BMI faculty attend the sessions. Here I have an opportunity to be seen and heard and, as a non-clinician, to ask questions when they arise. We have covered the following Big Data and Informatics topics over the past three years:

  • Cybersecurity
  • Data Standards
  • Health Literacy
  • Electronic Health Record/Electronic Patient Record
  • Process Oriented Health Information Systems
  • Clinical Decision Support Systems
  • Graphic Display and Visualization
  • Health Information Exchange
  • Cloud Computing Services
  • Substitutable Medical Applications and Reusable Technologies (SMART)
  • Fast Healthcare Interoperability Resources (FHIR)

I also attend monthly Clinical Informatics Grand Rounds. Speakers range from clinicians and researchers to faculty in business (MBA), pharmacy, and public health.

So, for the past three years I have had a seat at the table and have given our library visibility within Biomedical Informatics and Big Data. I also believe that a medical librarian at any institution should find one or more champions who will help him or her get a seat at the table. Once that is accomplished, a hospital librarian should seek permission to embed in the patient’s electronic record at least one vetted, patient-appropriate link to a consumer health resource such as the National Institute on Aging. This would relieve clinicians of the burden of finding the best resource for patient care.

With a librarian’s help, Big Data can be organized, appraised, secured, and preserved, so that it can assist researchers and clinicians in patient care and help identify areas that need improvement. Creating an online resource guide to Big Data tools and resources can be a first step in marketing the librarian and the library. The NNLM PSR recently recruited a data and technology services coordinator, who asked librarians whether they collected any data for their institutions. Unfortunately, our campus is considered a satellite campus of a large Research One university. I think there are areas at my institution where data is collected but could be used more effectively. Within the Scholarly Project, a mandatory four-year thesis and poster at our institution, many of our students use Big Data from area hospitals or the state’s data archives as foundational information for their presentations and theses. They are assisted by their clinical mentors.

I also liked a fellow course participant’s discussion post about teaching himself R so that he could teach classes to the data scientists on his campus. Finding resources on Big Data programming languages and on free software for statistical computing and graphics, such as R, can help the librarian become an information resource for Big Data collection. This kind of instruction is one way librarians will have to get out of their comfort zone and put themselves out there for Big Data. We have access to SPSS and Stata in our library commons. I took three classes on REDCap to help me understand Big Data and how to collect it safely and securely. REDCap is a secure web application for building and managing online surveys and databases and for collecting data.

The librarian can be the go-to resource for students and researchers and can help them search archives of stored Big Data sets. I do not believe that our small campus has the capacity to store Big Data, and it is not something the larger academic institution is willing to duplicate. I do believe that being visible, attending committee meetings, journal clubs, and clinical informatics rounds, and actually showing an interest in learning about Big Data give a librarian the knowledge and vocabulary to understand it and share it with his or her constituents. The librarian can also become familiar with websites that build Big Data knowledge, such as that of the Institute for Health Metrics and Evaluation, which I learned about in the course discussions.

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

GMR Data Science - Mon, 2018-04-16 09:35

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Nicole Montgomery, MISLT, AHIP, Librarian, Assistant Professor, CoxHealth Systems and Cox College, Springfield, MO

I am certain that health sciences librarians should be involved in anything related to healthcare. This is our job.

I have often joked that we are the bartenders of our institutions. We hold a position in the organization unlike any other, in that it allows us to interact with everybody. Literally, everybody! From the person who cleans the library to the CEO of the hospital, the people who work in financial services, the nurse on the floor, an occupational therapy student, a patient who just learned her baby will be staying in the NICU, or a physician trying to determine the best treatment for a difficult case. We hear people’s stories; we hear their frustrations and sometimes lend an ear when they need one. Librarians are intrinsically user-focused.

We typically get to know our users, and we are able to see the overall picture of the information they are seeking. Because of our familiarity with our users, if a physician needs insight into a nutrition-related topic, I am in a position to know which dietitian on staff will likely be able and willing to answer his questions. Or, when the college I work with decides to investigate some cool 3-D equipment, I am able to suggest collaborating with the hospital’s residency program to share the cost and make the most of the equipment. The real-life examples are endless, but ultimately, we want to bridge the gap between departments, disciplines, and people with like interests, because we know that working together is usually better than staying in our silos.

What I am not certain of is to what level we should be involved with big data initiatives. In light of Big Data, I believe most librarians still have a lot to learn about our organizations before we can answer the question of our level of involvement. I imagine we will all find different answers.

In conjunction with exploring our institutions, I think librarians need to begin discussing how Big Data may affect libraries. We need to ask ourselves questions about the future, such as: Will we still have print books, current journals, and stacks of bound serials? Will libraries still exist as brick-and-mortar buildings? Will all of our materials be delivered electronically? Will the librarian simply become a person behind a computer screen? Will our profession become a fond memory of the past, just like the card catalog? What will the entire publishing industry look like? Krumholz briefly addresses the question about the publishing industry on p. 1169 of his article, writing, “In the future, the products of scientific inquiry may evolve from a static journal publication to a more dynamic platform for presenting and updating results.” Brennan predicts the same at 1:10:21 of her presentation. She says (with an apology to any journal editors), “We’re moving pretty quickly away from journal articles and pretty fast into blogs…and shared knowledge building.” In health sciences, the “bread and butter” of our world is journal articles. While we, as librarians, typically pride ourselves on being willing to embrace technology, I think the arrival of Big Data in our world may challenge us and may change our profession in ways we cannot yet imagine.

In an effort to give us a place to begin, librarian Elaine R. Martin proposes a “Data Management Framework for Librarians.” She describes her framework as user-centered, with five “buckets”: Data Services, Data Management Practices, Data Literacy, Archives/Preservation, and Data Policy. Without explaining each “bucket” in detail here, it is fair to say that each one contains concepts familiar to librarians. For instance, the Data Services bucket “…may include the following activities: assessing researcher needs, performing an institutional data environmental scan, conducting the research interview, designing a suite of services such as assistance with DMPs [Data Management Plans] based on user needs, etc.” These concepts are digestible for librarians and definitely give us a place to start.

While my comparison of librarians to the bartenders of our institutions is intended to be humorous, there is quite a bit of truth to it. No matter what changes the future holds, as librarians, we will instinctively do our part.


  1. Krumholz, H. M. Big Data and New Knowledge in Medicine: The Thinking, Training, and Tools Needed for a Learning Health System.
  2. Brennan, P. NINR Big Data Boot Camp Part 4: Big Data in Nursing Research.
  3. Martin, E. R. The Role of Librarians in Data Science: A Call to Action.
Categories: Data Science

Eric Dishman to Deliver 2018 NLM/MLA Leiter Lecture Videocast on May 9

PSR Data Science - Thu, 2018-04-12 15:57

Eric Dishman, director of the All of Us Research Program at the National Institutes of Health, will deliver the 2018 Joseph Leiter National Library of Medicine/Medical Library Association Lecture on Wednesday, May 9, at 10:30 AM PDT, in the Lister Hill Auditorium on the NIH Campus. The lecture is open to the public and will be broadcast live on the Web and archived for later viewing. The presentation is titled “Precision Communications for Precision Health: Challenges and Strategies for Reaching All of Us.” Among other topics, he will discuss these challenges and strategies:

  • Meeting communities where they are (understanding their needs, concerns around research, meeting their literacy levels, etc.);
  • Widening the definition of precision health and conveying the fact that All of Us is more than a genomics program;
  • Ethics and logistics of targeting with marketing analytics; and
  • Balancing the promise, hype, and vision with the need for patience.

As director of All of Us, Dishman leads the agency’s efforts to build a national research program of one million or more US participants to advance precision medicine. Previously, he was an Intel fellow and vice president of the Health and Life Sciences Group at Intel Corporation, where he was responsible for driving global strategy, research and development, product and platform development, and policy initiatives for health and life science solutions. His organization focused on growth opportunities for Intel in health information technology, genomics and personalized medicine, consumer wellness, and care coordination technologies.

Dishman is widely recognized as a global leader in health care innovation, with specific expertise in home and community-based technologies and services for chronic disease management and independent living. Trained as a social scientist, he is known for pioneering innovation techniques that incorporate anthropology, ethnography, and other social science methods into the development of new technologies. He also brings a unique personal perspective to driving a person-centric view of health care transformation: he was a cancer patient for 23 years before finally being cured thanks to precision medicine.

“Eric Dishman is the perfect speaker at the perfect time,” noted NLM Director Patricia Flatley Brennan, RN, PhD. “His message about the power of people to advance scientific discovery is a strong one. Also, as was announced last year, NIH’s All of Us Research Program and NLM are teaming up to raise awareness about this landmark effort to advance precision medicine. As our colleagues at the Medical Library Association know so well,” she continued, “libraries serve as vital community hubs. NLM’s collaboration with All of Us presents a perfect opportunity to help the public understand how health research impacts all of us. By pairing our National Network of Libraries of Medicine members with public libraries to reach local communities, we hope to contribute to medical breakthroughs that may lead to more tailored disease prevention and treatment solutions for generations to come.”

The Joseph Leiter NLM/MLA Lecture was established in 1983 to stimulate intellectual liaison between the MLA and the NLM. Leiter was a major contributor in cancer research at the National Cancer Institute and a leader at NLM as a champion of medical librarians and an informatics pioneer. He served as NLM Associate Director for Library Operations from 1965 to 1983.

Categories: Data Science

Librarians and Big Data: Should We Be Involved?

MCR Data Science - Thu, 2018-04-12 13:40

Written by: Caroline Marshall, MLS, AHIP, Senior Medical Librarian, Public Services, Cedars-Sinai Medical Library, Los Angeles, CA

There is a great deal of discussion about Big Data. We all think other people are doing it, we think we should be doing it, but we are not sure how to get involved (Tattersall & Grant, 2016).

There have been calls to action (Martin, 2016) about Big Data, and several studies affirm that librarians should get involved. It is almost as if we are going to miss the Big Data train if we don’t jump on board right away. Big Data is not going away, but we, as librarians, need to determine how involved we can be, given our staffing and time.

Librarian skills relevant to Big Data have been identified roughly as follows:

  • Information Curation
  • In-Depth Research
  • Digital Scanning and Preservation
  • Cloud Data Expansion
  • Data Visualization
  • Collaboration, Teaching and Facilitation

Librarians are no strangers to Big Data, and we often use these skills already: we use usage data to evaluate and renew journals. We look at interlibrary loan data to see how quickly we are turning requests around and as an indication of which journals we should purchase. We work with medical staff on citation management software, teaching them how to manage, organize, and share large quantities of citations for their publications. Librarians perform information curation, such as creating digital archives and assigning metadata that provides access points, or cataloging different types of materials for easy retrieval. In-depth searching is something most of us do every day; defining the question or query to retrieve data is a common skill for many librarians.
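To make the usage-data and interlibrary loan (ILL) examples above concrete, here is a minimal Python sketch of two routine calculations: cost per use for journal subscriptions and turnaround time for ILL requests. It assumes the data has already been exported to simple tuples; the titles, costs, download counts, dates, and function names are all hypothetical, chosen only for illustration. Real figures would come from your own vendor usage reports and ILL system.

```python
from datetime import date
from statistics import median

# Hypothetical sample data: (journal title, annual cost in USD, full-text downloads)
subscriptions = [
    ("Journal A", 4200.00, 1050),
    ("Journal B", 3100.00, 62),
    ("Journal C", 950.00, 0),
]

# Hypothetical ILL records: (date request received, date request filled)
ill_requests = [
    (date(2018, 3, 1), date(2018, 3, 3)),
    (date(2018, 3, 5), date(2018, 3, 12)),
    (date(2018, 3, 8), date(2018, 3, 9)),
]


def cost_per_use(cost, uses):
    """Return subscription cost divided by uses, or None if a title had no recorded use."""
    return cost / uses if uses else None


def turnaround_days(requests):
    """Return the number of days between receipt and fulfillment for each request."""
    return [(filled - received).days for received, filled in requests]


if __name__ == "__main__":
    for title, cost, uses in subscriptions:
        cpu = cost_per_use(cost, uses)
        label = f"${cpu:.2f}" if cpu is not None else "no recorded use"
        print(f"{title}: {label}")
    days = turnaround_days(ill_requests)
    print(f"Median ILL turnaround: {median(days)} days")
```

With this sample data, the script reports $4.00 and $50.00 per use for the first two titles, flags the third as having no recorded use, and gives a median turnaround of 2 days. The median is used instead of the mean so that a single slow request does not dominate the figure.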

For some librarians, especially those in mid-career, learning other skills such as data visualization will mean attending outside workshops (Burton & Lyon, 2017) that take time away from our “regular” work. There is also the question of whether leadership will want to take us in this direction.

Burton & Lyon (2017) suggest librarians should be ‘Data Savvy,’ but this is not a skill that can be taught. We cannot push roles onto staff who do not have the knowledge or the desire. Future Master of Library Science programs could incorporate more specific courses to create the data scientist librarian who can be part of the research team, but how will this look? In how many projects can one person be embedded, especially in an institution with multiple ongoing research projects? Will that librarian be part of the library or employed by the research team?

I see the librarian’s role not as being embedded in a research team but as a collaborative, instructional, and facilitating one. This includes teaching classes on statistical or visualization software, and giving guidance on designing queries or creating a database that can answer not only the immediate questions but also ones the researcher may not yet have anticipated. We can also identify data repositories within our own institutions that researchers can use but that are not gathered in any one place, or provide advice on digitization and preservation. We can act as sounding boards in a consultative role, not just through classes.

We cannot do everything, and we need to be aware of staff, skills, and time. Some of us are just getting our toes wet by offering classes and so forth, but before scaling up to an institutional level we need to determine what we can offer and support.


Burton, M., & Lyon, L. (2017). Data Science in Libraries. Research Data and Preservation (RDAP) Review. Bulletin of the Association for Information Science and Technology, 43(4), 33-35.

Martin, E. R. (2016). The Role of Librarians in Data Science: A Call to Action. Journal of eScience Librarianship, 4(2), e1092.

Tattersall, A., & Grant, M. J. (2016). Big Data – What is it and why it matters. Health Info Libr J, 33(2), 89-91. doi:10.1111/hir.12147


Categories: Data Science