National Network of Libraries of Medicine

Data Science

Data, Data Everywhere and Not a Drop to Drink

PNR Data Science - Fri, 2018-04-20 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Jeff Durham, Medical Librarian, Desert Regional Medical Center, Palm Springs, CA

We swim in a sea of information; more often than not we are drowning in it. When we are presented with a smorgasbord of data, how do we determine what we should eat? This is the current situation with regard to big data and healthcare: what data should be utilized, and how? It is in this data-centric meal that the data-savvy health science librarian should be most at home: as critic, guide, and chef.

As health science librarians, we have a responsibility not only to provide the communities that we serve with access to up-to-date and accurate information, but also to be available to support the informational needs of researchers in our communities. With the tremendous amount of big data that is generated on a daily basis, health science librarians have a duty to become involved and assist all of their patrons, both lay and professional, to access, extract, and manage the data (both big and small) that they need.

There are barriers to making a librarian into a data-savvy librarian who can tackle big data problems with ease. One barrier is that many graduate schools in library and information science have not been keen to teach data science in a general education format, preferring to see it as a sub-specialty. This occurs, ironically enough, in iSchools as well. While there is a growing trend to correct this educational oversight, it is not the dominant paradigm yet. Another barrier is that of opportunity. All too often, the librarian simply does not have the time, or their employer does not provide the means (e.g., time off, reimbursement), for the librarian to refresh their skill set. Until library managers and directors see the value of continuing education for the librarians on their staff in how to use data science and work with big data, the health sciences librarian will continue to fall behind.

There are also opportunities to be found. In hospitals and health science libraries, with residents and medical students, there are lots of inroads for librarians to make. Given the exponential growth in big data from biomedical devices and the prevalence of smart devices, which constantly generate both passive and active data, there is a lot of big data to utilize. The data that is being produced has the potential to be used in research projects for students, residents, nurses, and doctors on staff. There is a significant gap between the abilities of these medical professionals and the demands of data science. The role of the data-savvy librarian is to be a bridge across this gap. The data-savvy librarian is able to assist their patrons in identifying the datasets that they need, as well as demonstrating how to wrangle, clean, and visualize their data. By doing this, the librarian plays an essential role in the medical field. It is by managing big data, and by helping the researcher work with the data and discern patterns and trends, that the librarian enables the student, nurse, or clinician to make evidence-based decisions. By doing so, the librarian not only meets the informational needs of the researchers, but also has a very real impact on patient care.

Categories: Data Science

NNLM “All of Us” National Program Launches May 6: You Can Get Involved!

PSR Data Science - Wed, 2018-04-18 19:41

The National Network of Libraries of Medicine (NNLM) is excited to announce the official launch of the NIH All of Us Research Program on Sunday, May 6, 2018! This national event will be held in seven local communities throughout the United States and will be broadcast via this website and on Facebook Live.

The All of Us Research Program is a historic effort to gather data from one million or more people living in the United States to accelerate research and improve health. The program's goal is to develop more effective ways to treat disease by considering individual differences in lifestyle, environment, and biology. The program is a key element of the Precision Medicine Initiative.

Additional information about this Program is available through the Precision Medicine – All of Us Research Program website. Program information is available for download in English and Spanish. NNLM Network Members can learn about involvement opportunities at a one-hour webinar on April 30th at 11:00am PDT.

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

PNR Data Science - Mon, 2018-04-16 17:55

Written by: Kathleen Carlson, Education Librarian, College of Medicine Phoenix, University of Arizona, Phoenix, AZ

It is essential for the future of medical librarians to get involved in Big Data. Much of our future work will come from big data research projects, especially for librarians who work in hospitals and health care systems. Since librarians were early adopters of technology, we were able to move from print indexes to searching indexes on CD-ROMs that were eventually moved to the Web. Having moved from the card catalogue to integrated automated library systems, librarians understand how important it is to move forward with Big Data. Many of the older, experienced librarians may not have the expertise or training in math, computational skills, statistics, and domain knowledge, but we know that our profession should be part of our institution’s Big Data team and at least have a seat at the table.

I know that being an Assistant Professor of Practice in the Department of Biomedical Informatics (BMI) at my academic institution has allowed me to understand and speak the language of Big Data. Clinicians come to me for resources and journal articles, and I have learned a lot by attending monthly journal club meetings on different subjects in Biomedical Informatics and Big Data. BMI fellows, Chief Medical Information Officers (CMIO) and Chief Nursing Officers (CNO) of area hospitals, and BMI faculty attend the sessions. Here I have an opportunity, as a non-clinician, to be seen and heard and to ask questions when they arise. We have covered the following topics in Big Data and Informatics over the past three years:

  • Cybersecurity
  • Data Standards
  • Health Literacy
  • Electronic Health Record/Electronic Patient Record
  • Process Oriented Health Information Systems
  • Clinical Decision Support Systems
  • Graphic Display and Visualization
  • Health Information Exchange
  • Cloud Computing Services
  • Substitutable Medical Applications and Reusable Technologies (SMART)
  • Fast Healthcare Interoperability Resources (FHIR)

I also attend monthly Clinical Informatics Grand Rounds. The speakers vary from clinicians to researchers, MBA, Pharmacy and Public Health faculty.

So, for the past three years I have had a seat at the table and have given our library visibility within Biomedical Informatics and Big Data. I also believe that a medical librarian at any institution should find a champion or champions that will assist him/her in getting a seat at the table. And when that is accomplished, a hospital librarian should get permission to embed at least one vetted  link that is appropriate to a patient’s electronic record with,  National Institute on Aging, or another consumer health oriented resource. This would relieve the burden on clinicians in finding the best resource for patient care.

Big Data can be organized, appraised, secured, and preserved with a librarian’s help, and can assist researchers and clinicians in patient care and help find areas that may need improvement. Creating an online resource guide with Big Data tools and resources can be a first step in marketing the librarian and library. The NNLM PSR recently recruited a data and technology services coordinator. She asked librarians if they collected any data for their institution. Unfortunately, we are considered a satellite campus of a large Research One University. I think there are areas at my institution where data is collected but could be used more effectively. I know that within the Scholarly Project, a four-year mandatory thesis and poster at our institution, many of our students use Big Data from area hospitals or the state’s data archives as foundational information in their presentations and theses. They are assisted by their clinical mentors.

I also like a discussion post by one of my fellow course students about teaching himself ‘R’ so that he is able to teach classes to the data scientists on his campus. Finding resources on Big Data programming languages and on free software for statistical computing and graphics, like ‘R’, can help the librarian be an informational resource for Big Data collection. This instruction example is one way librarians will have to get out of their comfort zone and put themselves out there for Big Data. We have access to SPSS and STATA in our library commons. I took three classes on REDCap to help me understand Big Data and how to collect it safely and securely. REDCap is a secure web application for building and managing online surveys and databases and collecting data.

The librarian can be the go-to resource for students and researchers and help them search the archives of stored Big Data sets. I do not believe that our small campus has the capacity to store Big Data, and it is not something that the larger academic institution is willing to duplicate. I do believe that being visible, attending committee meetings, journal clubs, and clinical informatics rounds, and actually showing an interest in learning about Big Data gives a librarian the knowledge and vocabulary to understand it and share it with their constituents. The librarian can also become familiar with websites that build Big Data knowledge, such as that of the Institute for Health Metrics and Evaluation, which I learned about in the course discussions.

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

GMR Data Science - Mon, 2018-04-16 09:35

Written by: Nicole Montgomery, MISLT, AHIP, Librarian, Assistant Professor, CoxHealth Systems and Cox College, Springfield, MO

I am certain that Health Sciences Librarians should be involved with anything healthcare. This is our job.

I have often teased that we are the bartenders of our institutions. We have a seat in the organization that is unique to any other in that it allows us to interact with everybody. Literally, everybody! From the person who cleans the library, to the CEO of the hospital, or the people who work in financial services, the nurse on the floor, an occupational therapy student, a patient who just learned her baby will be staying in the NICU, or a physician trying to determine the best treatment for a difficult case. We hear people’s stories; we hear their frustrations and sometimes lend an ear when they need one. Librarians are intrinsically user-focused.

We typically get to know our users, and we are able to see the overall picture of the information they are seeking. Because of our familiarity with our users, if a physician needs insight into a nutrition-related topic, I am in a position to know which dietician on staff will likely be able and willing to answer his questions. Or, when the college I work with decides to investigate some cool 3-D equipment, I am able to suggest collaborating with the hospital’s residency program to share the cost and make the most of the equipment. The real-life examples are endless, but ultimately, we desire to bridge the gap between departments, disciplines, and people with like interests, because we know that working together is usually better than staying in our silos.

What I am not certain of is to what level we should be involved with big data initiatives. In the light of Big Data, I believe most librarians still have a lot to learn about our organizations before we can answer the question about our level of involvement. I imagine we will all find different answers.

In conjunction with exploring our institutions, I think librarians need to begin discussions in an attempt to answer how Big Data may impact libraries. We need to ask ourselves questions about the future such as: will we still have print books, current journals and stacks of bound serials? Will libraries still exist as brick and mortar buildings? Will all of our materials be delivered electronically? Will the librarian simply become a person behind a computer screen? Will our profession become a fond memory of the past, just like the card catalog? What will the entire publishing industry look like? Krumholz briefly addresses the question about the publishing industry on p. 1169 of his article by saying, “In the future, the products of scientific inquiry may evolve from a static journal publication to a more dynamic platform for presenting and updating results.” Brennan predicts the same at 1:10:21 of her presentation. She says (with an apology to any journal editors), “We’re moving pretty quickly away from journal articles and pretty fast into blogs…and shared knowledge building.” In health sciences, the “bread and butter” of our world is journal articles. While we, as librarians, typically pride ourselves on being willing to embrace technology, I think the arrival of Big Data in our world may challenge us and may change our profession in a way we cannot yet imagine.

In an effort to give us a place to begin, librarian Elaine R. Martin provides a proposed “Data Management Framework for Librarians.” She says her proposed framework is user-centered and includes five “buckets”: Data Services, Data Management Practices, Data Literacy, Archives/Preservation, and Data Policy. Without delving into explaining each “bucket” within this essay, it is easy to say that each proposed bucket provides familiar concepts to librarians. For instance, the Data Services bucket, “…may include the following activities: assessing researcher needs, performing an institutional data environmental scan, conducting the research interview, designing a suite of services such as assistance with DMPs [Data Management Plans] based on user needs, etc.” These concepts are digestible for librarians and definitely provide us with a place to start.

While my parallel of being the bartenders of our institutions is intended to be humorous, there is quite a bit of truth to this. No matter what changes the future holds, as librarians, we will instinctively do our part.


  1. Krumholz, HM. Big Data And New Knowledge In Medicine: The Thinking, Training, And Tools Needed For A Learning Health System
  2. Brennan, Patti. NINR Big Data Boot Camp Part 4: Big Data in Nursing Research
  3. Martin, Elaine R. The Role of Librarians in Data Science: A Call to Action
Categories: Data Science

Eric Dishman to Deliver 2018 NLM/MLA Leiter Lecture Videocast on May 9

PSR Data Science - Thu, 2018-04-12 15:57

Eric Dishman, director of the All of Us Research Program at the National Institutes of Health, will deliver the 2018 Joseph Leiter National Library of Medicine/Medical Library Association Lecture on Wednesday, May 9, at 10:30 AM PDT, in the Lister Hill Auditorium on the NIH Campus. The lecture is open to the public. It will be broadcast live on the Web (and later archived) at: The featured presentation will be Precision Communications for Precision Health: Challenges and Strategies for Reaching All of Us. Among other topics, he will discuss these challenges and strategies:

  • Meeting communities where they are (understanding their needs, concerns around research, meeting their literacy levels, etc.);
  • Widening the definition of precision health and conveying the fact that All of Us is more than a genomics program;
  • Ethics and logistics of targeting with marketing analytics; and
  • Balancing the promise, with the hype and vision, with the need for patience.

As director of All of Us, Dishman leads the agency’s efforts to build a national research program of one million or more US participants to advance precision medicine. Previously, he was an Intel fellow and vice president of the Health and Life Sciences Group at Intel Corporation, where he was responsible for driving global strategy, research and development, product and platform development, and policy initiatives for health and life science solutions. His organization focused on growth opportunities for Intel in health information technology, genomics and personalized medicine, consumer wellness, and care coordination technologies.

Dishman is widely recognized as a global leader in health care innovation with specific expertise in home and community-based technologies and services for chronic disease management and independent living. Trained as a social scientist, he is known for pioneering innovation techniques that incorporate anthropology, ethnography, and other social science methods into the development of new technologies. He also brings a unique personal perspective, as a cancer patient for 23 years and finally cured thanks to precision medicine, to drive a person-centric view of health care transformation.

“Eric Dishman is the perfect speaker at the perfect time,” noted NLM Director Patricia Flatley Brennan, RN, PhD. “His message about the power of people to advance scientific discovery is a strong one. Also, as was announced last year, NIH’s All of Us Research Program and NLM are teaming up to raise awareness about this landmark effort to advance precision medicine. As our colleagues at the Medical Library Association know so well,” she continued, “libraries serve as vital community hubs. NLM’s collaboration with All of Us presents a perfect opportunity to help the public understand how health research impacts all of us. By pairing our National Network of Libraries of Medicine members with public libraries to reach local communities, we hope to contribute to medical breakthroughs that may lead to more tailored disease prevention and treatment solutions for generations to come.”

The Joseph Leiter NLM/MLA Lecture was established in 1983 to stimulate intellectual liaison between the MLA and the NLM. Leiter was a major contributor in cancer research at the National Cancer Institute and a leader at NLM as a champion of medical librarians and an informatics pioneer. He served as NLM Associate Director for Library Operations from 1965 to 1983.

Categories: Data Science

Librarians and Big Data: Should We Be Involved?

MCR Data Science - Thu, 2018-04-12 13:40

Written by: Caroline Marshall, MLS, AHIP, Senior Medical Librarian, Public Services, Cedars-Sinai Medical Library, Los Angeles, CA

There is a great deal of discussion about Big Data. We all think other people are doing it, we think we should be doing it, but we are not sure how to get involved (Tattersall & Grant, 2016).

There have been Calls to Action (Martin, 2016) about Big Data and an affirmation in several studies that librarians should get involved. It is almost as if we are going to miss the Big Data train if we don’t jump on board right away. Big Data is not going away but we, as librarians, need to ascertain how involved we can get depending on staffing and time.

Librarian skills for Big Data have been identified more or less along the following lines:

  • Information Curation
  • In-Depth Research
  • Digital Scanning, Preservation
  • Cloud Data Expansion
  • Data Visualization
  • Collaboration, Teaching and Facilitation

Librarians are no strangers to Big Data and we often use these skills already. We use usage data in journal evaluation and renewals. We look at interlibrary loan data to ascertain how quickly we are turning requests around and as an indication of what journals we should purchase. We work with medical staff on citation management software, teaching them how to manage, organize, and share large quantities of citations for their publications. Librarians perform information curation, such as creating digital archives and assigning metadata that will provide access points, or cataloging different types of materials for easy retrieval. In-depth searching is something most of us do every day; defining the question or query to retrieve data is a common skill for many librarians.

Learning other skills such as Data Visualization, especially for some librarians who are mid-career, will mean outside workshops (Burton & Lyon, 2017) that will take away from our “regular” work and there is also the question of whether leadership will want to take us in this direction.

Burton & Lyon (2017) suggest librarians should be ‘Data Savvy’ but this is not a skill that can be taught. We cannot push roles onto staff who do not have the knowledge or the desire. Future Master of Library Science programs can incorporate more specific courses to create the data scientist librarian who can be part of the research team, but how will this look? In how many projects can one person be embedded, especially in an institution that has multiple research projects ongoing? Will that librarian be part of the library or employed by the research team?

I see the librarian’s role not as being embedded in a research team but more in a collaborative, instructional, and facilitation role. This includes teaching classes on statistical or visualization software, and giving guidance on designing the query or on the creation of a database that will need to answer not just the immediate queries, but other queries that the researcher may not have thought of that may come up in the future. We can also identify data repositories that researchers can use that are in our own institutions but that are not gathered in any one place or provide advice on digitization and preservation. We can act as sounding boards in a more consultative manner as opposed to just classes.  

We cannot do everything and we need to be aware of staff, skills and time. Some of us are just getting our toes wet offering classes and so forth, but before scaling up to an institutional level we need to ascertain what we can offer and support.


Burton, M., & Lyon, L. (2017). Data Science in Libraries. Research Data and Preservation (RDAP) Review. Bulletin of the Association for Information Science and Technology, 43(4), 33-35.

Martin, E. R. (2016). The Role of Librarians in Data Science: A Call to Action. Journal of eScience Librarianship, 4(2), E1092.

Tattersall, A., & Grant, M. J. (2016). Big Data – What is it and why it matters. Health Info Libr J, 33(2), 89-91. doi:10.1111/hir.12147


Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

MCR Data Science - Mon, 2018-04-09 12:21

Written by: Niala Dwarika-Bhagat, The Medical Sciences Library, The University of the West Indies, St Augustine, Trinidad and Tobago

Introduction: What is Big Data?

Techopedia defines Big Data as a process “that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines.” (Techopedia, 2018)

Over time, the different iterations of Big Data processing and application have been used to reflect, interpret and influence developmental change. The sanctity of this tacit operation remained largely undisturbed until the advent of social media. Layers of issues involving social media apps now suggest that Big Data, in addition to its merits, can be manipulated to alter perception and reality. Notwithstanding the notoriety of current headlines, what is clear is that Big Data is now a commonplace topic of conversation.

Do you think health sciences librarians should get involved with Big Data in healthcare? 

The health sciences librarian working with Big Data in the academic environment is potentially in a “safer” place away from the glare of mainstream media. And although the librarian has been traditionally and typically constrained by a much larger mandate to provide services and resources for curriculum support, this is set to change with data science featured on university curricula, as well as libraries’ strategic plans. So indeed, health librarians should and would inevitably get involved with big research data in healthcare even if it is to provide basic but essential data services support emanating from medical education.

Where should librarians get involved?

There is great potential for academic health librarians to do data services support. The roles that could be potentially played are:

  • As controlled vocabulary experts (cataloguers and indexers)
  • As systems experts, navigating through ILS and health research data sets
  • As marketers, doing outreach to garner support for data projects
  • As advisors e.g. creating data management plans
  • As trainers, embedded in data science courses
  • As search-experts aiding the discovery of health research data
  • As support for ongoing research projects
  • As programmers writing analyses using code
  • As advocates for the privacy of medical research data
  • As outreach experts, including creating research guides
  • As expert searchers, locating datasets for faculty research

How should librarians get involved?

The health sciences librarian need not become a data scientist but rather work in teams for maximum output and impact. Academic research data will form the crux of this work, operationalizing all of the above. As librarians accept a mandate to work with big research data, with their skills and training they can be the ultimate crucible for data discovery. I envisage their greater role would be as providers of information and trainers using their existing skills and expertise. This would involve activities such as engaging with faculty to harness research data, encouraging researchers to deposit their research data into the library repository, collection development, data literacy instruction, creating online resource guides, assisting with data management plans, and providing guidance on data tools. With advanced training, they can even locate data sets that researchers require. Furthermore, librarians with coding/programming skills can definitely add value to data services support for research in healthcare data. As far as potential roles are concerned, there are a great many and, with time, there would be sophisticated and evolved workflows for the health science librarian.


What is Big Data? – Definition from Techopedia. (n.d.). Retrieved March 28, 2018, from

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

SEA Data Science - Thu, 2018-04-05 09:57

Written by: Monica Riley, Serials Librarian, Morehouse School of Medicine, Atlanta, GA

As librarians and information managers, it is our duty to stay abreast of emerging trends and technologies and how they might impact our users. Health Sciences librarians can and should get involved with big data in healthcare, but to what extent I’m not certain. There are many factors that may present challenges in providing data services including staffing, knowledge and expertise, budget, and technology deficiencies. Health Sciences librarians need to think strategically and collaboratively about what type of data services they might be able to provide, and how best to execute them. Recognizing that all aspects of data management require a specific skill set, areas for training and continuing education should be identified, and these newly learned skills put into practice on a consistent basis.

As the trend of big data and research data management becomes more massive, organizations should consider incorporating data services into their strategic plans and developing priority areas with specific timelines. While I don’t think it’s necessary nor always feasible for librarians to become data scientists, at minimum we need to be prepared to answer questions related to big data and point to resources. One thing libraries can do immediately is assemble a big data/data science working group, with a cross-section of staff who can assess the needs at their institution, and contribute unique perspectives and ideas on how best to address those needs. Through these discussions the working group can develop an action plan and establish big data initiatives. Something as simple as creating an online resource guide or other research guide with general information, tools, and resources for big data is a good way to test the waters without getting into the complexities of big data. This could be either a collaborative or individual effort.

Thinking about a user-centered approach as Elaine R. Martin discussed in her article “The Role of Librarians in Data Science: A Call to Action”, identifying user needs is the first step in determining whether or not you can successfully provide data services. Some questions that would need to be addressed are, what type of data services are your users asking for? And who on staff is readily available with the knowledge and expertise to handle those requests? This will most likely involve partnering with other departments, as well as getting valuable feedback from users.

As health sciences librarians attempt to make sense of and define their specific role in big data, for the immediate future I feel our talents would be best suited to a supportive role. Providing consultations and assistance with research data management, bringing awareness to big data resources, and possibly facilitating training on big data tools are just a few ways we can contribute. How we gather, analyze, store, and preserve information is constantly evolving; so should our roles as librarians. Although big data can be very complex, and the idea of assuming these responsibilities can be extremely daunting, I think it’s important for us to remain steadily involved in these discussions. As others may not recognize the skills and value that librarians bring to the table, we need to advocate for ourselves and create opportunities to become a part of this big data movement.

Categories: Data Science

DataFlash: Data Indexers

PNR Data Science - Mon, 2018-04-02 13:50

The Institute for Health Metrics and Evaluation (IHME) is “an independent population health research center at UW Medicine, part of the University of Washington, that provides rigorous and comparable measurement of the world’s most important health problems and evaluates the strategies used to address them.” Their mission is to improve the health of the world’s populations by providing the best information on population health, and to do so, IHME enlists the expertise of countless individuals, including researchers, data analysts, data scientists, and thirteen data indexers. What is a data indexer? Lyla Medeiros, a data indexer at IHME, shares more about her essential role below…

What is a data indexer? And how long have you been in the role?

Data indexers are part of a team responsible for providing librarian services to IHME. Data indexers not only catalog data for inclusion in the Global Health Data Exchange (GHDx), they also organize and maintain data files, provide reference services to IHME researchers, and search for and acquire new data sources. Data indexers are also responsible for creating documentation on cataloging practices, implementing improvements to process and workflows, reporting and testing technical issues that pop up in the GHDx for the Drupal development team, and managing controlled vocabularies and taxonomies, which includes researching and adding terms. I’ve been working as a data indexer for four years and three months.

What is your education/occupational background?

I earned a BA in Dance Studies and Art History at the State University of New York, Empire State College and a Master of Library Science at Indiana University, Bloomington. Before becoming a librarian, I trained to become a classical ballet dancer and teacher. I’ve taught ballet in New York, New Mexico and here in Washington.

Who do you work with at IHME?

Outside of the data services team, I work with public health researchers, data analysts, Drupal developers, and student assistants.

IHME US Map Data Visualization

What types of data do you work with?

The data that IHME uses to create global health estimates comes in data file formats like .dta, .dbf, .sav, and Excel tables, Word documents, text files, .pdf documents and Access databases. When necessary, we digitize books and sometimes even microfiche. Right now, I primarily catalog health and demographic survey datasets and their related geospatial data. In the past, I’ve also worked on cataloging health statistics reports, epidemiological surveillance, and serial publications. Some other types of data we collect and catalog include vital registration, hospital discharges, censuses, disease registries and government health budgets.

What do you enjoy most about your job?

I most enjoy the variety of work. For example, today I did research on stroke in order to create new keywords and planned out how to retroactively apply the new keywords to existing records, searched for and cataloged new survey data, contacted a survey provider about missing variables in a data file, and worked on a presentation I’ll be giving on our keyword taxonomy.

What advice would you give other librarians interested in working with data/in the field of data librarianship? 

I am forever thankful for the classes I took in graduate school that focused on representation and organization, metadata and semantics, indexing, creating ontologies in RDF/RDFS (Resource Description Framework/Resource Description Framework Schema) and cataloging in XML. Those classes provided me with a solid foundation for the type of work I do as a data indexer.

I would like to sincerely thank Lyla for providing us with insight into a librarian role that is quite unique, and quite essential. If you would like to learn more about IHME, the GHDx, and many of their ground breaking projects and visualizations, please visit

Categories: Data Science

Data Flash: Exploring Historical Data

PNR Data Science - Tue, 2018-03-27 04:00

It’s so easy to think of data as a modern phenomenon that we forget data analysis and data visualization go way back. A marvelous example is John Graunt’s Bills of Mortality, which this post by John Appleby calls “a 17th century spreadsheet of deaths in London”. Appleby goes on to do some ultra-modern visualizations of the data, which illuminate connections that Graunt probably didn’t make scientifically at the time, but may have understood intuitively.

If you find this concept intriguing, either to read about or to explore more directly, consider taking a mosey through the series from the National Library of Medicine’s Historical Division, “Revealing Data: Explorations of Data in Collections”. They are making historical research data available now, across many health-related fields, and in fascinating ways! They have data in a wide range of formats, informally and formally collected, quantitative and qualitative. And among their treasures is (you guessed it) a copy of Graunt’s magnum opus, and a post about it!

Also, if data analysis is your thing, there are many sources of data sets out there, particularly from the federal government.  Check out descriptions of what’s in the National Archives, or historical patent data, or that classic, historical census data, and others at   Additionally, you might want to explore data from other sources, such as from the Pew Research Center (it doesn’t go back to the 1600s, but it’s something!), and historical GIS data from the American Association of Geographers.

Enjoy your explorations into the past—which may end up transforming our future!

Categories: Data Science

NLM’s 2017-2027 Strategic Plan

PNR Data Science - Fri, 2018-03-16 15:56

“Every day more than four million people use NLM resources; every hour, a petabyte of data moves in or out of our computing systems.”

The National Library of Medicine Board of Regents has released the Strategic Plan for 2017-2027: “A Platform for Biomedical Discovery and Data-Powered Health.” Working in conjunction with NLM planning staff, the Board identified themes to use as a framework to develop future priorities and directions. Public input was solicited on these themes:

  • Advancing data science, open science, and biomedical informatics
  • Advancing biomedical discovery and translational science
  • Supporting the public’s health: clinical systems, public health systems
    and services, and personal health
  • Building collections to support discovery and health in the 21st century

In addition, the following topics were considered across the four themes: partnerships, user communities, user engagement and educational outreach, international engagement, health disparities, standards, infrastructure, workforce development, research needs and funding.

The Strategic Plan introduces three major goals:

  1. Accelerate discovery and advance health by providing the tools for data-driven research.
  2. Reach more people in more ways through enhanced dissemination and engagement pathways.
  3. Build a workforce for data-driven research and health.

From Dr. Patricia Brennan’s introduction:

  • Data-driven discovery requires sophisticated library and information science to open the door to thrilling new prospects for improving the public health, as well as informatics and data science to deliver insights and solutions.
  • The migration of clinical care from hospital to home challenges NLM to reach into these places where health occurs, not just where care is delivered.
  • Governmental and scientific forces are aligning under a philosophy that innovation is accelerated if data flow freely, that the results of government-sponsored research should be open to the public as quickly as possible, and that linking scientists, citizens, and industry yields social benefits.
  • Libraries continue to be essential places for knowledge repositories and community gathering, yet the advent of self-directed search, e-publishing and consolidation of hospital library services challenges librarians and libraries to devise new services and solutions.

Your comments on the Strategic Plan are welcome. Please send to: NLM Director’s Office (

Categories: Data Science

Request for Information: Submit Comments on the NIH Draft Strategic Plan for Data Science by April 2!

PSR Data Science - Wed, 2018-03-07 16:24

In order to capitalize on the opportunities presented by advances in data science, the National Institutes of Health (NIH) is developing a Strategic Plan for Data Science. This plan describes NIH’s overarching goals, strategic objectives, and implementation tactics for promoting the modernization of the NIH-funded biomedical data science ecosystem. As part of the planning process, NIH has published a Request for Information (RFI) to seek input from stakeholders, including members of the scientific community, academic institutions, the private sector, health professionals, professional societies, advocacy groups, patient communities, as well as other interested members of the public.

The NIH seeks comments on any of the following topics:

  • The appropriateness of the goals of the plan and of the strategies and implementation tactics proposed to achieve them;
  • Opportunities for NIH to partner in achieving these goals;
  • Additional concepts that should be included in the plan;
  • Performance measures and milestones that could be used to gauge the success of elements of the plan and inform course corrections;
  • Any other topic the respondent feels is relevant for NIH to consider in developing this strategic plan.

Responses to this RFI must be submitted electronically by April 2, 2018.

Categories: Data Science

NLM Releases 2017-2027 Strategic Plan!

PSR Data Science - Thu, 2018-03-01 15:02

The National Library of Medicine 2017-2027 Strategic Plan, A Platform for Biomedical Discovery and Data-Powered Health, is now accessible online. The web site also offers readers the opportunity to submit thoughts and reactions about the plan. NLM’s future is being built on three pillars:

  • NLM as a platform for data-driven discovery and health
  • Reaching new users in new ways
  • Workforce excellence from citizens to scientists

Highlights of some initial NLM implementation activities include the following:

  1. A strong, robust platform for delivering NLM’s literature and data resources
  2. Energizing the NLM research agenda to meet a data-rich future
  3. Alignment of NLM outreach efforts
  4. Stabilizing NLM audiovisual support services
  5. Improved deposit, curation, and discovery services

Check out the plan and submit your feedback!

Categories: Data Science

Save the Date: Midwest Data Librarian Symposium (MDLS) 2018

GMR Data Science - Wed, 2018-02-28 10:51

SAVE THE DATE! The Iowa State University Library in Ames, IA will be hosting the 2018 Midwest Data Librarian Symposium (MDLS) on October 8-9, 2018.

MDLS 2018 is intended to provide Midwestern librarians who support research data management the chance to network and expand their research data-related knowledge base and skill sets. It is open to all who wish to attend, including those from the Midwest and beyond as well as librarians in training.

Attendance at this event is capped and decided on a first-come, first-served basis. Stay tuned for more announcements, follow @MW_DataLibSym on Twitter, or check the MDLS webpage for updates as they become available.

Questions should be directed to

Categories: Data Science