National Network of Libraries of Medicine

PNR Data Science

News from the Northwest and Beyond

Data Flash: “Storage Wars”

Wed, 2018-07-11 05:00

You may have seen the feature on the front page of our website, “Where in the World are the PNR Coordinators?” But we don’t always report back on our travels! So here is a quick look at “Open Repositories 2018”, a conference I attended on behalf of the NNLM-PNR last month in Bozeman, MT. What is an open repository? I like this definition from the “Repositories Support Project”:

“A digital repository is a mechanism for managing and storing digital content. Repositories can be subject or institutional in their focus. Putting content into an institutional repository enables staff and institutions to manage and preserve it, and therefore derive maximum value from it… Repositories use open standards to ensure that the content they contain is accessible in that it can be searched and retrieved for later use.”

I don’t work with repositories directly, so this conference was basically like drinking water from a fire hose.  The attendees were a mix of librarians/library staff and people from the IT side of running repositories, meaning that my comprehension of a given session could range from about 5% (for the very techie ones) to 100%.  And that was fine—I got a great introduction to the issues involved in starting and running repositories, and learned about some new trends, some areas of conflict and some growing pains (hence the title of this post).  For example, take a look at this presentation by Peter Sefton.  I pretty much understand the whole section above the picture of the boat, and then an average of about 65% of what’s below it; that feels worth it to me!   It was an international conference, so the perspective on how repositories are handled was global.   I would never otherwise have heard of Australian Sefton’s work, or been able to attend a session on the Digital Repository of Ireland.   I even got to spend a full day attending two workshops on Wikidata and Wikipedia editing (did I mention that the NNLM’s next Online Wikipedia Edit-a-Thon is November 7 this year?).

And, one great thing about open conferences and all things open is that you can often gather the content for yourself after the conference even if you didn’t attend it.  Here are some options if you want more information about what happened at this conference:

YouTube stream of everything held in the main session space (including the Digital Repository of Ireland presentation)

Notes from sessions

The program

Social media: Twitter @OR2018MT, Instagram @openrepositories18

I leave you with three photos from the experience. One is of me with my poster highlighting three of the National Library of Medicine’s eight data sharing repositories, including PubChem and GenBank. The other two are from my visit to the Museum of the Rockies, which features the most amazing dinosaur exhibit I’ve ever seen, and a thing I love: a historic house that was moved to the museum site, furnished appropriately to the period in which it was built, and staffed by costumed and knowledgeable living history interpreters.

Categories: Data Science

DataFlash: Staying Informed

Wed, 2018-06-27 17:38

Big data and research data management are evolving quickly, and it can be challenging to keep up with developments in the field. Social media is a great way to keep track and to ask questions of colleagues, researchers, and vendors. Below are several links worth checking out…

CANLIB-DATA is a listserv for issues related to research data in Canadian libraries, with more than 350 subscribers.

DataCure “is a Google group of librarians and information professionals whose members have significant roles or responsibilities in providing services in managing or curating research data. Datacure exists to provide a safe space for data professionals to talk frankly about their ideas, projects, successes, and struggles with their work.”1

The Datalibs distribution list is intended to serve both as a bulletin board for news, upcoming events, and continuing education/job opportunities and as a forum that librarians can use to post questions or to initiate and engage in discussions. Join via the Journal of eScience Librarianship website.

IASST-L is the email discussion list of the International Association for Social Science Information Services and Technology (IASSIST), an international organization of professionals working with information technology and data services to support research and teaching in the social sciences. Join IASSIST ($50 USD annually) to access the list.

MLA Data-SIG is the Medical Library Association’s data-related special interest group. Membership in the MLA is required to access the SIG listserv.

@NNLM_RD3 is the NNLM RD3: Resources for Data-Driven Discovery website’s Twitter feed. When tweeting, use the #datalibs hashtag to reach out to other data librarians.

RDAP or the Research Data Access & Preservation Summit is relevant to the interests of data managers, data curators, librarians working with research data, and researchers and data scientists. RDAP is currently in transition and has moved its listserv to a new server. RDAP’s new e-mail address may be the best place to inquire about further developments.

RESEARCH-DATAMAN is an email discussion list for United Kingdom education and research communities.

The data science departments on your own campus may also host listservs, Twitter accounts, Facebook pages, or blogs. The University of Washington’s eScience Institute is just one example of the data-related centers available near the PNR’s home base. If you know of additional data-related listservs, Google Groups, or Twitter accounts, share them with your colleagues by entering them in the comments section below.


1  Barbrow, S., Brush, D., & Goldman, J. (2017). Research data management and services: Resources for novice data librarians. ACRL College & Research Libraries News, 78(5).

Categories: Data Science

June PNR Rendezvous webinar next week

Wed, 2018-06-13 07:41

The next PNR Rendezvous monthly webinar is coming up.

Session title: Unlocking the Potential of De-identified Clinical Datasets

Presenter: Bas de Veer, Bio-Medical Informatics Services Manager for UW Medicine IT Services

When: Wednesday, June 20 starting at 1:00pm PT, Noon Alaska Time, 2:00pm MT

Healthcare systems generate a ton of data on a daily basis. The primary purpose of this data is billing and clinical decision making, but a great secondary use of it is research. This webinar will discuss the potential uses, best practices and common hurdles of de-identified clinical datasets.

Registration is encouraged but not required. However, attending the live session will allow for questions. The session will be recorded and posted on the PNR Rendezvous web page a few days after the live session.

Medical Library Association CE credit is available for both the live and the recorded session.

More information about how to join the session is available on the PNR Rendezvous webpage.

Categories: Data Science

Data Flash: What is this GDPR thing I keep hearing about?

Fri, 2018-05-25 20:44

May 25 begins the era of the GDPR, or General Data Protection Regulation, a new European Union regulation with strong enforcement provisions that sets data protection as the default rather than requiring users to opt out of entities being allowed to use their data (to put it VERY simplistically).  Why should we in the U.S. pay any attention to something applying to European data?  Well….

–The coming tide of companies, governments, and others using and combining and potentially misusing our personal data is no longer a swell, it’s a tsunami (says Tom Wheeler of the Brookings Institution).  The time to act is now, and Europe’s action will have ripple effects.  So it’s a good thing to be aware of the GDPR because something like it will be in our lives eventually (even if not coming soon to a theater near you).

–Many large American companies are already global anyway, and they are having to respond due to their European presence.  Facebook and Apple are two examples.

–Even many smaller U.S. companies and organizations, though they don’t have to protect your data under the GDPR, are proactively notifying you that they are taking steps to do so (you’ve probably seen a lot of these notices in your inbox recently–The New York Times suggests you read them).

–Last but not least, it’s a fascinating new conceptualization of our entitlements as online beings!  The GDPR arguably “enshrines data protection as a fundamental human right”.  It moves the discussion about data and our privacy as individuals WAAAAAY forward and in new directions.

This article, from Vox, puts it well: “…Norms are shifting once more. Looking back, we can frame the development of digital behavior into three phases: First, there was a naiveté phase, where consumers didn’t really understand the technology and what it meant. Then there was the careless phase, where people saw data rights or privacy as either unimportant or an acceptable price of entry to all the good, free stuff. Now it is clear we are entering the demand phase, which sees the emergence of a more savvy, engaged, and alarmed digital consumer — and related movements to create and enforce consumer rights.”

Watch this space–and all of your online presences–for further developments!


Categories: Data Science

A Unique Data Experience: Reflections on ESFCOM’s Inaugural Hackathon

Mon, 2018-05-07 05:00

Today’s blog is by Nancy Shin, Sewell Memorial Fund Librarian Fellow at Washington State University’s Elson S. Floyd College of Medicine. Welcome, Nancy!

The most extraordinary thing happened at Washington State University’s Elson S. Floyd College of Medicine (ESFCOM) the weekend of April 13-15, 2018.  ESFCOM hosted its inaugural Hackathon, which was organized by the College Technology Incubator Officer, Andrew Richards.  It was well attended by people from all walks of life and areas of expertise, including students and healthcare providers.  So, the big question is: what exactly is a hackathon, and why all the hype?

A hackathon is a social event focused on building small, innovative, new technology projects.  It brings together teams of people to work on a common project within an overarching theme; at the end of the event, teams formally present their projects for judging.  A hackathon can last from 4 hours to 1 week (sleep is optional) and can involve large cash purses as prizes.  Typically, projects are technological and can result in the development of a new app or website feature in response to a theme; in the case of ESFCOM’s Hackathon, the theme was “challenges in rural healthcare.” A common misconception about a hackathon is that it is an event strictly designed for computer programmers, engineers, and software developers – i.e. anyone who codes! However, other skills like research, design, project management, data management, and leadership are also important to the dynamic of an ideal hackathon team.

Arguably the first hackathon was hosted in 1999 by OpenBSD, an operating system project; ten developers came together to work on various software problems over the span of a week (Davis, 2016).  Since then, hackathons have more famously been hosted by various companies like Facebook and Yahoo, in 2005 and 2006 respectively, in order to catalyze new innovations in a relatively “risk-free” and “creative” environment (Davis, 2016). In general, hackathons are organized by one of the following communities: open source software companies, tech companies, sponsored competitions, and community institutions (Davis, 2016).

PTme, the winning team

In the health field, a big community hackathon organizer is the National Institutes of Health (NIH), which often hosts hackathons with a bioinformatics theme.  Although ESFCOM’s Hackathon was heavily inspired by MIT’s “Grand Hack & Hacking Medicine,” what makes the community hackathon at ESFCOM different from other health hackathons is that it encourages people with diverse skillsets to tackle healthcare problems.  For example, the winning team, PTme, included developers, medical students, business leaders, and engineering students, while my own hackathon team was made up of a mathematician, bioengineer, computer engineer, designer, and health/data librarian.  Another unique feature of ESFCOM’s Hackathon was the involvement of health librarians in the Spokane area in creating a “Research Station,” where volunteer librarians provided direct support for each team’s research and data management needs.  It is those two qualities, skill diversity and library support, that make ESFCOM’s Hackathon one of a kind and a successful model for other health communities/organizations to follow for their future hackathons!


Davis, R. C. (2016). Hackathons for libraries and librarians. Behavioral & Social Sciences Librarian, 35(2), 87-91. doi:10.1080/01639269.2016.1208561

Categories: Data Science

Do You Think Health Sciences Librarians Should Get Involved with Big Data in Healthcare?

Fri, 2018-04-27 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by Sara Pimental, Senior Consultant, Kaiser Permanente, San Francisco, CA

My answer to this question is a qualified yes. However, librarians don’t HAVE to get involved to be successful. I think people panic at the thought that they will become obsolete if they don’t get involved in every new trend in librarianship. There are many ways to evolve; big data is just one of them.

Since I am involved in one aspect of utilizing Big Data, I would have to say yes, librarians who have the interest should get their hands dirty. I can see skills that all librarians possess being useful in all aspects of Big Data. Those who are more technically inclined should go all the way and become data scientists. Many of us have learned programming languages and similar skills and could do very well in this area.

For those of us who have no desire to become so technical but have a curious fondness for metadata, there are many niches. This is where I have landed. I assist not just with taxonomy and metadata for my website but also with linking structured data from the EHR with clinical information available on the website and, soon, with a subscribed third party. I could envision a librarian’s talents also being useful with unstructured data such as the notes in the EHR.

In conclusion, there are a myriad of ways a librarian can get involved with Big Data. In this class we have learned about quite a few of them. I remember when I attended the opening reception at NLM’s Biomedical Informatics Course at Woods Hole, Dr. Lindberg told us we were change agents. I hope some of the participants of this class become just as inspired.

Categories: Data Science

Big Data, Healthcare, and the Evolution of the Health Science Librarian

Mon, 2018-04-23 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Lisa Mastin, Medical Librarian, WellStar Atlanta Medical Center, Atlanta, GA

Data is part of life and the amount of data being created, captured, stored, and analyzed is expanding exponentially. In the healthcare sector, Big Data is rapidly changing the landscape. Health science librarians should get involved with big data in healthcare, at least at a basic level, because if they do not, they risk losing the ability to engage with the user (i.e. researcher, clinician, patron) in a user-centered environment. I see health science librarians working in several areas of data science. The very least they can do, and possibly the most essential step, is to acquire an understanding of the language used in data science.

Although I do not believe all librarians should become data scientists or even work with big data (several postings in this course expressed a similar opinion), I do believe that all health science librarians need to know the terminology. In an online discussion based on the National Institute of Nursing Research (NINR) module of Big Data, one participant reflecting on Dr. Brennan’s video mentioned that she liked Dr. Brennan’s comment that “data science is a team sport,” and agreed that as librarians, we should be able to speak the language and “at least know who to turn to or ask.” This relates to the second area in which I feel health science librarians should get involved with big data in healthcare: knowing who to go to with questions. In a reply to her reflection, another participant remarked, “librarians connect our users to articles, books, databases, and web resources;” so “what’s to stop us from connecting our users to experts on campus?” I agree that librarians can learn who the data science experts are at their institution and then pass that information along to their users. In doing this, health science librarians are establishing contacts and forming relationships across their campus or institution, and creating connections is something else librarians are skilled at doing.

Training is also a skill that librarians excel at and is the next area where I see health science librarians becoming involved with big data. As Jeff Durham noted in a reflection on medical research, librarians “have advanced skills in information and pedagogy,” so they are well suited to train researchers. Other class members shared this idea, and I believe that most librarians feel confident when it comes to training/teaching. Health science librarians could, for example, train researchers on how to use data science-related technology tools or on how to find specific information in their electronic health records (EHR). If health science librarians gain access to the EHR at their institutions, this opens the door to other areas in which they could assist with big data. I see librarians creating metadata and/or controlled vocabularies for the natural language portion of patient notes entered into the EHR by clinicians. We discussed this in the module five online discussion session and several participants expressed interest in assisting in these areas, as well as working with an EHR in other capacities (e.g. adding links to the library website or related databases, or adding information for physicians).

In addition to the areas I have mentioned, I feel that data visualization, population health, and data management would also be areas in which health science librarians could work with big data. Traditional librarian skills, such as information searching, research methods, database management, archival work, and digital preservation, combined with some newer skill sets (data literacy, informatics, visual analytics), will allow health science librarians to compete for these roles. Where and how health science librarians decide to get involved with big data in healthcare will certainly vary by individual librarian, but what is most important is that they do become involved with it. I reviewed an article about an ongoing big data research project on cardiovascular care in China, and in this article, there was no mention of librarians assisting with the project. One of the course instructors made the wise comment that she wondered if there are people working on the project performing research data management functions. If there was someone performing these roles, they weren’t trained as librarians. I now think that there are probably many research projects where people are doing the data science work that we have discussed in this course, but librarians are not doing it. Health science librarians can bring their unique skills to big data research projects if they possess the skills and researchers know librarians are capable and can provide big data support.

Categories: Data Science

Data, Data Everywhere and Not a Drop to Drink

Fri, 2018-04-20 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Jeff Durham, Medical Librarian, Desert Regional Medical Center, Palm Springs, CA

We swim in a sea of information; more often than not we are drowning in it. When we are presented with a smorgasbord of data, how do we determine what we should eat? This is the current situation with regard to big data and healthcare: what data should be utilized, and how? It is at this data-centric meal that the data-savvy health science librarian should be most at home: as critic, guide, and chef.

As health science librarians, we have a responsibility not only to provide the communities that we serve with access to up-to-date and accurate information, but also to support the informational needs of researchers in our communities. With the tremendous amount of big data that is generated on a daily basis, health science librarians have a duty to become involved and assist all of their patrons, both lay and professional, to access, extract, and manage the data (both big and small) that they need.

There are barriers to making a librarian into a data-savvy librarian who can tackle big data problems with ease. One barrier is that many graduate schools in library and information science have not been as keen to teach data science in a general education format, preferring to see it more as a sub-specialty. This occurs ironically enough in iSchools as well. While there is a growing trend to change this educational oversight, it is not the dominant paradigm yet. Another barrier is that of opportunity. All too often, the librarian simply does not have the time or their employer does not provide the means (e.g. time off, reimbursement) for the librarian to refresh their skill set. Until library managers and directors see the value of continuing education of the librarians on their staff on how to use data science and work with big data, the health sciences librarian will continue to fall behind.

There are also opportunities to be found. In hospitals and health science libraries, with residents and medical students, there are lots of inroads for librarians to make. Given the exponential growth in big data from biomedical devices and the prevalence of smart devices, which are constantly generating both passive and active data, there is a lot of big data to utilize. The data that is being produced has the potential to be used in research projects for students, residents, nurses, and doctors on staff. There is a significant gap between the abilities of these medical professionals and the demands of data science. The role of the data-savvy librarian is to be a bridge across this gap. The data-savvy librarian is able to assist their patrons in identifying the datasets that they need as well as demonstrating how to wrangle, clean and visualize their data. By doing this, the librarian plays an essential role in the medical field. It is through managing big data and assisting the researcher with working with the data and discerning patterns and trends that the librarian enables the student, nurse, or clinician to make evidence-based decisions. By doing so, the librarian not only serves the informational needs of the researchers, but also has a very real impact on patient care.

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

Mon, 2018-04-16 17:55

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Kathleen Carlson, Education Librarian, College of Medicine Phoenix, University of Arizona, Phoenix, AZ

It is essential for the future of medical librarians to get involved in Big Data. Much of our future work will be coming from big data research projects, especially for librarians who work in hospitals and health care systems. Since librarians were early adopters of technology, we were able to move from print indexes to searching indexes on CD-ROMs that were eventually moved to the Web. Having moved from the card catalogue to integrated automated library systems, librarians understand how important it is to move forward with Big Data. Many of the older, experienced librarians may not have the expertise or training in math, computational skills, statistics and the relevant domains, but we know that our profession should be part of our institution’s Big Data team and at least have a seat at the table.

I know that being an Assistant Professor of Practice in the Department of Biomedical Informatics (BMI) at my academic institution has allowed me to understand and speak the language of Big Data. Clinicians come to me for resources and journal articles, and I have learned a lot by attending monthly journal club meetings on different subjects in Biomedical Informatics and Big Data. BMI fellows, Chief Medical Information Officers (CMIO), Chief Nursing Officers (CNO) of area hospitals, and BMI faculty attend the sessions. Here I have an opportunity to be seen and heard and, as a non-clinician, to ask questions when they arise. We have covered the following topics in Big Data and Informatics in the past three years:

  • Cybersecurity
  • Data Standards
  • Health Literacy
  • Electronic Health Record/Electronic Patient Record
  • Process Oriented Health Information Systems
  • Clinical Decision Support Systems
  • Graphic Display and Visualization
  • Health Information Exchange
  • Cloud Computing Services
  • Substitutable Medical Applications and Reusable Technologies (SMART)
  • Fast Healthcare Interoperability Resources (FHIR)

I also attend monthly Clinical Informatics Grand Rounds. The speakers vary from clinicians to researchers, MBA, Pharmacy and Public Health faculty.

So, for the past three years I have had a seat at the table and have given our library visibility within Biomedical Informatics and Big Data. I also believe that a medical librarian at any institution should find a champion or champions who will assist him/her in getting a seat at the table. And when that is accomplished, a hospital librarian should get permission to embed in a patient’s electronic record at least one vetted, appropriate link to a resource such as the National Institute on Aging or another consumer health-oriented resource. This would relieve the burden on clinicians of finding the best resource for patient care.

Big Data can be organized, appraised, secured, and preserved with a librarian’s help, and it can assist researchers and clinicians in patient care and help find areas that may need improvement. Creating an online resource guide with Big Data tools and resources can be a first step in marketing the librarian and library. The NNLM PSR recently recruited a data and technology services coordinator. She asked librarians if they collected any data for their institution. Unfortunately, we are considered a satellite campus of a large Research One University. I think there are areas at my institution where data is collected but could be used more effectively. I know within the Scholarly Project, a four-year mandatory thesis and poster at our institution, many of our students use Big Data from area hospitals or the state’s data archives to have foundational information in their presentations and theses. They are assisted by their clinical mentors.

I also like one of my fellow course students’ discussion posts about teaching himself ‘R’ so he is able to teach classes to the data scientists on his campus. Finding resources on Big Data programming languages and on free software for statistical computing and graphics, like ‘R’, can help the librarian be an informational resource for Big Data collection. This instruction example is one way librarians will have to get out of their comfort zone and put themselves out there for Big Data. We have access to SPSS and STATA in our library commons. I took three classes on REDCap to help me understand Big Data and how to collect it safely and securely. REDCap is a secure web application for building and managing online surveys and databases and collecting data.

The librarian can be the go-to resource for students and researchers and help them search the archives of stored Big Data sets. I do not believe that our small campus has the capacity to store Big Data and it is not something that the larger academic institution is willing to duplicate. I do believe that being visible as a librarian, attending committee meetings, journal clubs, and clinical informatics rounds, and actually showing an interest in learning about Big Data give a librarian the knowledge and vocabulary to understand it and share it with their constituents. The librarian can also become familiar with websites that build Big Data knowledge, such as that of the Institute for Health Metrics and Evaluation, which I learned about in the course discussions.

Categories: Data Science

DataFlash: Data Indexers

Mon, 2018-04-02 13:50

The Institute for Health Metrics and Evaluation (IHME) is “an independent population health research center at UW Medicine, part of the University of Washington, that provides rigorous and comparable measurement of the world’s most important health problems and evaluates the strategies used to address them.” Their mission is to improve the health of the world’s populations by providing the best information on population health, and to do so, IHME enlists the expertise of countless individuals, including researchers, data analysts, data scientists, and thirteen data indexers. What is a data indexer? Lyla Medeiros, a data indexer at IHME, shares more about her essential role below…

What is a data indexer? And how long have you been in the role?

Data indexers are part of a team responsible for providing librarian services to IHME. Data indexers not only catalog data for inclusion in the Global Health Data Exchange (GHDx), they also organize and maintain data files, provide reference services to IHME researchers, and search for and acquire new data sources. Data indexers are also responsible for creating documentation on cataloging practices, implementing improvements to process and workflows, reporting and testing technical issues that pop up in the GHDx for the Drupal development team, and managing controlled vocabularies and taxonomies, which includes researching and adding terms. I’ve been working as a data indexer for four years and three months.

What is your education/occupational background?

I earned a BA in Dance Studies and Art History at the State University of New York, Empire State College and a Master of Library Science at Indiana University, Bloomington. Before becoming a librarian, I trained to become a classical ballet dancer and teacher. I’ve taught ballet in New York, New Mexico and here in Washington.

Who do you work with at IHME?

Outside of the data services team, I work with public health researchers, data analysts, Drupal developers, and student assistants.

IHME US Map Data Visualization


What types of data do you work with?

The data that IHME uses to create global health estimates comes in data file formats like .dta, .dbf, .sav, and Excel tables, Word documents, text files, .pdf documents and Access databases. When necessary, we digitize books and sometimes even microfiche. Right now, I primarily catalog health and demographic survey datasets and their related geospatial data. In the past, I’ve also worked on cataloging health statistics reports, epidemiological surveillance, and serial publications. Some other types of data we collect and catalog include vital registration, hospital discharges, censuses, disease registries and government health budgets.

What do you enjoy most about your job?

I most enjoy the variety of work. For example, today I did research on stroke in order to create new keywords and planned out how to retroactively apply the new keywords to existing records, searched for and cataloged new survey data, contacted a survey provider about missing variables in a data file, and worked on a presentation I’ll be giving on our keyword taxonomy.

What advice would you give other librarians interested in working with data/in the field of data librarianship? 

I am forever thankful for the classes I took in graduate school that focused on representation and organization, metadata and semantics, indexing, creating ontologies in RDF/RDFS (Resource Description Framework/Resource Description Framework Schema) and cataloging in XML. Those classes provided me with a solid foundation for the type of work I do as a data indexer.

I would like to sincerely thank Lyla for providing us with insight into a librarian role that is quite unique, and quite essential. If you would like to learn more about IHME, the GHDx, and many of their groundbreaking projects and visualizations, please visit the IHME website.

Categories: Data Science