National Network of Libraries of Medicine

Data Science

Data Flash: “Storage Wars”

PNR Data Science - Wed, 2018-07-11 05:00

You may have seen the feature on the front page of our website, “Where in the World are the PNR Coordinators?” But we don’t always report back on our travels!  So here is a quick view of a conference I attended last month on behalf of the NNLM-PNR: “Open Repositories 2018”, held in Bozeman, MT.  What is an open repository?  I like this definition from the “Repositories Support Project”:

“A digital repository is a mechanism for managing and storing digital content. Repositories can be subject or institutional in their focus. Putting content into an institutional repository enables staff and institutions to manage and preserve it, and therefore derive maximum value from it… Repositories use open standards to ensure that the content they contain is accessible in that it can be searched and retrieved for later use.”

I don’t work with repositories directly, so this conference was basically like drinking water from a fire hose.  The attendees were a mix of librarians/library staff and people from the IT side of running repositories, meaning that my comprehension of a given session could range from about 5% (for the very techie ones) to 100%.  And that was fine—I got a great introduction to the issues involved in starting and running repositories, and learned about some new trends, some areas of conflict and some growing pains (hence the title of this post).  For example, take a look at this presentation by Peter Sefton.  I pretty much understand the whole section above the picture of the boat, and then an average of about 65% of what’s below it; that feels worth it to me!   It was an international conference, so the perspective on how repositories are handled was global.   I would never otherwise have heard of Australian Sefton’s work, or been able to attend a session on the Digital Repository of Ireland.   I even got to spend a full day attending two workshops on Wikidata and Wikipedia editing (did I mention that the NNLM’s next Online Wikipedia Edit-a-Thon is November 7 this year?).

And, one great thing about open conferences and all things open is that you can often gather the content for yourself after the conference even if you didn’t attend it.  Here are some options if you want more information about what happened at this conference:

YouTube stream of everything held in the main session space (including the Digital Repository of Ireland presentation)

Notes from sessions

The program

Social media: Twitter @OR2018MT, Instagram @openrepositories18

I leave you with three photos from the experience.  One is of me with my poster highlighting three of the National Library of Medicine’s eight data sharing repositories: ClinicalTrials.gov, PubChem and GenBank.  And the other two are from my visit to the Museum of the Rockies, which features the most amazing dinosaur exhibit I’ve ever seen, and a thing I love—a historic house which was moved to the museum site, furnished appropriately to the period in which it was built, and staffed by costumed and knowledgeable living history interpreters.

Categories: Data Science

Data Management for Librarians CE workshop

SEA Data Science - Thu, 2018-07-05 08:37

The University of Minnesota Health Sciences Libraries is hosting a 4-hour Data Management for Librarians CE workshop in Minneapolis, MN on August 6th. Registration for the workshop is free, and there are a select number of travel stipends available for up to $1,000.  The Workshop will introduce participants to key elements of research data management in the health sciences, including best practices for documentation, metadata, backup, storage, and preservation. Participants in the CE course may also partake in an online data management skills community of practice, which will meet quarterly to take a deeper dive into data management topics. The course will also provide 4 MLA CE credits. More information about the training, stipend requirements, and registration can be found on the GMR’s Blog. Any questions related to the Workshop should be directed to Lisa McGuire at: lmcguire@umn.edu

Categories: Data Science

DataFlash: Staying Informed

PNR Data Science - Wed, 2018-06-27 17:38

Big data and research data management are evolving quickly, and it can be challenging to keep up with developments in the field. Social media is a great way to keep track and to ask questions of colleagues, researchers, and vendors. Below are several links worth checking out…

CANLIB-DATA is a listserv for issues related to research data in Canadian libraries, with more than 350 subscribers.

DataCure “is a Google group of librarians and information professionals whose members have significant roles or responsibilities in providing services in managing or curating research data. Datacure exists to provide a safe space for data professionals to talk frankly about their ideas, projects, successes, and struggles with their work.”1

Datalibs distribution list is intended to serve as both a bulletin board for news, upcoming events, and continuing education/job opportunities as well as a forum that librarians can use to post questions or to initiate and engage in discussions. Join via the Journal of eScience Librarianship website.

IASST-L is the email discussion list of the International Association for Social Science Information Services and Technology (IASSIST), an international organization of professionals working with information technology and data services to support research and teaching in the social sciences. Join IASSIST ($50 USD annually) to access the list.

MLA Data-SIG is the Medical Library Association’s data-related special interest group. Membership in the MLA is required to access the SIG listserv.

@NNLM_RD3 is the NNLM RD3: Resources for Data-Driven Discovery website’s Twitter feed. When tweeting, use the #datalibs hashtag to reach out to other data librarians.

RDAP or the Research Data Access & Preservation Summit is relevant to the interests of data managers, data curators, librarians working with research data, and researchers and data scientists. RDAP is currently in transition and has moved its listserv to a new server. RDAP’s new e-mail address may be the best place to inquire about further developments.

RESEARCH-DATAMAN is an email discussion list for United Kingdom education and research communities.

The data science departments on your own campus may also host listservs, Twitter sites, Facebook pages, or blogs. The University of Washington’s eScience Institute is just one example of the data related centers available near the PNR’s home base. If you know of additional data related listservs, Google Groups, or Twitter sites, share them with your colleagues by entering them in the comments section below.

 

1  Barbrow S, Brush D and Goldman J. (2017). Research data management and services: Resources for novice data librarians; ACRL College and Research Libraries News, 78(5)

Categories: Data Science

GMR Funds Innovative Pilot Program to Teach Graduate Students Research Data Management

GMR Data Science - Tue, 2018-06-26 16:12

 

The GMR office is excited to announce that Tina Griffin at the University of Illinois at Chicago has been granted a Research Data Award to develop the Research Data Management Best Practice Implementation Program for Graduate Students in STEM and Health Sciences!

Background:

Today, data management practices by students are largely learned by conforming to the laboratory culture and adopting habits from the environment in which they work. There is no known national mandatory data management training for students. The recent NLM strategic plan (PDF) recognizes the importance of the role of libraries in advancing open science and data management, and many academic libraries are heeding the call by providing research data management education services.

Project Description

This project will pilot a flipped classroom model to present students with appropriate research data management practices in an eight-week intensive program. In this program, the students are expected to engage with the instructional content outside the classroom, while using the in-person classroom time to engage in activities that demonstrate competency and understanding of the content. The 8-week program will cover the following topics:

  1. Introduction to Data management principles;
  2. Deep Dive – discipline standards, DMP draft;
  3. Project map, project narrative starts;
  4. Folder structure develops;
  5. File naming, table of contents, indexing develop;
  6. Templates develop;
  7. DMP finalized, project narrative finalized; and
  8. Ongoing practice, personal policy developed

The classroom time will be used by the students to systematically develop and holistically integrate these practices into their research projects. This pilot project is unique in that it addresses both education about data management practices and the integration of best practices into the research workflow in a personalized manner.

Outcomes

The outcome of this pilot may introduce a new method to serve more students in a more effective manner with better long-term adoption of data management best practices. It also begins a longitudinal study to determine how these practices may contribute to successful dissertation/thesis completion and/or how they may prepare students for the workforce.

Categories: Data Science

World Health Organization Releases ICD-11

SCR Data Science - Tue, 2018-06-26 08:42

“Picture.” by RawPixels via Unsplash, March 18, 2018, CC0.

The World Health Organization has released the newest version of the International Classification of Diseases, ICD-11.  The ICD tracks health trends and statistics globally.  The nearly 55,000 unique codes identify injuries, diseases, symptoms, and causes of death.  These codes are the common language that health care professionals use to share information worldwide.

This new version of ICD has been in progress for several years and involved a large team of contributors.  Due to the scope of the project, it will not start being used until 2022.  This will allow time for users to familiarize themselves with the new product and prepare for implementation.

One new feature being touted as user friendly is a fully electronic version of the product, which is a first for ICD.  There are also new chapters covering traditional medicine and sexual health.  The sexual health chapter is most notable for reclassifying being transgender so that it is no longer listed as a mental health condition.  Another well-publicized addition to ICD-11 is gaming disorder, which is now listed as an addictive disorder.

WHO’s Assistant Director-General for Health Metrics and Measurement, Dr Lubna Alansari, says: “ICD is a cornerstone of health information and ICD-11 will deliver an up-to-date view of the patterns of disease.”

Like NNLM SCR on Facebook and follow us on Twitter.

Categories: Data Science

New Research Looks at Long-Term Impact of Tonsillectomies

SCR Data Science - Tue, 2018-06-19 10:20

“Tonsillitis.” via MedlinePlus.gov, April 11, 2017, Public Domain.

When I was in grade school, it seemed as if nearly every kid would miss a week of school to have their tonsils removed. They would return to school bragging about their recovery spent eating ice cream, drinking milkshakes, and watching cartoons.  I can still vividly recall being jealous of these classmates.  After reading new research that evaluates the long-term health risks of tonsillectomies, I realized maybe I shouldn’t have been quite so jealous!

Tonsils are knobs of tissue located at the back of the throat, one on either side.  They are part of the lymphatic system, which works to clear infections and keep body fluids in balance.  Specifically, the tonsils, in concert with the adenoids, help prevent germs from coming in through the mouth and nose.

A tonsillectomy is a procedure to remove the tonsils. It is typically recommended for those who suffer from recurrent infections of the tonsils or whose tonsils are enlarged enough to obstruct breathing.  For adults, the tonsils are occasionally removed when there is concern for a tumor.

Over half a million tonsillectomies are performed annually in the United States, but little research has been done to determine the long-term health risks associated with this procedure. A new study released by the University of Melbourne is the first to look at potential risks.  Their results suggest that individuals who undergo a tonsillectomy are at three times the risk of their counterparts for diseases of the upper respiratory tract such as asthma, influenza, pneumonia and chronic obstructive pulmonary disease (COPD).

Read the entire study findings to learn more.

Like NNLM SCR on Facebook and follow us on Twitter.

Categories: Data Science

June PNR Rendezvous webinar next week

PNR Data Science - Wed, 2018-06-13 07:41

The next PNR Rendezvous monthly webinar is coming up.

Session title: Unlocking the Potential of De-identified Clinical Datasets

Presenter: Bas de Veer, Bio-Medical Informatics Services Manager for UW Medicine IT Services

When: Wednesday, June 20 starting at 1:00pm PT, Noon Alaska Time, 2:00pm MT

Healthcare systems generate a ton of data on a daily basis. The primary purpose of this data is billing and clinical decision making. But a great secondary use of this data is research. This webinar will discuss the potential uses, best practices and common hurdles of de-identified clinical datasets.

Registration is encouraged but not required. However, attending the live session will allow for questions. The session will be recorded and posted on the PNR Rendezvous web page a few days after the live session.

Medical Library Association CE credit is available for both the live and the recorded session.

More information about how to join the session is available on the PNR Rendezvous webpage.

Categories: Data Science

Moodle Class Announcement: Big Data in Healthcare: Exploring Emerging Roles – July 9 – August 31, 2018

SEA Data Science - Tue, 2018-06-12 15:45

The National Network of Libraries of Medicine (NNLM) invites you to participate in Big Data in Healthcare: Exploring Emerging Roles. This course will be primarily held via the Moodle platform with optional WebEx discussions. This course is designed to help health sciences librarians understand the issues of big data in clinical outcomes and what roles health sciences librarians can take on in this service area.

Dates: July 9 – August 31, 2018

Register: To register for this course, please visit the class details page.

The class size for this course is limited to 40 students. We will begin a waitlist if there are more interested in participating.

Course instructors for the winter session are Ann Glusker, Pacific Northwest Region; Derek Johnson, Greater Midwest Region; Alicia Lillich, MidContinental Region; Ann Madhavan, Pacific Northwest Region; Aimee Gogan, Southeastern/Atlantic Region; and Elaina Vitale, Mid-Atlantic Region.

Please contact Aimee Gogan with questions.

Description: Class Overview

Big Data in Healthcare: Exploring Emerging Roles

The Big Data in Healthcare: Exploring Emerging Roles course will help health sciences librarians better understand the issues of big data in clinical outcomes and what roles health sciences librarians can take on in this service area. Course content comes from information shared by the presenters at the March 7, 2016 NNLM Using Data to Improve Clinical Patient Outcomes Forum, top selections from the NNLM MCR Data Curation/Management Journal Club and NNLM PSR Data Curation/Management Journal Club’s articles, NINR’s Nursing Research Boot Camp, recommended readings from previous cohorts, and Big Data University’s Big Data Fundamentals online course.

Participants will have the opportunity to share what they learned with the instructor from each section of the course content either through WebEx discussions or Moodle Discussions within each Module. These submissions can be used to help support the student’s views expressed in the final essay assignment.

Objectives: Students who successfully complete the course will:

  • Explain the role big data plays in clinical patient outcomes.
  • Explain current/potential roles in which librarians are supporting big data initiatives
  • Illustrate the fundamentals of big data from a systems perspective
  • Articulate their views on the role health sciences librarians play in supporting big data initiatives

NOTE: Participants will articulate their views on why health sciences librarians should or should not become involved in supporting big data initiatives by sharing a 500-800 word essay. Students are encouraged to be brave and bold in their views so as to elicit discussions about the roles librarians should play in this emerging field. Participants are encouraged to allow their views to be published on a NNLM online blog/newsletter as part of a dialog with the wider health sciences librarian community engaging in this topic. Your course instructors will reach out to you following the completion of the course.

On top of information gained, being a part of the big data in clinical care dialog, and earning 9 continuing education credits from the Medical Library Association, students may earn an IBM Open Badge from Big Data University.

This is a semi-self-paced course (“semi” meaning there are completion deadlines).

Course Expectations: To complete this course for nine MLA contact hours, participants are expected to:

  • Spend 1-2 hours completing the work within each module.
  • Commit to complete all activities and articulate your views within each module.
  • Complete course requirements by the deadline established in each module.
  • Coordinate with a course instructor to publish your observations/final assignments on a NNLM blog/newsletter
  • Provide course feedback on the Online Course Evaluation Form

Grading: This course is graded on a simple pass/fail basis. When your submission meets the assignment’s expectations, you will receive full credit for the contact hours for that Module. For submissions that are unclear or incomplete, you may be asked for more information until your instructor approves the submission.

  • For discussion posts, your activity will be marked as complete after you’ve submitted a discussion AND your instructor assigns a point to mark as complete
  • If you participate in WebEx Journal Club Discussions (when available), your instructor will assign points in the Discussions for that module.
  • Students have the option to accept fewer contact hours. However, you will need to inform your course instructors ahead of time.

This is a Medical Library Association approved course that will earn students 9 contact hours.

 


 

 

Categories: Data Science

NIH Strategic Plan for Data Science

MCR Data Science - Mon, 2018-06-04 18:41

The National Institutes of Health (NIH) today released its first ever Strategic Plan for Data Science (PDF). The plan describes NIH’s overarching goals, strategic objectives, and implementation tactics for promoting the modernization of the NIH-funded biomedical data science ecosystem.

Wondering how libraries fit into the plan? NIH will partner with institutions to engage librarians and information specialists in finding new paths in areas such as library science that have the potential to enrich the data-science ecosystem for biomedical research. /da

Categories: Data Science

NIH Releases Inaugural Strategic Plan for Data Science!

PSR Data Science - Mon, 2018-06-04 18:04

Storing, managing, standardizing and publishing the vast amounts of data produced by biomedical research is a critical mission for the National Institutes of Health. In support of this effort, NIH has just released its first Strategic Plan for Data Science that provides a roadmap for modernizing the NIH-funded biomedical data science ecosystem. Over the course of the next year, NIH will begin implementing its strategy, with some elements of the plan already underway. NIH will continue to seek community input during the implementation phase.

Accessible, well-organized, secure, and efficiently operated data resources are critical to modern scientific inquiry. By maximizing the value of data generated through NIH-funded efforts, the pace of biomedical discoveries and medical breakthroughs for better health outcomes can be substantially accelerated. To keep pace with rapid changes in biomedical data science, NIH will work to address the:

  • findability, interconnectivity, and interoperability of NIH-funded biomedical data sets and resources
  • integration of existing data management tools and development of new ones
  • universalizing innovative algorithms and tools created by academic scientists into enterprise-ready resources that meet industry standards of ease of use and efficiency of operation
  • growing costs of data management

To advance NIH data science across the extramural and intramural research communities, the agency will hire a Chief Data Strategist. This management function will guide the development and implementation of NIH’s data science activities and provide leadership within the broader biomedical research data ecosystem. Jon R. Lorsch, Ph.D., director of the National Institute of General Medical Sciences, is currently available to comment on this strategic plan.

Categories: Data Science

UMN Now Accepting Applications for Data Management Training Session

GMR Data Science - Wed, 2018-05-30 16:50

 

Data Management for Librarians CE Course

Monday, August 6, 2018

Health science librarians from states represented by the Greater Midwest Region (GMR) are invited to participate in a data management for health sciences librarians CE course, hosted by the University of Minnesota Health Sciences Libraries in Minneapolis, MN.

The overall objective of this session is to introduce librarians to research data management and allow them to develop practical strategies for incorporating data into their existing roles.

Course Components

This 4-hour workshop will introduce participants to key elements of research data management in the health sciences, including best practices for documentation, metadata, backup, storage, and preservation. We will also explore advanced areas of research data management such as de-identification and intellectual property. The session will incorporate several activities to enable participants to apply best practices of data management when creating their own data management plans and critiquing existing data management plans (DMP). Beyond understanding the basics of research data management and applying those in the creation and assessment of DMPs, this session will also give participants an opportunity to consider the ways in which research data services can be incorporated into existing roles and responsibilities, including highlighting searching for research data for secondary analysis and integrating research data services into instruction and reference activities.

Data Management Skills Community of Practice (CoP)

Participants in the CE course may also participate in an online data management skills community of practice (CoP). The CoP will meet quarterly to take a deeper dive into a data management topic that could include federal funding compliance, data preservation & sharing, and open science. Topics are TBD and will be developed based on cohort needs.

CE Credits

Participants who complete the course will receive 4 MLA CE credits.

Instructor & CoP Facilitator:

Caitlin Bakker, MLIS: Caitlin Bakker is a health sciences librarian specializing in research support services, including data management, scholarly publishing, and citation tracking and analysis. She received her Masters in Library and Information Studies from McGill University in 2011 and is a Senior Member of the Academy of Health Information Professionals. Caitlin is interested in meta-research, and her projects have focused on publication models, systematic reviews, research ethics, and research impact.

Who can apply?

  • Applications are open to health science librarians in the Greater Midwest Region (Illinois, Indiana, Iowa, Kentucky, Michigan, Minnesota, North Dakota, Ohio, South Dakota, Wisconsin)
  • Twelve librarians from the GMR will be awarded a stipend reimbursing their travel costs to and from Minneapolis, up to $1000. Applications for the stipend must include a personal statement, CV, and letter of support from their supervisor (see Application Instructions below).
  • Enrollment is limited to 35 participants

What does it cost?

  • There is no charge for the CE course
  • Twelve participants from the GMR will receive a reimbursement up to $1000 for travel costs.
  • Individuals who are not selected to receive the reimbursement but still wish to take the course are responsible for their own travel costs

How can I get there?

  • All stipend award attendees who elect to fly to Minneapolis-St. Paul International Airport must book their air travel on a U.S. air carrier per our grant award. MSP is served by all the major US carriers including American, Delta, JetBlue, Southwest, and United.

Where can I stay?

  • A block of 12 rooms is being held at the Graduate Hotel, which is conveniently located on the Minneapolis East Bank campus. These 12 rooms are reservable at the discounted event rate ($160/night) on a first-come, first-served basis. Other hotels within walking distance of campus include the Courtyard by Marriott, DoubleTree by Hilton, and the Hampton Inn and Suites. Each of these hotels is connected to campus via the Green Line light rail system. The closest light rail station to Bruininks Hall is the East Bank station.

Session Agenda:

  • Lunch and networking 12-1:00pm
  • CE course 1-5:00pm
  • Complete session evaluations 5:00-5:15pm

Important Dates

 

  • Stipend application deadline: Friday, June 22, 2018
  • Non-stipend application deadline: open until filled
  • Notifications: Friday, June 29, 2018
  • Course Date: Monday, August 6, 2018

 

Application Details

  • Name and Contact Information
  • Current Role/Title
  • Place of Employment

If Applying for Travel Stipend, please include:

  • Personal statement (1-2 paragraphs) describing your individual goals, why the training is needed and how you will apply the training in practice
  • CV
  • Letter of Support from your supervisor describing why you should attend and how your participation in the workshop and the quarterly online data management skills CoP will impact the organization moving forward

Application Instructions

Please fill out the online Application Form. If applying for the travel stipend, please upload a PDF of your current CV, your personal statement and your letter of support from your supervisor.

Questions?

Contact Lisa McGuire at: lmcguire@umn.edu

This activity is supported by the National Library of Medicine (NLM), National Institutes of Health (NIH) under cooperative agreement number 1UG4LM012346. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Categories: Data Science

Data Flash: What is this GDPR thing I keep hearing about?

PNR Data Science - Fri, 2018-05-25 20:44

May 25 begins the era of the GDPR, or General Data Protection Regulation, a new European Union regulation with strong enforcement provisions that makes data protection the default rather than requiring users to opt out of entities being allowed to use their data (to put it VERY simplistically).  Why should we in the U.S. pay any attention to something applying to European data?  Well….

–The coming tide of companies, governments, and others using and combining and potentially misusing our personal data is no longer a swell, it’s a tsunami (says Tom Wheeler of the Brookings Institution).  The time to act is now, and Europe’s action will have ripple effects.  So it’s a good thing to be aware of the GDPR because something like it will be in our lives eventually (even if not coming soon to a theater near you).

–Many large American companies are already global anyway, and they are having to respond due to their European presence.  Facebook and Apple are two examples.

–Even many smaller U.S. companies and organizations, though they don’t have to protect your data under the GDPR, are proactively notifying you that they are taking steps to do so (you’ve probably seen a lot of these notices in your inbox recently–The New York Times suggests you read them).

–Last but not least, it’s a fascinating new conceptualization of our entitlements as online beings!  The GDPR arguably “enshrines data protection as a fundamental human right”.  It moves the discussion about data and our privacy as individuals WAAAAAY forward and in new directions.

This article, from Vox, puts it well: “…Norms are shifting once more. Looking back, we can frame the development of digital behavior into three phases: First, there was a naiveté phase, where consumers didn’t really understand the technology and what it meant. Then there was the careless phase, where people saw data rights or privacy as either unimportant or an acceptable price of entry to all the good, free stuff. Now it is clear we are entering the demand phase, which sees the emergence of a more savvy, engaged, and alarmed digital consumer — and related movements to create and enforce consumer rights.”

Watch this space–and all of your online presences–for further developments!

Categories: Data Science

All of Us

SCR Data Science - Thu, 2018-05-17 11:28
Diverse group photo

“All of Us Participants” via allofus.nih.gov, April 2018, Public Domain.

Over the past couple of weeks, you may have heard about a new research program that the NIH has launched.  All of Us endeavors to enroll 1 million or more adults to participate in a research study.  Data from this study will be used to further advance the practice of precision medicine.

Precision medicine allows doctors to determine treatments that are likely to be more effective for individuals, taking into account their genetics.  Although this approach to medicine is not new, advances in research have significantly accelerated its progress.

A set of core values is guiding the development and implementation of the All of Us Research Program:

  • Participation is open to all.
  • Participants reflect the rich diversity of the U.S.
  • Participants are partners.
  • Participants have access to their information.
  • Data will be accessed broadly for research purposes.
  • Security and privacy will be of highest importance.
  • The program will be a catalyst for positive change in research.

To learn more about All of Us, view this video or visit the website.

Like NNLM SCR on Facebook and follow us on Twitter.

Categories: Data Science

Big Data in Healthcare – Opportunities for Librarians

SEA Data Science - Wed, 2018-05-09 14:12

by Douglas J. Joubert, Informationist, NIH Library, Washington, DC

Over the last seven weeks, in the Big Data in Healthcare – Opportunities for Librarians course, we learned about big data and data science within the context of five distinct disciplines. This essay will provide an overview of big data and data science within each of the five disciplines, with a focus on how librarians can support researchers working in these fields.

Although not focused exclusively on Big Data, a recent report has strongly advocated for an increased role for librarians in the field of data science (Burton, Lyon, Erdmann, & Tijerina, 2018). This report outlines a multi-faceted framework for understanding the internal (within the discipline) and external (within the broader science disciplines) drivers that are changing the way in which we think about data.

Data science is one of those terms that can take on different meanings, based on a particular practice area. One of the more popular representations of data science is that of Drew Conway. Conway represents data science as the intersection between three primary domains [Figure 1]. It is not vital that librarians be experts in each of the three domains that comprise this Venn diagram, nor is it even possible. What is important, and serves as the primary thesis of this essay, is that librarians be grounded in how researchers in each of these areas produce, organize, and analyze data.

Figure 1: The Data Science Venn diagram[1].

This course introduced us to a number of different perspectives on the topic of big data. The first view was provided by a data informationist (Lisa Federer) who works for a large biomedical research center. She defined big data as having a number of distinct qualities. The first of these qualities is the amount of data being produced, commonly referred to as its volume (Federer, 2017). The second quality is the variety of the data, specifically, pulling data from many different sources, in many different formats (Federer, 2017). The third feature of big data is the rate at which the data is being produced, or its velocity (Federer, 2017). Last is data veracity, which refers to how much trust we place in the source of the data and the data quality (Federer, 2017). Additional definitions were provided by two social scientists, a practicing clinician, and a nursing researcher.

The nursing perspective provided some additional insights that are worth exploring. First is the unique role that nurses play in the delivery of health care, and how this role influences big data research (Brennan, 2015). Second, Dr. Brennan emphasized that terms like Big Data to Knowledge (BD2K), big data, and precision medicine mean different things to different people (Brennan, 2015). The role for nursing to play is making these terms meaningful to patients and their families. Last, she emphasized that these tools need to be understood from the nursing experience, which takes a more humanistic approach when compared to the traditional medical model of health care delivery. Nurses are focused on getting the goals of precision medicine into the “hands of the people” (Brennan, 2015). All of these different perspectives are needed to fully understand the role of big data and how big data is changing the way that we conduct research, deliver health care, and make informed decisions.

Using three elements from Martin’s User-Centered Data Management Framework for Librarians, I will advocate for the increased role of librarians in both data science and big data initiatives. These elements are: (1) Service, (2) Best Practices for Data Management, and (3) Literacy (Martin, 2016).

Libraries have a long and rich history of providing services to different user groups. Adding data services as a component to more traditional library services allows libraries to respond to an increased demand for specialized levels of support for data science. Potential roles for librarians could fall into the following categories: (1) data extraction, (2) data wrangling, (3) data analysis, or (4) data visualization (Hamalainen, 2016). Some of these skills, like data extraction or data analysis, can be performed without much additional training. Data wrangling and data visualization are not out of reach for most librarians, if they get supplemental training. These four areas also require the least amount of overhead when compared with, for example, hosting a data repository.

Also, many data service questions are very similar to the types of reference questions that librarians have traditionally answered. For example:

  • Knowing where to find authoritative and curated datasets
  • Knowing the best methods for searching datasets
  • Knowing how to choose the best software solutions
  • Knowing about current metadata schemas for data

Each week in this class presented us with a different challenge for managing data, and innovative solutions for dealing with these challenges. We also learned that these challenges are being addressed by local and national initiatives. At the federal level, the Office of Science and Technology Policy released a 2013 report that outlined a number of important policy principles (Holdren, 2013). Many of these principles align to the work of libraries, and present us with numerous opportunities. The first is helping researchers comply with changing grant requirements. Second is working with researchers in efforts to maximize transparency and accountability in terms of collecting and storing data. Last is connecting researchers with tools like the Open Science Framework to support data sharing and increase reproducibility.

As someone who has spent a great deal of his professional life teaching library users, this topic resonates the most with me. Also, I feel that librarians make some of the best teachers. Teaching about data literacy, data analysis, and data management offers incredible potential for librarians. It has been my experience that starting small is the best entry point into teaching these topics, for example, working with a colleague to develop a data literacy class, or volunteering to serve as a teaching assistant or back-up for a more seasoned teacher. Teaching a class in R or Python is an admirable goal. However, it might not be the best place to start, nor is it necessarily the right solution for your library. Finally, look for both formal and informal professional development opportunities. This MOOC (Big Data in Healthcare[2]) and Best Practices for Biomedical Research Data Management[3] are just two recent examples of librarian-led data management classes. However, Meetup groups[4] and connections developed through social media are also wonderful ways to learn and network.

References

Brennan, P. (2015). Big Data in Nursing. Bethesda: NINR Big Data Bootcamp.
Burton, M., Lyon, L., Erdmann, C., & Tijerina, B. (2018). Shifting to Data Savvy: The Future of Data Science in Libraries.
Federer, L. (2017). Data Science 101. NNLM Beyond the SEA Webinar Series.
Hamalainen, H. W. (2016). Geoscience Librarianship 101: Making Sense out of “GeoReference.” Baltimore.
Holdren, J. P. (2013). Increasing Access to the Results of Federally Funded Scientific Research. Retrieved from https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
Martin, E. (2016). The Role of Librarians in Data Science: A Call to Action. Journal of eScience Librarianship, e1092. http://doi.org/10.7191/jeslib.2015.1092

[1] http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
[2] https://nnlm.gov/moodle/enrol/index.php?id=703
[3] https://learn.canvas.net/courses/1854
[4] http://www.datacommunitydc.org/calendar/  or https://www.meetup.com/find/s

Categories: Data Science

A Unique Data Experience: Reflections on ESFCOM’s Inaugural Hackathon

PNR Data Science - Mon, 2018-05-07 05:00

Today’s blog is by Nancy Shin, Sewell Memorial Fund Librarian Fellow at Washington State University’s Elson S. Floyd College of Medicine. Welcome, Nancy!

The most extraordinary thing happened at Washington State University’s Elson S. Floyd College of Medicine (ESFCOM) the weekend of April 13-15, 2018.  ESFCOM hosted its inaugural Hackathon, which was organized by the College Technology Incubator Officer, Andrew Richards.  It was well attended by people from all walks of life and areas of subject expertise, including students and healthcare providers.  So, the big question is what exactly is a hackathon and why all the hype?

A hackathon is a social event that is focused on building small, innovative, and new technology projects.  It brings together teams of people to work on a common project within an overarching theme; at the end of the event, teams formally present their projects for judging.  The hackathon can last from 4 hours to 1 week (sleep is optional) and can involve large cash purses as prizes.  Typically, projects are technological and can result in the development of a new app or feature on a website in response to a theme; in the case of ESFCOM’s Hackathon, the theme was “challenges in rural healthcare.” The common misconception about a hackathon is that it is an event that is strictly designed for computer programmers, engineers, and software developers – i.e. anyone who codes! However, other skills like research, design, project management, data management, and leadership are also important to the dynamic of an ideal hackathon team.

Arguably, the first hackathon was hosted in 1999 by OpenBSD, an operating system project; ten developers came together to work on various software problems over the span of a week (Davis, 2016).  Since then, hackathons have more famously been hosted by various companies like Facebook and Yahoo in 2005 and 2006, respectively, in order to catalyze new innovations in a relatively “risk-free” and “creative” environment (Davis, 2016). In general, hackathons are organized by one of the following communities: open source software companies, tech companies, sponsored competitions, and community institutions (Davis, 2016).

PTme, the winning team

In the health field, a big community hackathon organizer is the National Institutes of Health (NIH), which often hosts hackathons with a bioinformatics theme.  Although the ESFCOM’s Hackathon was heavily inspired by MIT’s “Grand Hack & Hacking Medicine,” what makes the community hackathon at ESFCOM different from other health hackathons is that it encourages people with diverse skillsets to tackle healthcare problems.  For example, the winning team, PTme, included developers, medical students, business leaders, and engineering students, while my own hackathon team was made up of a mathematician, bioengineer, computer engineer, designer, and health/data librarian.  Another unique feature of the ESFCOM’s Hackathon was the involvement of health librarians in the Spokane area in creating a “Research Station” that provided active research and data management support for the participating teams.  The volunteer librarians were able to provide direct research support to assist with each team’s research and data management needs.  It is those two qualities, skill diversity and library support, that make ESFCOM’s Hackathon one of a kind and a successful model for other health communities/organizations to follow for their future hackathons!

References:

Davis, R. C. (2016). Hackathons for libraries and librarians. Behavioral & Social Sciences Librarian, 35(2), 87-91. doi:10.1080/01639269.2016.1208561

Categories: Data Science

NNLM Research Data Management Webinar Series: Research Data Management Services: Beyond Analysis and Coding – June 14, 2:00 PM ET

SEA Data Science - Fri, 2018-05-04 10:10

Date/Time: Thursday, June 14, 2018, 2:00 PM ET/11:00 AM PT

Presenter: Margaret Henderson, Health Sciences Librarian, San Diego State University Library, San Diego, CA

Contact: For additional information or questions, please contact Tony Nguyen.

Abstract: There is more to RDM services than the technical skills necessary for data management. Soft skills and non-technical skills are very important when setting up RDM services, and continue to be important to the sustainability of services. Reference skills, relationship building, negotiation, listening, facilitating access to de-centralized resources, policy knowledge and assessment, are all important to the success of a service. Margaret Henderson will discuss these skills and show you how to start RDM services, even if you don’t feel confident about your statistical skills or knowledge of R.

Presenter Bio: Margaret Henderson was recently appointed Health Sciences Librarian at San Diego State University Library. She is liaison to the College of Health and Human Services and is also working with other Librarians at SDSU to set up RDM services. Previously, she spent three and a half years setting up RDM services at Virginia Commonwealth University Libraries. Margaret has been a biomedical librarian for over 30 years and is a Distinguished member of the Academy of Healthcare Information Professionals. She has presented and written on many library topics over the years, and wrote the book, Data Management: A Practical Guide for Librarians (2016, Rowman & Littlefield).

Registration: Please visit our class page to sign up!

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

SEA Data Science - Thu, 2018-05-03 08:15

Written by: Paul Levett, Reference and Instructional Librarian, Himmelfarb Health Sciences Library, George Washington University, Washington, DC

Do you think health sciences librarians should get involved with big data in healthcare?

Of the four V’s: velocity, volume, variety, value, described in Cognitive Class (n.d.), it is value where medical librarians come into a discussion about Big Data because we add value to unstructured data, we bring order to chaos! Traditionally librarians have done this by creating metadata about learning objects, e.g. cataloging, finding aids, & infographics. However, data mining, cleaning, analysis, and visualization require computer programming, mathematics, and statistics skills that are not part of library school MLS programs.  

Burton & Lyon (2017) point to a technical skills gap that prevents librarians from contributing to big data initiatives. They promote the NCSU Data Science and Visualization Institute and Library Carpentry workshops to provide knowledge and opportunity to practice. But the NCSU Data Science and Visualization Institute lasts just one week, nowhere near enough time to develop and practice computer programming language, math, and statistics skills. Library Carpentry workshops typically are one-off instructional sessions that offer even less time, although I appreciate that the course material is available online at http://librarycarpentry.github.io/.  

On the question of whether librarians should be doing data science, one can argue that data science skills do touch on all the domains identified by Drummond et al. (2015, Fig. 3 p.15) in the national librarian education needs assessment. Were I invited to suggest a program for developing the necessary skills to work in Big Data in Health Care Information Systems, I would suggest a program like the MSc Data Analytics program in the University of Sheffield Department of Computer Science, which provides opportunities to study R and Python programming and statistical analysis and to work on a real-world project to apply those skills over a one-year timeframe. Students on this program apply advanced mathematics skills, which is why the program requires an undergraduate degree in mathematics, economics, accounting, physics, chemistry or engineering.  

This suggests a need for the creation of a data scientist specialty role, but I am not convinced the Library actually is the best home for that role. Recently Simmons College (2017) surveyed 1117 graduates of their MLS program about core librarian professional skills and knowledge. Nobody rated data science as a core or a specialized skill; 14 mentioned statistics/working with data, and only 6 mentioned data science/curation/management. As recently as last November in the IMLS (2017) meeting on positioning MLS programs for the 21st century there was lots of discussion about increasing the diversity of the profession but only one mention of data curation.

Tsakalos (2017) described data wrangling as “the process of importing, cleaning, and transforming raw data into actionable information for analysis. It is a time-consuming process that is estimated to take about 60-80% of analysts’ time.” I feel the current push for librarians to develop data wrangling skills is perilously close to an admission from data analysts that they want to offload what appears to be an onerous burden. This role would better fit someone working in University Departments of Computer Science, Mathematics, Statistics, or Epidemiology and Biostatistics.  It’s critical for librarians to manage expectations that the library is not a raw data processing warehouse but instead is a knowledge repository.
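
For readers who have not seen data wrangling in practice, the three steps in the definition quoted above (importing, cleaning, transforming) can be sketched in a few lines of Python. This is only an illustrative sketch: the sample data, the column names, and the `wrangle` function are invented for this example and do not come from Tsakalos or from any real dataset.

```python
import csv
import io

# Hypothetical raw export; every column name and value here is invented.
RAW_CSV = """patient_id,visit_date,weight_kg
 001 ,2018-01-05,70.5
002,2018-01-06,
003,2018-01-07,not recorded
004,2018-01-08,82
"""


def wrangle(raw_text):
    """Import CSV text, drop rows without a usable weight, and convert types."""
    cleaned = []
    # Importing: parse the raw text into one dictionary per row.
    for row in csv.DictReader(io.StringIO(raw_text)):
        weight = row["weight_kg"].strip()
        try:
            weight_value = float(weight)
        except ValueError:
            # Cleaning: skip rows whose weight is missing or not a number.
            continue
        # Transforming: trim stray spaces and store the weight as a number.
        cleaned.append({
            "patient_id": row["patient_id"].strip(),
            "visit_date": row["visit_date"].strip(),
            "weight_kg": weight_value,
        })
    return cleaned


records = wrangle(RAW_CSV)
mean_weight = sum(r["weight_kg"] for r in records) / len(records)
print(len(records), "usable rows; mean weight:", mean_weight)
```

Even in this toy case, most of the code deals with messy input rather than with the analysis itself, which is consistent with the time estimate in the quotation.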

Where should librarians get involved?

There may be a role for librarians in passing on to hospital IT departments information about updates and changes to important biomarkers, where those need to be manually set as parameters by programmers building clinical decision support on top of EHR systems. However, as this enters the realm of medico-legal responsibility, the onus should be on EHR software developers to perform this necessary ongoing maintenance role.

Krumholz (2014) described how observational non-experimental studies generate data to support causal inferences and he points to comparative effectiveness studies as a potentially useful application of cluster analysis on large clinical data sets. A systematic review should be a pre-requisite for any health policy comparative effectiveness study, and this is where I as a librarian could best employ my literature search skills.

Librarians could be trained and certified to deliver REDCap training; the data capture form design issues are similar to those in Microsoft Access. Librarians would benefit by developing a deeper understanding of study design issues such as timing follow-up, patient data protection principles, and setting automated reminder parameters, while the enterprise would benefit from additional trainers to further spread the use of the REDCap clinical trial data collection tool.

Sources

Burton, M., and Lyon, L. (2017). Data Science in Libraries. Research Data and Preservation (RDAP) Review. Bulletin of the Association for Information Science and Technology, 43(4) 33-35.

Cognitive Class/Fireside Analytics (n.d.). Big Data 101. Retrieved from  https://cognitiveclass.ai/courses/what-is-big-data/

Drummond, C., Clareson, T., Gemmill Arp, L., and Skinner, K. (2015). Libraries, Archives, and Museums (LAM) Education Needs Assessments: Bridging the Gaps. Retrieved from https://educopia.org/sites/educopia.org/files/publications/MtL_LAM_EducationNeedsAssessments_20151104_0.pdf

U.S. Institute of Museum and Library Services (IMLS) (2017). Positioning MLS programs for the 21st century. Retrieved from https://www.imls.gov/news-events/events/positioning-library-and-information-science-graduate-programs-21st-century

Krumholz, H. M. (2014). Big data and new knowledge in Medicine. Health Affairs, 33(7): 1163-1170

Simmons College (2017). Librarian professional skills and knowledge survey April 2017. Retrieved from http://slis.simmons.edu/blogs/unbound/2017/05/17/core-skills-lis/

Tsakalos, V. (2017). Data wrangling. Retrieved from https://www.r-bloggers.com/data-wrangling-cleansing-regular-expressions-33/

Categories: Data Science

Learning, Networking, and Sharing: Report on the April 10-11 NNLM Research Data Management Course Capstone Summit

PSR Data Science - Wed, 2018-05-02 19:43

by Andrea Lynch, MLIS
Scholarly Communications Librarian
Lee Graff Medical & Scientific Library
City of Hope
Duarte, CA

As part of the culmination of the NNLM Biomedical and Health Research Data Management for Librarians spring 2018 course (NNLM RDM course), a two-day Capstone Summit was held April 10-11, 2018, at the NIH campus in Bethesda, Md. Over 40 medical and health sciences librarians attended the impactful event, along with representatives from the National Library of Medicine (NLM) and various team members from the National Network of Libraries of Medicine (NNLM) regional network offices. It was a great opportunity to meet (in-person) fellow cohort participants as well as to get to know our NLM and NNLM colleagues while getting feedback on our Capstone Project plans.

Research Data Management Capstone Summit Attendees

The first day began with a meet & greet and a welcome from the NLM and NNLM representatives. We then had an opportunity to meet our mentors as well as fellow mentees supported by our assigned mentor. Then came the part of the event I was most anticipating, a presentation by NLM Director, Dr. Patricia Flatley Brennan. She highlighted the NLM Strategic Plan and addressed a myriad of questions. We then presented our Capstone Projects in small groups and received feedback from our peers and other course mentors. We enjoyed a delicious lunch, then went back to work participating in roundtable discussions on topics such as scalability and tools & technology supporting research data management programs and services. We were then fortunate enough to hear a presentation by a panel of experts at NLM and NIH, including Dr. Dina Demner-Fushman from NLM; Dr. Ben Busby of NCBI; and Lisa Federer of the NIH Library. We ended the day with an activity where we each wrote our best idea pertaining to research data management program success, and then collectively and anonymously rated each idea to come up with the handful of best ideas amongst the group.

The second day began with a group activity, with a goal of sharing our Capstone Project plans and getting high-level feedback. We then performed a group activity collecting aggregated feedback about the RDM course within small groups. Next up, Regina Raboin, Associate Director of the Lamar Soutter Library and Editor-in-Chief of the Journal of eScience Librarianship (JeSLIB), presented an overview and recent changes pertaining to the journal. She encouraged the course participants to submit manuscripts detailing their Capstone Projects once completed. The final presentation was by Kevin Read and Alisa Surkis of NYU with case study highlights from the academic medical libraries who participated in a NNLM Middle Atlantic Region Pilot Project on research data management. The concluding remarks from Amanda Wilson from NLM’s National Network Coordinating Office, as well as Ann Glusker & Ann Madhavan from NNLM Pacific Northwest Region did a great job of synthesizing the event’s outcomes and inspiring us to forge ahead on our Capstone Projects!

The Capstone Projects are due at the end of August. So, be on the lookout for those updates from NNLM and/or the respective course cohort participants. If you are going to the Medical Library Association annual meeting this month, please attend Sheila Green’s Lightning Talk detailing her experience participating in the NNLM RDM course, which is scheduled on the afternoon of May 22, 2018 (Sheila is a speaker during the Lightning Talk #5 session from 3:00 to 4:25 p.m.). Also, visit NNLM’s RD3 website for interesting research data management developments and RDM-related news, updates, and initiatives. The NNLM Research Data Management Working Group is very active and will update the site regularly. Lastly, keep your eyes peeled for the JeSLIB special issue on research data management and for a database of Capstone project reports on the NNLM RD3 site.

Categories: Data Science

Save the date! Mountain West Data Librarian Symposium August 13-14, 2018

MCR Data Science - Wed, 2018-05-02 13:23

The Mountain West Data Librarian Symposium is a low-cost professional development opportunity for librarians and research data specialists. Save the date: August 13-14 at the University of Colorado Boulder for the inaugural Mountain West Data Librarian Symposium!

Registration will be under $30 and will open in June. The symposium will consist of morning workshops and afternoon unconference sessions. The organizers are currently seeking workshop proposals for the symposium. Proposals are due Tuesday, May 22.

Workshops may run 55, 90 or 120 minutes. They are seeking hands-on training sessions that focus on active attendee participation. Proposals should focus on how to fulfill the job duties of research data specialists. Potential topics include, but are not limited to:

  • teaching research data management
  • data sharing and re-use
  • data curation and preservation
  • data literacy
  • research reproducibility
  • communication and outreach

Proposals of up to 250 words should be submitted to: https://goo.gl/forms/9VRH07JjVNYf9f8o2. Please view the full call for proposals prior to submitting.

Stay tuned for more announcements, follow @MountainWestDLS on Twitter, or check out the Mountain West Data Librarian Symposium Website. Please send any questions to MountainWestDLS@gmail.com. /da

Categories: Data Science

Big Data for Hospital Librarians – Are We There, Yet?

MAR Data Science - Tue, 2018-05-01 08:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by Elanor Pickens, Medical Librarian, Portsmouth Regional Hospital, Portsmouth, NH

“Big data” is a term that has been recently appearing in the literature of academic librarianship. However, as a hospital librarian, may I consider this justification enough to explore its applicability to my current position? The definition of big data is multi-faceted and covered elsewhere, so I will refer readers to Gandomi and Haider (2015). Here I simply seek to understand my potential role in this field.

Proposed by Martin (2016) is a framework of five basic categories, within which librarians may find at least one opportunity in supporting data science activities. One such role that the hospital librarian may engage in might be found within the “Literacy” domain, through teaching. This may include remaining informed about the various programming languages and software (R, Python, SAS, etc.), and their strengths and weaknesses, so that we may assist in the research process by educating about data management options. Additionally, we might lead researchers to resources that can help them visualize their data, to enhance their own comprehension of relationships and to enable them to better present their findings to others. If we also understand the form of data that they are collecting (structured, semi-structured, or unstructured), we may be able to help them discover studies that have utilized similar data so that they may anticipate any barriers of organization and analysis that they might encounter.

An example of a concern that researchers may have with respect to big data analysis is that a question needs to be relatively clear before the examination of any data (Brennan). Although slight revisions may be necessary, or new questions might arise, which could be further examined, there should not really be a significant change in direction of the original question. There are too many data and methods of analysis to proceed in big data research without a clear understanding of where it leads. Librarians are proficient at refining questions in order to get at the core of a research query as part of their reference interview skillset. We also have at least some basic knowledge of the different types of data analysis methods that are described in the literature, although much of this exposure does not include research conducted with big data. Iwashyna and Liu (2014) state that, in contrast to the formulation of one hypothesis and advance selection of the data that will need to be carefully collected to support or refute this hypothesis (traditional epidemiology), the multitude and variety of big data are able to be custom fit to epidemiological studies to identify patterns. Similarly, the authors discuss that a number of analytic methods can be adapted and combined when using big data, whereas traditional epidemiology is usually restricted to one analytic method per study. For hospital librarians, an awareness of research question modifications and understanding of methods of data analysis may not necessarily yield additional support to experienced researchers, but it may still help guide those individuals who are subject specialists in their field but new to the research process.

Groeneveld and Rumsfeld (2016) note that big data certainly have predictive power in clinical decision-making; however, big data cannot determine which associations are due to random events, nor can they identify causal associations. In addition, the authors point to a lack of large-scale analytical methodology for scientific comparison studies comparable to the level currently available in big data analysis. It is necessary for researchers, especially those seeking publication, to consider the reproducibility of their studies when they are using such highly adaptive and dynamic models of analysis. In this case, there may be little more for hospital librarians to do than to continue to assist researchers in discovering the current best practices for big data publication with respect to transparency.

For the solo hospital librarian, who is often juggling multiple tasks, big data may simply not be a sphere in which to operate. I feel that despite having gained a useful surface understanding of big data concepts, I am still unable to determine how I may be able to apply this knowledge in my current position, or whether I would even have the time to do so. And without a clearly defined path, management support for professional development opportunities is essentially non-existent (Burton & Lyon, 2017). One area that has piqued my curiosity is how big data may inform my organization’s operations through the revision of protocols, thereby improving clinical practice. I have never really had much opportunity before to consider how data collected by our EHR truly inform patient care, and especially how they might impact revenue. For example, patients who come in for vaccinations may also receive additional preventive care (Kaelber, 2016). But in order to actually delve into the big data arena, it may be up to each individual librarian to either maintain a basic awareness, or seek out opportunities that may or may not be supported at the organizational level.

REFERENCES:

Brennan, P. (2015). Big Data in Nursing Research. NINR Big Data Boot Camp Part 4: Big Data in Nursing Research. https://www.youtube.com/watch?v=KOFLQ5z05f8

Burton, M., & Lyon, L. (2017). Data science in libraries. Bulletin of the Association for Information Science and Technology, 43(4), 33-35.

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Groeneveld, P. W., & Rumsfeld, J. S. (2016). Can big data fulfill its promise? Circulation: Cardiovascular Quality and Outcomes, 9(6), 679-682. PMCID: PMC5396388.

Iwashyna, T. J., & Liu, V. (2014). What’s so different about big data? A primer for clinicians trained to think epidemiologically. Annals of the American Thoracic Society, 11(7), 1130-1135. PMCID: PMC4214055.

Kaelber, D. (2016). Using Clinical Data to Improve Clinical Patient Outcomes. NNLM Forum (online). http://www.kaltura.com/tiny/7e5k7

Martin, E. R. (2016). The Role of Librarians in Data Science: A Call to Action. Journal of eScience Librarianship, 4(2), 7. https://escholarship.umassmed.edu/jeslib/vol4/iss2/7/

Categories: Data Science
