National Network of Libraries of Medicine

PNR Data Science

News from the Northwest and Beyond

Data, Data Everywhere and Not a Drop to Drink

Fri, 2018-04-20 05:00

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Jeff Durham, Medical Librarian, Desert Regional Medical Center, Palm Springs, CA

We swim in a sea of information; more often than not we are drowning in it. When we are presented with a smorgasbord of data, how do we determine what we should eat? This is the current situation with regard to big data and healthcare: what data should be utilized, and how? It is in this data-centric meal that the data-savvy health science librarian should be most at home: as critic, guide, and chef.

As health science librarians, we have a responsibility not only to provide the communities that we serve with access to up-to-date and accurate information, but also to support the informational needs of researchers in our communities. With the tremendous amount of big data that is generated on a daily basis, health science librarians have a duty to become involved and assist all of their patrons, both lay and professional, to access, extract, and manage the data (both big and small) that they need.

There are barriers to making a librarian into a data-savvy librarian who can tackle big data problems with ease. One barrier is that many graduate schools in library and information science have not been keen to teach data science in a general education format, preferring to see it as a sub-specialty. This occurs, ironically enough, in iSchools as well. While there is a growing trend to change this educational oversight, it is not yet the dominant paradigm. Another barrier is that of opportunity. All too often, the librarian simply does not have the time, or their employer does not provide the means (e.g., time off, reimbursement), for the librarian to refresh their skill set. Until library managers and directors see the value of continuing education in data science and big data for the librarians on their staff, the health sciences librarian will continue to fall behind.

There are also opportunities to be found. In hospitals and health science libraries, with residents and medical students, there are many inroads for librarians to make. Given the exponential growth in data from biomedical devices and the prevalence of smart devices, which constantly generate both passive and active data, there is a lot of big data to utilize. The data being produced has the potential to be used in research projects by students, residents, nurses, and doctors on staff. There is a significant gap between the skills of these medical professionals and those that data science requires. The role of the data-savvy librarian is to bridge this gap. The data-savvy librarian is able to assist patrons in identifying the datasets that they need as well as demonstrating how to wrangle, clean, and visualize their data. By doing this, the librarian fills an essential role in the medical field. By managing big data and helping the researcher work with the data and discern patterns and trends, the librarian enables the student, nurse, or clinician to make evidence-based decisions. In doing so, the librarian not only serves the informational needs of researchers but also has a very real impact on patient care.

Categories: Data Science

Reflections on Big Data in Healthcare: Exploring Emerging Roles

Mon, 2018-04-16 17:55

In the NNLM Big Data in Healthcare: Exploring Emerging Roles course, we asked participants, as they progressed through the course, to consider the following questions: Do you think health sciences librarians should get involved with big data in healthcare? Where should librarians get involved, if you think they should? If you think they should not, explain why. You may also combine a “should/should not” approach if you would like to argue both sides. NNLM will feature responses from different participants over the coming weeks.

Written by: Kathleen Carlson, Education Librarian, College of Medicine Phoenix, University of Arizona, Phoenix, AZ

It is essential for the future of medical librarians to get involved in Big Data. Much of our future work will come from big data research projects, especially for librarians who work in hospitals and health care systems. Since librarians were early adopters of technology, we were able to move from print indexes to searching indexes on CD-ROMs that eventually moved to the Web. Having moved from the card catalogue to integrated automated library systems, librarians understand how important it is to move forward with Big Data. Many of the older, experienced librarians may not have expertise or training in math, computation, statistics, or the relevant domains, but we know that our profession should be part of our institution’s Big Data team and at least have a seat at the table.

I know that being an Assistant Professor of Practice in the Department of Biomedical Informatics (BMI) at my academic institution has allowed me to understand and speak the language of Big Data. Clinicians come to me for resources and journal articles, and I have learned a lot by attending monthly journal club meetings on different subjects in Biomedical Informatics and Big Data. BMI fellows, Chief Medical Information Officers (CMIO), Chief Nursing Officers (CNO) of area hospitals, and BMI faculty attend the sessions. Here I have an opportunity, as a non-clinician, to be seen and heard and to ask questions when they arise. We have covered the following topics of Big Data and Informatics in the past three years:

  • Cybersecurity
  • Data Standards
  • Health Literacy
  • Electronic Health Record/Electronic Patient Record
  • Process Oriented Health Information Systems
  • Clinical Decision Support Systems
  • Graphic Display and Visualization
  • Health Information Exchange
  • Cloud Computing Services
  • Substitutable Medical Applications and Reusable Technologies (SMART)
  • Fast Healthcare Interoperability Resources (FHIR)

I also attend monthly Clinical Informatics Grand Rounds. The speakers vary from clinicians to researchers, MBA, Pharmacy and Public Health faculty.

So, for the past three years I have had a seat at the table and have given our library visibility within Biomedical Informatics and Big Data. I also believe that a medical librarian at any institution should find a champion or champions who will assist him/her in getting a seat at the table. And when that is accomplished, a hospital librarian should get permission to embed in a patient’s electronic record at least one vetted, appropriate link to the National Institute on Aging or another consumer health oriented resource. This would relieve the burden on clinicians of finding the best resource for patient care.

Big Data can be organized, appraised, secured, and preserved with a librarian’s help, and can assist researchers and clinicians in patient care and help find areas that may need improvement. Creating an online resource guide with Big Data tools and resources can be a first step in marketing the librarian and library. The NNLM PSR recently recruited a data and technology services coordinator. She asked librarians if they collected any data for their institution. Unfortunately, we are considered a satellite campus of a large Research One University. I think there are areas at my institution where data is collected but could be used more effectively. I know that within the Scholarly Project, a four-year mandatory thesis and poster at our institution, many of our students use Big Data from area hospitals or the state’s data archives as foundational information in their presentations and theses. They are assisted by their clinical mentors.

I also like a fellow course student’s discussion post about teaching himself R so that he can teach classes to the data scientists on his campus. Finding resources on Big Data programming languages and on free software for statistical computing and graphics, like R, can help the librarian be an informational resource for Big Data collection. This instruction example is one way librarians will have to get out of their comfort zone and put themselves out there for Big Data. We have access to SPSS and Stata in our library commons. I took three classes on REDCap to help me understand Big Data and how to collect it safely and securely. REDCap is a secure web application for building and managing online surveys and databases and collecting data.

The librarian can be the go-to resource for students and researchers and help them search the archives of stored Big Data sets. I do not believe that our small campus has the capacity to store Big Data, and it is not something that the larger academic institution is willing to duplicate. I do believe that, as a librarian, being visible, attending committee meetings, journal clubs, and clinical informatics rounds, and actually showing an interest in learning about Big Data gives a librarian the knowledge and vocabulary to understand it and share it with her constituents. The librarian can also familiarize himself/herself with websites that build Big Data knowledge, such as that of the Institute for Health Metrics and Evaluation, which I learned about in the course discussions.

Categories: Data Science

DataFlash: Data Indexers

Mon, 2018-04-02 13:50

The Institute for Health Metrics and Evaluation (IHME) is “an independent population health research center at UW Medicine, part of the University of Washington, that provides rigorous and comparable measurement of the world’s most important health problems and evaluates the strategies used to address them.” Their mission is to improve the health of the world’s populations by providing the best information on population health, and to do so, IHME enlists the expertise of countless individuals, including researchers, data analysts, data scientists, and thirteen data indexers. What is a data indexer? Lyla Medeiros, a data indexer at IHME, shares more about her essential role below…

What is a data indexer? And how long have you been in the role?

Data indexers are part of a team responsible for providing librarian services to IHME. Data indexers not only catalog data for inclusion in the Global Health Data Exchange (GHDx), they also organize and maintain data files, provide reference services to IHME researchers, and search for and acquire new data sources. Data indexers are also responsible for creating documentation on cataloging practices, implementing improvements to process and workflows, reporting and testing technical issues that pop up in the GHDx for the Drupal development team, and managing controlled vocabularies and taxonomies, which includes researching and adding terms. I’ve been working as a data indexer for four years and three months.

What is your education/occupational background?

I earned a BA in Dance Studies and Art History at the State University of New York, Empire State College and a Master of Library Science at Indiana University, Bloomington. Before becoming a librarian, I trained to become a classical ballet dancer and teacher. I’ve taught ballet in New York, New Mexico and here in Washington.

Who do you work with at IHME?

Outside of the data services team, I work with public health researchers, data analysts, Drupal developers, and student assistants.

IHME US Map Data Visualization


What types of data do you work with?

The data that IHME uses to create global health estimates comes in data file formats like .dta, .dbf, and .sav, as well as Excel tables, Word documents, text files, .pdf documents and Access databases. When necessary, we digitize books and sometimes even microfiche. Right now, I primarily catalog health and demographic survey datasets and their related geospatial data. In the past, I’ve also worked on cataloging health statistics reports, epidemiological surveillance, and serial publications. Some other types of data we collect and catalog include vital registration, hospital discharges, censuses, disease registries and government health budgets.

What do you enjoy most about your job?

I most enjoy the variety of work. For example, today I did research on stroke in order to create new keywords and planned out how to retroactively apply the new keywords to existing records, searched for and cataloged new survey data, contacted a survey provider about missing variables in a data file, and worked on a presentation I’ll be giving on our keyword taxonomy.

What advice would you give other librarians interested in working with data/in the field of data librarianship? 

I am forever thankful for the classes I took in graduate school that focused on representation and organization, metadata and semantics, indexing, creating ontologies in RDF/RDFS (Resource Description Framework/Resource Description Framework Schema) and cataloging in XML. Those classes provided me with a solid foundation for the type of work I do as a data indexer.
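For readers who have not seen what cataloging in XML can look like, here is a minimal sketch using only Python's standard library. It is an illustration, not the GHDx schema: the element names (`record`, `title`, `year`, `keywords`, `keyword`) and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_record(title, year, keywords):
    """Build a small catalog record as an XML element tree.

    The element names used here are hypothetical and chosen only for
    illustration; a real catalog would follow its own schema.
    """
    record = ET.Element("record")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "year").text = str(year)
    keyword_list = ET.SubElement(record, "keywords")
    for keyword in keywords:
        # One child element per controlled-vocabulary term.
        ET.SubElement(keyword_list, "keyword").text = keyword
    return record


if __name__ == "__main__":
    # Made-up sample values.
    example = build_record("Example Household Health Survey", 2016,
                           ["stroke", "hypertension"])
    print(ET.tostring(example, encoding="unicode"))
```

Keeping keywords as separate elements, rather than one comma-separated string, is what makes it practical to apply a new controlled-vocabulary term to existing records later.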

I would like to sincerely thank Lyla for providing us with insight into a librarian role that is quite unique, and quite essential. If you would like to learn more about IHME, the GHDx, and many of their groundbreaking projects and visualizations, please visit

Categories: Data Science

Data Flash: Exploring Historical Data

Tue, 2018-03-27 04:00

It’s so easy to think of data as a modern phenomenon that we forget that data analysis and data visualization are phenomena which go way back. A marvelous example is John Graunt’s Bills of Mortality, which this post by John Appleby calls “a 17th century spreadsheet of deaths in London”. Appleby goes on to do some ultra-modern visualizations of the data, which illuminate connections that Graunt probably didn’t make at the time scientifically, but may have understood intuitively.

If you find this concept intriguing, either to read about or to explore more directly, consider taking a mosey through the series from the National Library of Medicine’s Historical Division, “Revealing Data: Explorations of Data in Collections”. They are making historical research data available now, across many health-related fields, and in fascinating ways! They have data in a wide range of formats, informally and formally collected, quantitative and qualitative. And among their treasures is, you guessed it, a copy of Graunt’s magnum opus, and a post about it!

Also, if data analysis is your thing, there are many sources of data sets out there, particularly from the federal government.  Check out descriptions of what’s in the National Archives, or historical patent data, or that classic, historical census data, and others at   Additionally, you might want to explore data from other sources, such as from the Pew Research Center (it doesn’t go back to the 1600s, but it’s something!), and historical GIS data from the American Association of Geographers.

Enjoy your explorations into the past—which may end up transforming our future!

Categories: Data Science

NLM’s 2017-2027 Strategic Plan

Fri, 2018-03-16 15:56

“Every day more than four million people use NLM resources; every hour, a petabyte of data moves in or out of our computing systems.”

The National Library of Medicine Board of Regents has released the Strategic Plan for 2017-2027: “A Platform for Biomedical Discovery and Data-Powered Health.” Working in conjunction with NLM planning staff, the Board identified themes to use as a framework to develop future priorities and directions. Public input was solicited on these themes:

  • Advancing data science, open science, and biomedical informatics
  • Advancing biomedical discovery and translational science
  • Supporting the public’s health: clinical systems, public health systems
    and services, and personal health
  • Building collections to support discovery and health in the 21st century

In addition, the following topics were considered across the four themes: partnerships, user communities, user engagement and educational outreach, international engagement, health disparities, standards, infrastructure, workforce development, research needs and funding.

The Strategic Plan introduces three major goals:

  1. Accelerate discovery and advance health by providing the tools for data-driven research.
  2. Reach more people in more ways through enhanced dissemination and engagement pathways.
  3. Build a workforce for data-driven research and health.

From Dr. Patricia Brennan’s introduction:

  • Data-driven discovery requires sophisticated library and information science to open the door to thrilling new prospects for improving the public health, as well as informatics and data science to deliver insights and solutions.
  • The migration of clinical care from hospital to home challenges NLM to reach into these places where health occurs, not just where care is delivered.
  • Governmental and scientific forces are aligning under a philosophy that innovation is accelerated if data flow freely, that the results of government-sponsored research should be open to the public as quickly as possible, and that linking scientists, citizens, and industry yields social benefits.
  • Libraries continue to be essential places for knowledge repositories and community gathering, yet the advent of self-directed search, e-publishing and consolidation of hospital library services challenges librarians and libraries to devise new services and solutions.

Your comments on the Strategic Plan are welcome. Please send to: NLM Director’s Office (

Categories: Data Science

DataFlash: Love Data Week Poetry

Thu, 2018-02-15 18:35

For your Love Data Week poetry enjoyment…

Life is filled with normal distributions
But you are an outlier
You skewed my mean(ing)
Showing me a world beyond average
Somehow helping me find my significance
And yet still bring me back to center

For researchers loving their data,
There are no divisions or strata!
Their passion is strong
As the grip of King Kong,
And their tables and charts even greata!

You still have time to submit your Love Data Week poem! Post it to @NNLM_RD3 and let the world know how much you love data.

Categories: Data Science

DataFlash: Data Horror Stories

Thu, 2018-02-15 18:00

In the spirit of Love Data Week’s 2018 theme, Data Stories, it’s important to consider cautionary tales as well as good outcomes. We should, after all, learn from our mistakes. Perhaps the best known collection of data horror stories is Dorothea Salo’s Research Data Management Horror Stories pinboard. Dorothea, a University of Wisconsin academic librarian and library-school instructor, has been pinning data tales of woe since 2010.

We probably all have our own personal examples of data hell, but here are a few of my favorite themes…

  • Submitting a grant proposal and neglecting to include a well-thought-out data management plan. Proposal rejected. Research flatlines.
  • Gathering your identifiable biomedical data without adequately consulting with your Institutional Review Board (IRB). Collateral damage.
  • Neglecting to develop and implement a detailed naming convention for your data files. Data hot mess.
  • Neglecting to maintain thorough metadata for your datasets, models, and algorithms. Data bedevilment.
  • Failing to sync and back up your data in three separate locations. Deadly data loss.
  • Dismissing the need to follow guidelines to ensure the security of your data. Data access nightmare.
  • Saving your data to a proprietary file format that is on the verge of insolvency. Walking dead data.
  • Disregarding the need to place your data in an appropriate repository that provides long term access and maintenance. Evil dead data.
  • Facing the shame of having your publication retracted due to data irregularities. The horror. The horror.

For the love of all things organized, don’t let your researchers be condemned to these nine circles of data hell!
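To make the file-naming circle a little less hellish, here is a minimal sketch of how a naming convention could be checked automatically. The convention itself (`project_YYYYMMDD_description_vNN.ext`, all lowercase) and the sample file names are assumptions made up for this example; your research team would define its own.

```python
import re

# Hypothetical convention: project_YYYYMMDD_description_vNN.ext
# Example: strokesurvey_20180215_rawdata_v01.csv
# Note: the date part is only checked for eight digits, not for
# being a valid calendar date.
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename):
    """Return the parsed parts of a conforming file name, or None."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in ["strokesurvey_20180215_rawdata_v01.csv",
                 "Final data (2).xlsx"]:
        verdict = "ok" if check_name(name) else "does not follow the convention"
        print(f"{name}: {verdict}")
```

A check like this could be run over a project folder before files are deposited in a repository, so that naming problems surface while they are still cheap to fix.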

Categories: Data Science

February PNR Rendezvous webinar- decolonizing data

Wed, 2018-02-14 10:47

Join us for our next PNR Rendezvous, “Hope From Our Grandmothers: Decolonizing Data Through Stories of Resilience”

When: Wednesday, February 21, 1:00pm PT, Noon Alaska Time, 2:00pm MT

Much research has been historically rooted in controlling American Indian and Alaska Native (AI/AN) and other indigenous peoples to exploit land and natural resources, or even heredity and group identity. Yet AI/AN community ties, tribal sovereignty rights and claims, and cultural values are emerging as critical elements of resiliency key to reversing the very health and social issues that have plagued indigenous populations as a whole since the dawn of colonization. The practice of research, and of utilizing information collected by means of observation, hypothesis-testing, repetition of experiment and sound conclusions to inform decision-making, has been integral to indigenous survival and well-being for centuries. This webinar will review some of the modern scientific values in comparison to AI/AN ways of knowing and provide examples of indigenous research concepts as they align with decolonizing data.

Speaker: Rose James, PhD (Lummi), Director of Evaluation and Research for the Urban Indian Health Institute

The session qualifies for 1 MLA (Medical Library Association) CE credit whether attending the live session or watching the recording.

Registration is encouraged though not required. Register and learn how to join the session

Categories: Data Science

DataFlash: How DO You Tell a Story With Data Anyway?

Tue, 2018-02-13 04:00

As you may know, it’s “Love Data Week”! Formerly “Love Your Data Week”, this is the third year of this international data extravaganza, which seeks to “raise awareness and build a community to engage on topics related to research data management, sharing, preservation, reuse, and library-based research data services.” The theme this year is “Data Stories”, which leads to the important question, “How DO you tell a story with data anyway?”

There are many sites that will help you get started on this quest; here, here and here are some good ones. But the main prerequisite is to KNOW YOUR DATA. You can’t tell a great story about someone else’s vacation, right? Similarly, to tell a good story with your data, you need to know where it comes from and how it was gathered, what question it’s meant to answer, and how it may lead you astray (if your random sample only comes from people with landlines, you may have a LEETLE bias).

Once you know your data, LOOK FOR CONNECTIONS (like the girl in the picture!). Does a certain pattern seem to emerge if you look at two variables in conjunction with each other? Does one item seem to predict another? How does the data compare with other reports you can find in journals or online or among reports from colleagues? Remember, the data don’t necessarily need to be quantitative; qualitative and other types of data can tell stories too!
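If your data are quantitative, one simple way to check whether two variables move together is a Pearson correlation coefficient. The sketch below computes it from scratch in Python; the variable names and the numbers are invented purely for illustration, and a strong correlation is a prompt for a closer look, not proof that one thing causes another.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented example data: weekly library workshops offered vs.
    # data consultations requested.
    workshops = [1, 2, 3, 4, 5]
    consultations = [2, 3, 5, 4, 6]
    print(f"r = {pearson_r(workshops, consultations):.2f}")
```

Values near 1 or -1 suggest a strong linear relationship and values near 0 suggest little or none, which can help you decide which connections are worth building a story around.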

Then, once the connections start to emerge, SHARE YOUR LOVE! There are so many great tools to help you make the story take a shape that allows others to read/absorb it! For example, Tableau has a public version that is free, Excel actually has quite a bit of capability for even sophisticated visualization, and some free online infographic platforms such as Piktochart can be very user friendly.

Be sure to check out the “Love Data Week” site, and some of the links below as well.  And let us know if you have ideas for how we can help you tell better stories!

–“Best Starts for Kids” Data Storytelling webinar

–Data Storytelling Resources, Part 1 and Part 2

–NIH LibGuide on Scientific Communication and Data Storytelling (with slides and handout!)

–Storytelling with Data and related resource list

Photo credit: Diane Hammerton on Flickr
Categories: Data Science

DataFlash: Love Data Week 2018 (Monday, 2/12/2018 – Friday, 2/16/2018)

Fri, 2018-02-09 14:31

Keep Calm and Love Data

Happy Love Data Week! This year’s theme is data stories. Formerly known as Love Your Data Week, this social media event has been going on since 2016, when a group of enterprising research data specialists decided it would be a great way to raise awareness about research data management. Over the past two years, Love Data Week has grown to include participants in Europe, Asia, and Australia, and shows no sign of slowing down. The event is marked by Tweets, Facebook posts, blog posts, webinars and live events, and this year’s theme is the broadest yet. Data stories is envisioned as an entry point for conversations about how data is being used to shape the world around us. Sub-topics include stories about data, telling stories with data, connected conversations, and we are data.

If you or your organization would like to participate in Love Data Week, you will find a wealth of information on the website. Send your data a valentine and get involved!

Categories: Data Science

All of Us Research Program: Share Your Ideas

Wed, 2018-01-31 19:11

The All of Us Research Program is seeking input from the public that will help identify key research priorities and requirements (such as data types and methods) for future versions of the All of Us protocol. The program aims to build one of the largest, most diverse datasets of its kind for health research, with one million or more volunteers nationwide, who will sign up to share their information over time. Researchers will be able to access participants’ de-identified information for a variety of studies to learn more about the biological, behavioral, and environmental factors that influence health and disease. Their findings may lead to more individualized health care approaches in the future.

To contribute research ideas, see the IdeaScale website. Responses are being accepted until February 23, 2018. IdeaScale gives you the opportunity to see what others are adding, and to vote on your favorite ideas. Your input will be considered at a Research Priorities Workshop in March 2018 and will ultimately help us build out the All of Us research platform with the tools needed to answer those questions. This is a unique opportunity to share research ideas with the All of Us Research Program. NLM appreciates your contributions.

Categories: Data Science

DataFlash: Electronic Lab Notebooks

Tue, 2018-01-30 07:00

I recently had the opportunity to listen to a webinar about electronic lab notebooks called “Using electronic laboratory notebooks in the academic life sciences: a group leader’s experience on how they can make research teams more efficient.” Not having used electronic laboratory notebooks (eLNs) myself, I was curious about their functionality, whether librarians are involved in their adoption, and whether they contribute to better data management. Paper-based laboratory notebooks have been used by researchers for centuries as a way to document their observations, experiments and procedures. A perfect example is Alexander Graham Bell’s lab notebook documenting the first working telephone.


Alexander Graham Bell’s Lab Notebook (Image is in public domain)

And many researchers still use paper-based products to document their work. Within the last decade, however, a wide range of electronic lab notebooks have become available. While some researchers choose to use simple and inexpensive tools like Microsoft Word, Microsoft OneNote, or Evernote, there are products specifically designed for the research laboratory. Some of the more highly rated eLNs suitable for all scientific fields include Labfolder, SciNote, and LabArchives. Other products are designed to meet the requirements of a particular area of research. For example, LabGuru is designed for the life sciences while BIOVIA was developed for chemistry. Prices range from free and open source to proprietary and very costly, and choosing the right tool for the job can be quite challenging.

Fortunately, there are resources available to help researchers select the best eLN for their research team. Two recent articles on the topic are Dirnagl et al.’s “A pocket guide to electronic laboratory notebooks in the academic life sciences” and Kanza et al.’s “Electronic lab notebooks: can they replace paper?”. Franklin Sayre, Pharmacy Librarian at the University of Minnesota Health Sciences Library, is just one example of a librarian who is using his expertise to help researchers evaluate the features of eLNs. A number of LibGuides also exist on the topic, notably those from Purdue Libraries, University of Rochester Libraries, and the University of Utah Libraries.

Do eLNs lead to better data management? While most products include similar bells and whistles, including the ability to handle a wide variety of data formats, search images and text, audit records, time stamp entries, collaborate with others while controlling user permissions, back up and archive data, and comply with regulatory requirements, they all have one potential flaw: the user. While some of the issues differ (the cat can’t eat your eLN), many remain the same. The old saying, “garbage in, garbage out,” applies to both paper and electronic formats. File naming conventions, non-proprietary file formats, backup schedules, and other data management best practices still apply. Do you know which eLNs the researchers in your university or hospital are using? Ask them. Your conversation may lead to further collaboration.

Categories: Data Science

DataFlash: You Spoke and We Listened!

Tue, 2018-01-16 14:57

Happy New Year from your data gurus, Annie and Ann! And welcome to the first post in our new blog series, DataFlash! We both monitor a wide range of listservs, social media accounts, news sites and other troves of data information, and we’ll be sharing what we come across, biweekly.

As a starting place, remember that survey we did last year, asking for your ideas about what the NNLM-PNR should be doing to help with your data needs? (The one with the fabulous NLM tape measure prizes?)  We completed the analysis and presented the findings to our Executive Committee last November (as well as sending out the tape measures), and now we’re ready to release it to you!   Here it is!

Briefly, here’s what you told us:

–You want training in data literacy and how to help patrons/users with data

–You want us to help with developing and sustaining collaborations

–You want us to provide specific assistance with things like how to help researchers with workflow, and by creating templates to help with assessing and managing data

–You’d also like training opportunities around informatics

We’re planning several trainings already around research data management, and will be working towards offering more on the other topics you’re interested in.  And we’ll be contacting the folks who generously gave us their email addresses and said they’d be happy to chat further.    Watch this space for further developments, and please feel free to reach out if there are any ways we can help, if you need a consultation, have an idea for a training, or whatever!  And deepest thanks to everyone who participated in the survey—it’s great for us to have this information!

Categories: Data Science

From the UW eScience Institute: Data Science for Social Good

Wed, 2018-01-10 18:14


Are you interested in using data-driven discovery for societal benefit? 

The University of Washington eScience Institute, in collaboration with the Cascadia Urban Analytics Cooperative, is excited to announce the summer 2018 Data Science for Social Good (DSSG) program. The program brings together Student Fellows with data and domain researchers to work on focused, collaborative projects for societal benefit.

Sixteen DSSG Student Fellows will be selected to work with academic researchers, data scientists, and public stakeholder groups such as government officials, academic researchers, non-profit organizations, and the general public, on data-intensive research projects.

Who: Graduate students and advanced (junior/senior) undergraduate students are invited to apply. Spring 2018 graduates are eligible for this program. Students who are not U.S. citizens or permanent residents are eligible to apply as long as their visa status allows them to work in the U.S. We cannot sponsor student visas for this program.

What: Each student will be part of a team working full-time on a research project that has concrete relevance and impact. Students are expected to work closely and collaboratively with team members onsite for the duration of the 10-week program. Projects will have an applied social good dimension and involve analysis and visualization of data from areas such as public health, sustainable urban planning, environmental protection, disaster response, crime prevention, education, transportation, governance, commerce, and social justice. Click for summaries of projects from the Summer 2015, Summer 2016, and Summer 2017 DSSG programs.

Where: Most work will be conducted on the UW campus in the WRF Data Science Studio, but some field excursions in the City of Seattle or King County may also be involved.

When: This is a 10-week long, full-time program beginning June 11th and ending August 17th 2018.

Compensation: Students will be given a stipend of $6,500 for the 10 weeks.

Desired qualifications:

  • Demonstrated experience in issues related to social good
  • Research experience with quantitative or qualitative tools
  • Strong academic record
  • Previous programming experience

How to Apply: CLICK HERE FOR THE APPLICATION FORM. Please note: a copy of your CV and unofficial transcripts are required to complete the form.

Questions may be directed to

Application Deadline: February 12th at midnight Pacific Time

Categories: Data Science