We cordially invite you, your students, staff, and colleagues to a weekly webinar lecture series, the BD2K Guide to the Fundamentals of Data Science. The series offers high-level didactic overviews of the core topics in data science, intended to give a general biomedical audience an appreciation of the fundamental issues in data science research and applications.
The series will be held each Friday at noon Eastern Time (9am Pacific) beginning September 9th, 2016. Please join from your computer, tablet or smartphone: https://attendee.gotowebinar.com/register/341938597813942273
You may also dial in using your phone.
- United States: +1 (914) 614-3221
- Access Code: 736-335-403
Registration is not required. Bookmark the webinar link for easy access to our weekly event!
Our initial set of confirmed data science lecturers includes: Mark Musen (Stanford), William Hersh (Oregon Health & Science University), Lucila Ohno-Machado (UCSD), Michel Dumontier (Stanford), Zachary Ives (Penn), Susanna-Assunta Sansone (Oxford), Chaitan Baru (NSF), Brian Caffo (Johns Hopkins), and Noémie Elhadad (Columbia).
This series is sponsored by the NIH Office of the Associate Director for Data Science, the Big Data to Knowledge (BD2K) Training Coordination Center, and the BD2K Centers Coordination Center. A dedicated webpage with additional information, the complete schedule of speakers, and recordings of all the lectures will be available shortly. In the meantime, the data science topics to be covered by our speakers are as follows:
Introduction to big data and the data lifecycle
Section 1: Data Management Overview
- Finding and accessing datasets, Indexing and Identifiers
- Data curation and Version control
- Metadata standards
Section 2: Data Representation Overview
- Databases and data warehouses, Data: structures, types, integrations
- Social networking data
- Data wrangling, normalization, preprocessing
- Exploratory Data Analysis
- Natural Language Processing
Section 3: Computing Overview
- Programming and software engineering; API; optimization
- Cloud, Parallel, Distributed Computing, and High Performance Computing
- Commons: lessons learned, current state
Section 4: Data Modeling and Inference Overview
- Smoothing, Unsupervised Learning/Clustering/Density Estimation
- Supervised Learning/prediction/Machine Learning, dimensionality reduction
- Algorithms and their Optimization
- Multiple hypothesis testing, False Discovery Rate
- Data issues: Bias, Confounding, and Missing data
- Causal inference
- Data Visualization tools and communication
- Modeling Synthesis
Section 5: Additional topics
- Open science
- Data sharing (including social obstacles)
- Ethical Issues
- Extra considerations/limitations for clinical data
- Summary and NIH context
Section 6: Specific examples
We hope that you will enjoy this informative series of lectures on data science. Please encourage your students, staff, and colleagues to tune in, and feel free to share this announcement with others. We look forward to having you!