Data science is an interdisciplinary field that uses statistics, computer science, programming, and domain knowledge to collect, process, and analyze data in order to gain knowledge or solve a problem. It also includes sharing that knowledge through storytelling, visualization, and other means of communication. Data science often employs methods such as machine learning, artificial intelligence (AI), natural language processing, algorithms, and other analytic tools to process and understand data.
One example of data science in use is a machine learning model that draws on data from large numbers of electronic health records (EHRs) to predict whether patients are at higher risk of readmission after hospital discharge.
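As a rough illustration of the readmission example, the sketch below trains a small logistic regression model in plain Python. Everything in it is an assumption made for illustration: the two features (prior_admissions and length_of_stay), the rule used to generate labels, and the function names are hypothetical, and the records are randomly generated rather than real EHR data. A real project would typically rely on an established machine learning library and far richer clinical data.

```python
import math
import random


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_records(n, seed=0):
    """Generate made-up (features, label) pairs. This is NOT real patient data."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        prior_admissions = rng.randint(0, 5)   # hypothetical feature
        length_of_stay = rng.randint(1, 14)    # hypothetical feature, in days
        # Hypothetical rule used only to produce plausible-looking labels.
        true_risk = sigmoid(0.8 * prior_admissions + 0.2 * length_of_stay - 3.5)
        readmitted = 1 if rng.random() < true_risk else 0
        records.append(([prior_admissions, length_of_stay], readmitted))
    return records


def predict_risk(weights, bias, features):
    """Return the model's estimated probability of readmission."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_logistic_regression(records, learning_rate=0.05, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n_features = len(records[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(records)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in records:
            error = predict_risk(weights, bias, features) - label
            for j in range(n_features):
                grad_w[j] += error * features[j]
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
    return weights, bias


records = make_synthetic_records(200, seed=0)
weights, bias = train_logistic_regression(records)

# Compare two hypothetical patients: few vs. many prior admissions and days in hospital.
low = predict_risk(weights, bias, [0, 1])
high = predict_risk(weights, bias, [5, 14])
print(f"Estimated readmission risk (0 prior admissions, 1-day stay): {low:.2f}")
print(f"Estimated readmission risk (5 prior admissions, 14-day stay): {high:.2f}")
```

The model only learns the pattern that was built into the synthetic labels, so its output says nothing about real patients; the point is to show the general shape of the workflow: gather records, fit a model, then use it to estimate risk for new cases.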
Another example would be an AI system or neural network that analyzes millions of images of skin lesions and learns to predict which lesions are most likely to turn cancerous.
Data science uses a variety of open source and proprietary software (e.g., Python and R). Tools include software and programming languages for building and running data science models, tools for extracting, organizing, and cleaning data, and tools for visualizing and sharing findings. The tools and processes used vary depending on institution, data needs, field, and skill set.
This brief article defines data science, traces its roots, and clarifies the distinctions between data science and related buzzwords:
This article from IBM explains the basics of data science, its lifecycle, and some common tools: