This Nanodegree covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.
Nanodegree Program Syllabus
Introduction to Data Science Using R Programming
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing which includes programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
Getting and Cleaning Data
Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.
Data Visualization and Communication with Tableau, R & Watson Analytics
One of the skills that characterizes great business data analysts is the ability to communicate practical implications of quantitative analyses to any kind of audience member. Even the most sophisticated statistical analyses are not useful to a business if they do not lead to actionable advice, or if the answers to those business questions are not conveyed in a way that non-technical people can understand.
In this course you will learn how to become a master at communicating business-relevant implications of data analyses. By the end, you will know how to structure your data analysis projects to ensure the fruits of your hard labor yield results for your stakeholders. You will also know how to streamline your analyses and highlight their implications efficiently using visualizations in Tableau, the most popular visualization program in the business world. Using other Tableau features, you will be able to make effective visualizations that harness the human brain’s innate perceptual and cognitive tendencies to convey conclusions directly and clearly. Finally, you will be practiced in designing and persuasively presenting business “data stories” that use these visualizations, capitalizing on business-tested methods and design principles.
Practical Machine Learning
One of the most common tasks performed by data scientists and data analysts are prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and tests sets, overfitting, and error rates. The course will also introduce a range of model based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.
A final capstone project will be completed by each student in which the knowledge attained from all the courses will be applied collectively.
IBM, Microsoft, RStudio, Tableau
Tools Covered: R, RStudio, Microsoft Azure Machine Learning Studio, Tableau, IBM Watson Analytics