- Name: Machine Learning: Regression
- Lecturers: Carlos Guestrin and Emily Fox
- Duration: 2015-12-28 ~ 2016-02-15 (6 weeks)
- Course: The 2nd (2/6) course of the Machine Learning Specialization on Coursera
- Syllabus
- Record
- Certificate
- Learning outcome
- Describe the input and output of a regression model.
- Compare and contrast bias and variance when modeling data.
- Estimate model parameters using optimization algorithms.
- Tune parameters with cross validation.
- Analyze the performance of the model.
- Describe the notion of sparsity and how LASSO leads to sparse solutions.
- Deploy methods to select between models.
- Exploit the model to form predictions.
- Build a regression model to predict prices using a housing data set.
- Implement these techniques in Python.
Syllabus
Welcome
- Welcome!
- What is the course about?
- Outlining the first half of the course
- Outlining the second half of the course
- Assumed background
Simple Linear Regression
- What is this course about?
- Regression fundamentals
- The simple linear regression model, its use, and interpretation
- An aside on optimization: one dimensional objectives
- An aside on optimization: multidimensional objectives
- Finding the least squares line
- Approach 1: Set gradient = 0
- Approach 2: Gradient descent
- Comparing the two approaches
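The two approaches can be compared on a small synthetic example (hypothetical data, not the course's housing set); both should recover the same fitted line:

```python
import numpy as np

# Toy data: y = 1 + 2x plus noise (illustrative values only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 100)

# Approach 1: set gradient = 0 (closed form for simple linear regression)
slope = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x * x) - np.mean(x) ** 2)
intercept = np.mean(y) - slope * np.mean(x)

# Approach 2: gradient descent on RSS(w0, w1)
w0, w1, eta = 0.0, 0.0, 1e-4
for _ in range(50000):
    err = y - (w0 + w1 * x)              # residuals under current fit
    w0 += eta * 2 * err.sum()            # -dRSS/dw0 = 2 * sum(err)
    w1 += eta * 2 * (err * x).sum()      # -dRSS/dw1 = 2 * sum(err * x)

# Both approaches converge to (approximately) the same line
```

The step size `eta` is a manual choice here; too large a value makes the descent diverge rather than converge.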
- Discussion and summary of simple linear regression
- Influence of high leverage points
- High leverage points
- Influential observations
- Programming assignment
- Quiz: Simple Linear Regression
- Q&A
- interval, estimation, inverse estimation, unit change
- Quiz: Fitting a simple linear regression model on housing data
- A programming assignment
- Two different models: one using square feet, the other using #bedrooms
Multiple Regression
- Multiple features of one input
- Multiple regression intro
- Polynomial regression
- Modeling seasonality
- Where we see seasonality
- Regression with general features of 1 input
- Incorporating multiple inputs
- Motivating the use of multiple inputs
- Defining notation
- Regression with features of multiple inputs
- Interpreting the multiple regression fit
- Setting the stage for computing the least squares fit
- Optional reading: review of matrix algebra
- Rewriting the single observation model in vector notation
- Multiple regression by using matrices
- Rewriting the model for all observations in matrix notation
- Multiple regression by using matrices
- Computing the cost of a D-dimensional curve
- RSS of a D-dimensional curve
- Computing the least squares D-dimensional curve
- Computing the gradient of RSS
- Approach 1: closed-form solution
- Analogy with the 1-dimensional case
- Discussing the closed-form solution
- O(n^3): computationally intensive solution.
- Less intensive algorithms for the closed-form solution exist, but gradient descent is still less intensive.
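A minimal sketch of the closed-form solution on hypothetical data (using `np.linalg.solve` rather than forming the inverse explicitly, which is cheaper and numerically safer):

```python
import numpy as np

# Hypothetical feature matrix H (first column of 1s for the intercept) and output y
rng = np.random.default_rng(1)
N, D = 200, 3
H = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
w_true = np.array([1.0, -2.0, 0.5])
y = H @ w_true + rng.normal(0, 0.1, N)

# Closed form: solve (H^T H) w_hat = H^T y
w_hat = np.linalg.solve(H.T @ H, H.T @ y)
```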
- Approach 2: gradient descent
- Just replace \nabla RSS(\mathbf{w}^{(t)}) with -2 \mathbf{H}^{T} (\mathbf{y} - \mathbf{H}\mathbf{w}^{(t)})
- Feature-by-feature update
- Algorithmic summary of gradient descent approach
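The gradient descent approach above can be sketched as follows on hypothetical data, stepping opposite \nabla RSS = -2 H^T (y - Hw) until the gradient magnitude is below a tolerance:

```python
import numpy as np

# Hypothetical data: intercept column plus two features
rng = np.random.default_rng(2)
N = 200
H = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
w_true = np.array([1.0, -2.0, 0.5])
y = H @ w_true + rng.normal(0, 0.1, N)

w = np.zeros(3)
eta, tol = 1e-3, 1e-8
for _ in range(10000):
    grad = -2 * H.T @ (y - H @ w)        # gradient of RSS at the current w
    if np.linalg.norm(grad) < tol:       # converged: gradient is essentially zero
        break
    w -= eta * grad                      # step opposite the gradient
```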
- Summarizing multiple regression
- A brief recap
- Quiz: Multiple Regression
- Programming assignment 1
- Reading: Exploring different multiple regression models for house price prediction
- Quiz: Exploring different multiple regression models for house price prediction
- Programming assignment 2
- Numpy tutorial
- Reading: Implementing gradient descent for multiple regression
- Quiz: Implementing gradient descent for multiple regression
Assessing Performance
- Defining how we assess performance
- 3 measures of loss and their trends with model complexity
- 3 sources of error and the bias-variance trade-off
- Irreducible error and bias
- 3 sources of error: Noise, bias, variance
- Noise is caused by factors influencing the output that the model neglects.
- Noise: Irreducible error
- Bias(x) = f_{w(true)}(x) - f_{w(average)}(x)
- f_{w(average)}(x) = \frac{1}{N} \sum_{n=1}^{N} f_{\hat{w}(\text{training set}_n)}(x)
- Low complexity ⇒ high bias
- High complexity ⇒ low bias
- Variance and bias-variance trade-off
- Low complexity ⇒ low variance
- High complexity ⇒ high variance
- Bias-variance trade-off
- Low complexity ⇒ high bias AND low variance
- High complexity ⇒ low bias AND high variance
- Finding the sweet spot that complexity satisfies low bias and low variance
- MSE: Mean Squared Error
- MSE = Bias^2 + Variance
- We cannot compute bias and variance because both involve the true function, which is unknown.
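Although bias and variance cannot be computed on real data, they can be estimated by Monte Carlo in a simulation where we choose the true function ourselves. A minimal sketch (hypothetical setup: a sine truth fitted by a line, evaluated at one point x0):

```python
import numpy as np

rng = np.random.default_rng(3)
f_true = lambda x: np.sin(2 * np.pi * x)     # we get to pick the truth
x0, n_train, n_datasets = 0.25, 30, 2000

preds = np.empty(n_datasets)
for i in range(n_datasets):
    # Draw a fresh training set and fit a low-complexity model (a line)
    x = rng.uniform(0, 1, n_train)
    y = f_true(x) + rng.normal(0, 0.3, n_train)
    w = np.polyfit(x, y, deg=1)
    preds[i] = np.polyval(w, x0)             # prediction at x0 for this dataset

bias_sq = (f_true(x0) - preds.mean()) ** 2   # squared bias at x0
variance = preds.var()                       # variance at x0
mse = ((f_true(x0) - preds) ** 2).mean()     # MSE at x0
# mse equals bias_sq + variance (exactly, for these empirical definitions)
```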
- Error vs. amount of data
- For a fixed model complexity
- #(data points in training set) increases ⇒ training error increases
- #(data points in training set) increases ⇒ true error decreases
- #(data points in training set) → ∞ ⇒ [training error = true error]
- Irreducible error and bias
- OPTIONAL ADVANCED MATERIAL: Formally defining and deriving the 3 sources of error
- Formally defining the 3 sources of error
- Formally deriving the 3 sources of error
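The derivation in this optional material can be summarized as follows, writing \sigma^2 for the noise variance and \bar{f}(x) = \mathbb{E}[f_{\hat{w}}(x)] for the fit averaged over training sets (the cross terms vanish because \mathbb{E}[\epsilon] = 0 and \epsilon is independent of the training set):

```latex
\begin{aligned}
\mathbb{E}\left[(y - f_{\hat{w}}(x))^2\right]
  &= \mathbb{E}\left[(\epsilon + f_{w(true)}(x) - f_{\hat{w}}(x))^2\right] \\
  &= \sigma^2 + \mathbb{E}\left[(f_{w(true)}(x) - f_{\hat{w}}(x))^2\right] \\
  &= \underbrace{\sigma^2}_{\text{noise}}
     + \underbrace{\left(f_{w(true)}(x) - \bar{f}(x)\right)^2}_{\text{bias}^2}
     + \underbrace{\mathbb{E}\left[(\bar{f}(x) - f_{\hat{w}}(x))^2\right]}_{\text{variance}}
\end{aligned}
```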
- Putting the pieces together
- Training/validation/test split for model selection, fitting, and assessment
- Hypothetical implementation
- Data set = (training set) + (test set)
- Practical implementation
- Data set = (training set) + (validation set) + (test set)
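The practical split can be sketched as a random partition of row indices (the 50/25/25 proportions here are illustrative; the exact ratio is a modeling choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
idx = rng.permutation(n)                    # shuffle row indices
train_idx = idx[: n // 2]                   # fit models here
valid_idx = idx[n // 2 : 3 * n // 4]        # select model complexity here
test_idx = idx[3 * n // 4 :]                # assess the chosen model here, once
```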
- A brief recap
- Quiz: Assessing Performance
- Training/validation/test split for model selection, fitting, and assessment
- Programming assignment
- Reading: Exploring the bias-variance trade-off
- Quiz: Exploring the bias-variance trade-off
- Construction of polynomial regression using the linear regression function of graphlab.
- We can construct any polynomial as a linear combination by using powers of the input as features.
- If the degree of the polynomial is too large, the model overfits the training data.
- train_data : validation_data : test_data = 45 : 45 : 10
- The polynomial model is fitted on train_data.
- The RSS is computed on validation_data.
- Assessment is done on test_data.
- Choose the degree of the polynomial that makes the RSS (Residual Sum of Squares) on validation_data minimal among the candidate degrees.
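The degree-selection procedure above can be sketched as follows; `np.polyfit` stands in for graphlab's linear regression purely for illustration, and the data and degree-3 truth are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 400)
y = 1 - 2 * x + 3 * x**3 + rng.normal(0, 0.1, 400)   # true degree-3 relationship

x_tr, y_tr = x[:180], y[:180]          # ~45% training
x_va, y_va = x[180:360], y[180:360]    # ~45% validation
x_te, y_te = x[360:], y[360:]          # ~10% test

def rss(w, xs, ys):
    # Residual sum of squares of polynomial w on (xs, ys)
    return np.sum((ys - np.polyval(w, xs)) ** 2)

# Fit each candidate degree on train, score on validation, keep the minimum
fits = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 9)}
best_deg = min(fits, key=lambda d: rss(fits[d], x_va, y_va))
test_rss = rss(fits[best_deg], x_te, y_te)   # final assessment on test data
```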
Ridge Regression
- Characteristics of over-fit models
- Symptoms of overfitting in polynomial regression
- Overfitting demo
- Overfitting for more general multiple regression models
- The ridge objective
- Balancing fit and magnitude of coefficients
- [measure of fit] small ⇒ [good fit to training data]
- [measure of magnitude of coefficients] small ⇒ [not overfit]
- [total cost] = [measure of fit] + [measure of magnitude of coefficients] = RSS(\mathbf{w}) + \left \| \mathbf{w} \right \|_{2}^{2}, where \left \| \mathbf{w} \right \|_{2}^{2} = \sum_{j=0}^{D} w_{j}^{2}
- The resulting ridge objective and its extreme solutions
- Select $latex \mathbf{\hat{w}}$ to minimize the total cost C_{total}
- $latex RSS(\mathbf{\hat{w}}) + \lambda \left \| \textbf{w} \right \|_{2}^{2}$
- \lambda = 0 \Rightarrow C_{total} = RSS(\mathbf{\hat{w}}) (the least squares solution)
- \lambda = \infty \Rightarrow \mathbf{\hat{w}} = \mathbf{0}, since any nonzero coefficient makes the penalty term infinite
- How ridge regression balances bias and variance
- \lambda_{1} < \lambda_{2} \Rightarrow Variance_{1} > Variance_{2}
- \lambda_{1} < \lambda_{2} \Rightarrow Bias_{1} < Bias_{2}
- Ridge regression demo
- Underfit ↔ overfit
- “Leave One Out (LOO)” cross validation: an algorithm for choosing the tuning parameter \lambda
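LOO cross validation can be sketched as follows (hypothetical data; ridge is fit with its closed form): for each candidate \lambda, every observation is held out once, predicted from a fit on the rest, and the squared errors are averaged.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 60
H = np.column_stack([np.ones(N), rng.normal(size=(N, 4))])
y = H @ np.array([1.0, 0.5, -0.5, 0.0, 0.0]) + rng.normal(0, 0.2, N)

def ridge_fit(H, y, lam):
    # Ridge closed form: (H^T H + lam I)^{-1} H^T y
    D = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(D), H.T @ y)

def loo_error(lam):
    errs = []
    for i in range(N):
        mask = np.arange(N) != i                 # leave observation i out
        w = ridge_fit(H[mask], y[mask], lam)
        errs.append((y[i] - H[i] @ w) ** 2)      # error on the held-out point
    return np.mean(errs)

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(lambdas, key=loo_error)           # lambda with smallest LOO error
```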
- The ridge coefficient path
- Coefficient path
- Balancing fit and magnitude of coefficients
- Optimizing the ridge objective
- Computing the gradient of the ridge objective
- RSS(\textbf{w}) + \lambda \left \| \textbf{w} \right \|_{2}^{2}
- \left \| \textbf{w} \right \|_{2}^{2} = \textbf{w}^T \textbf{w}, where \textbf{w} = (w_1\ w_2\ w_3\ \dots\ w_D)^T
- RSS(\textbf{w}) + \lambda \left \| \textbf{w} \right \|_{2}^{2} = (\textbf{y}-\textbf{Hw})^{T}(\textbf{y}-\textbf{Hw}) + \lambda \textbf{w}^T \textbf{w}
- \nabla [RSS(\textbf{w}) + \lambda \left \| \textbf{w} \right \|_{2}^{2}] = \nabla [(\textbf{y}-\textbf{Hw})^{T}(\textbf{y}-\textbf{Hw})] + \lambda \nabla [\textbf{w}^T \textbf{w}] = -2 \textbf{H}^T(\textbf{y}-\textbf{Hw}) + 2 \lambda \textbf{w}
- Cost
- \nabla cost(\textbf{w}) = -2 \textbf{H}^T(\textbf{y}-\textbf{Hw}) + 2 \lambda \textbf{w} = -2 \textbf{H}^T(\textbf{y}-\textbf{Hw}) + 2 \lambda \textbf{I} \textbf{w}
- Ridge closed-form solution
- \nabla cost(\textbf{w}) = 0 \Leftrightarrow \mathbf{H}^T \mathbf{H} \mathbf{\hat{w}} + \lambda \mathbf{I} \mathbf{\hat{w}} = \mathbf{H}^T \mathbf{y} \Leftrightarrow (\mathbf{H}^T \mathbf{H} + \lambda \mathbf{I}) \mathbf{\hat{w}} = \mathbf{H}^T \mathbf{y} \Leftrightarrow \mathbf{\hat{w}} = (\mathbf{H}^T \mathbf{H} + \lambda \mathbf{I})^{-1} \mathbf{H}^T \mathbf{y}
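The closed-form solution \hat{w} = (H^T H + \lambda I)^{-1} H^T y can be sketched on hypothetical data; note this version penalizes the intercept along with the other coefficients, matching the formula (in practice the intercept is often left unpenalized, as discussed later):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100
H = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
y = H @ np.array([2.0, 1.0, -1.0, 0.5]) + rng.normal(0, 0.1, N)

def ridge_closed_form(H, y, lam):
    # Solve (H^T H + lam I) w_hat = H^T y
    D = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(D), H.T @ y)

w_small = ridge_closed_form(H, y, 0.1)
w_large = ridge_closed_form(H, y, 1000.0)
# Larger lambda shrinks the coefficient vector toward zero
```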
- Approach 1: closed-form solution
- Discussing the closed-form solution
- Approach 2: gradient descent
- Computing the gradient of the ridge objective
- Tying up the loose ends
- Selecting tuning parameters via cross validation
- How to choose the tuning parameter \lambda
- K-fold cross validation
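K-fold cross validation can be sketched as partitioning shuffled indices into K folds; each fold serves once as the validation set while the other K-1 folds form the training set (the fitting step is left as a comment since any model applies):

```python
import numpy as np

rng = np.random.default_rng(8)
n, K = 100, 5
idx = rng.permutation(n)
folds = np.array_split(idx, K)              # K disjoint folds covering all rows

sizes = []
for k in range(K):
    valid_idx = folds[k]                    # fold k is the validation set
    train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
    sizes.append((len(train_idx), len(valid_idx)))
    # fit on train_idx, evaluate on valid_idx; average the K validation errors
```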
- How to handle the intercept
- A brief recap
- Selecting tuning parameters via cross validation
- Programming Assignment 1
- Programming Assignment 2
- Quiz: Ridge Regression
- Quiz: Observing effects of L2 penalty in polynomial regression
- Quiz: Implementing ridge regression via gradient descent
Feature Selection & Lasso
- Feature selection via explicit model enumeration
- Feature selection implicitly via regularized regression
- Geometric intuition for sparsity of lasso solutions
- Setting the stage for solving the lasso
- Optimizing the lasso objective
- OPTIONAL ADVANCED MATERIAL: Deriving the lasso coordinate descent update
- Tying up loose ends
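The coordinate descent update can be sketched with soft thresholding, assuming normalized features (each column of H has unit 2-norm); the data are hypothetical, the intercept handling from the course is omitted for brevity, and a fixed sweep count replaces a convergence check:

```python
import numpy as np

rng = np.random.default_rng(9)
N, D = 100, 5
H = rng.normal(size=(N, D))
H /= np.linalg.norm(H, axis=0)                    # normalize columns to unit norm
w_true = np.array([2.0, 0.0, -3.0, 0.0, 0.0])     # sparse ground truth
y = H @ w_true + rng.normal(0, 0.01, N)

lam = 0.1
w = np.zeros(D)
for _ in range(200):                               # full sweeps over coordinates
    for j in range(D):
        # rho_j: correlation of feature j with the residual excluding feature j
        rho = H[:, j] @ (y - H @ w + w[j] * H[:, j])
        if rho < -lam / 2:                         # soft thresholding at lam/2
            w[j] = rho + lam / 2
        elif rho > lam / 2:
            w[j] = rho - lam / 2
        else:
            w[j] = 0.0                             # coefficient set exactly to zero
```

The thresholding step is what produces exact zeros, i.e. the sparsity that distinguishes lasso from ridge.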
- Programming Assignment 1
- Programming Assignment 2
- Quiz: Feature Selection and Lasso
- Quiz: Using LASSO to select features
- Quiz: Implementing LASSO using coordinate descent
Nearest Neighbors & Kernel Regression
- Motivating local fits
- Nearest neighbor regression
- k-Nearest neighbors and weighted k-nearest neighbors
- Kernel regression
- k-NN and kernel regression wrapup
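Both local methods can be sketched in one dimension on hypothetical data: plain k-NN averages the k closest targets, while kernel regression weights every point by a Gaussian kernel of its distance to the query.

```python
import numpy as np

rng = np.random.default_rng(10)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.1, 200)

def knn_predict(x0, k=5):
    # Average the targets of the k nearest training inputs
    nearest = np.argsort(np.abs(x - x0))[:k]
    return y[nearest].mean()

def kernel_predict(x0, bandwidth=0.3):
    # Nadaraya-Watson style weighted average with a Gaussian kernel
    w = np.exp(-((x - x0) ** 2) / (2 * bandwidth**2))
    return (w @ y) / w.sum()
```

The bandwidth (and k) control the bias-variance trade-off locally, playing the role that model complexity played for parametric fits.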
- Programming Assignment
- What we’ve learned
- Summary and what’s ahead in the specialization
- Quiz: Nearest Neighbors & Kernel Regression
- Quiz: Predicting house prices using k-nearest neighbors regression
Closing Remarks
Summary
Glossary
- Models
- Fitted lines
- Regression
- Linear regression
- Simple linear regression
- Residual sum of squares [RSS]
- The least square line
- Gradient descent algorithm
- Concave functions
- Convex functions
- Hill climbing
- Hill descent
- Step size
- High leverage points
- Influential observations
- Multiple linear regression
- Polynomial regression
- Loss function
- Squared error
- Absolute error
- Training data
- Test data
- Model complexity
- fit a model to data
Sentences
- A small mean training error doesn’t guarantee a small mean test error.
- The model with the smallest mean training error is not necessarily the one with the smallest mean test error.