An overview of topics and how they fit together.
Teaching a machine how to learn from and make predictions on data.
or
Different approaches for different types of problem
There's a particular piece of information - the outcome - you want to predict about each piece of data, and you have some data already labeled with this outcome that you can train on.
The outcome is also sometimes referred to as the dependent variable. The predictors are referred to as independent variables.
or
What type of question needs to be asked of your data?
Question: Which class does this example belong to?
Dependent variable is qualitative / categorical
Standard example: classifying emails as spam or not spam
Standard example: identifying hand-written digits
Each example could fall into multiple categories. E.g. Yelp restaurants categorized as "Mexican food" and "Good for lunch"
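As a minimal sketch of the supervised classification setup, here is a toy nearest-centroid classifier in pure Python. The data, labels, and 2D feature representation are all made up for illustration; real spam filters use far richer features.

```python
# Minimal nearest-centroid classifier: train on labeled 2D points,
# then predict the class of a new example (all data is made up).

def centroid(points):
    """Mean of a list of 2D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def train(labeled):
    """labeled: {class_name: [(x, y), ...]} -> {class_name: centroid}"""
    return {cls: centroid(pts) for cls, pts in labeled.items()}

def classify(model, point):
    """Predict the class whose centroid is closest to `point`."""
    def dist2(c):
        return (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2
    return min(model, key=lambda cls: dist2(model[cls]))

model = train({
    "spam":     [(5.0, 1.0), (6.0, 0.5), (5.5, 1.5)],
    "not spam": [(1.0, 4.0), (0.5, 5.0), (1.5, 4.5)],
})
print(classify(model, (5.2, 1.2)))  # -> spam
```

The labeled training points play the role of data already tagged with the outcome; `classify` is the prediction step on new, unlabeled data.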
Note: Deep Learning is not just big neural nets
or
What type of question needs to be asked of your data?
Question: How much y does this example have?
Dependent variable is quantitative / continuous
(source: ISLR)
Given assumptions about the relationship between your predictors and the outcome, estimate the parameters of that relationship. In a simple linear relationship:
y = a + bx (+ e)
Fitting a model means finding what a and b should be. We fit them to the data we already have.
Training is the process by which we teach the model to fit the data. In simple linear regression we do this by getting it to minimize the Mean Squared Error (MSE), which is the average squared difference between real values and predicted values.
You have a loss function, aka a cost function, which computes the error of your model, and it is a function of the parameters of the model. You need a procedure for minimizing that function with respect to the parameters.
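The loop above can be sketched with plain gradient descent on the MSE. The toy data, learning rate, and iteration count are arbitrary choices for illustration; the data is generated from y = 5 + 2x, so the fit should recover a ≈ 5 and b ≈ 2.

```python
# Fit y = a + b*x by gradient descent on the Mean Squared Error.
# Toy data generated from y = 5 + 2x.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [5.0, 7.0, 9.0, 11.0, 13.0]  # exactly y = 5 + 2x

a, b = 0.0, 0.0
lr = 0.05  # learning rate (an arbitrary choice for this sketch)
n = len(xs)

for _ in range(5000):
    # Gradients of MSE = mean((a + b*x - y)^2) w.r.t. a and b
    grad_a = sum(2 * (a + b * x - y) for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (a + b * x - y) * x for x, y in zip(xs, ys)) / n
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 2), round(b, 2))  # -> 5.0 2.0
```

Here the MSE is the loss function and gradient descent is the minimization procedure; real libraries use faster solvers (or, for linear regression, a closed form), but the structure is the same.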
Your prediction algorithm feeds new data into this model to predict the outcome.
Suppose you fit a simple linear model:
y = a + bx
and get a value of 5 for your a parameter and 2 for your b parameter.
If you feed an x value of 3 into this you'd get:
y = 5 + (2 x 3) = 11
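That worked example as code, with the parameter values from the text:

```python
def predict(x, a=5, b=2):
    """Predict y = a + b*x for a fitted simple linear model."""
    return a + b * x

print(predict(3))  # 5 + (2 * 3) = 11
```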
or
Different approaches for different types of problem
Unlabeled data, i.e. no outcome variable, just a bunch of data that you are trying to find some hidden structure in.
(source: http://sherrytowers.com/2013/10/24/k-means-clustering/)
(source: http://austingwalters.com/pca-principal-component-analysis/)
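A minimal sketch of one such technique, k-means clustering, in pure Python. The 1D data points, the choice of k = 2, and the initial centers are all made up for illustration.

```python
# Minimal k-means sketch: find cluster centers in unlabeled data by
# alternating an assignment step and an update step.

def kmeans(data, centers, iters=10):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)),
                          key=lambda i: (x - centers[i]) ** 2)
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster ends up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two obvious groups, around 1 and around 10 (made-up data).
data = [0.9, 1.1, 1.0, 9.8, 10.2, 10.0]
print(sorted(kmeans(data, centers=[0.0, 5.0])))  # -> [1.0, 10.0]
```

No labels are provided anywhere: the algorithm discovers the two groups purely from the structure of the data, which is the defining feature of unsupervised learning.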
Specialized techniques for working with textual data.
Also known as model-based machine learning. Separates model from inference method and uses probabilistic programming to refine models.
Overfitting is when your model fits your training data really well but doesn't generalize well to new data.
"Shrink" your parameters by adding a penalty. In neural nets this is called "weight decay."
This helps avoid overfitting. If you are underfitting, your model is not sophisticated enough; consider adding more features.
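As a sketch of shrinkage, ridge (L2) regularization for a one-parameter model y = b·x has a simple closed form: minimizing sum((y − b·x)²) + λb² gives b = Σxy / (Σx² + λ), so a larger penalty λ pulls the slope toward zero. The data and λ values below are made up for illustration.

```python
# Ridge (L2) shrinkage for the one-parameter model y = b*x.
# Minimizing sum((y - b*x)^2) + lam * b^2 has the closed form
#   b = sum(x*y) / (sum(x^2) + lam)
# so larger penalties shrink b toward zero.

def ridge_slope(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

print(ridge_slope(xs, ys, lam=0.0))   # -> 2.0 (no shrinkage)
print(ridge_slope(xs, ys, lam=14.0))  # -> 1.0 (heavily shrunk)
```

With λ = 0 this is ordinary least squares; weight decay in neural nets applies the same L2 penalty to every weight.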
Don't just split your data into a test set and a training set. The temptation will be too great to tweak the model based on its performance on the test set.
Use a validation set for tuning instead; otherwise your test set error is not a good indicator of what the error will be on unseen data.
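A sketch of a three-way split in pure Python. The 60/20/20 proportions and the fixed seed are arbitrary choices for illustration.

```python
import random

def split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle and split data into train / validation / test sets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = split(range(100))
print(len(train), len(val), len(test))  # -> 60 20 20
```

Tune hyperparameters against the validation set, and touch the test set only once, at the very end, to estimate performance on unseen data.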
from Michael Littman