[ML Crash Course] Key Machine Learning Terminology

A Ydobon
4 min read · Aug 31, 2020

In this post, I will explain some fundamental machine learning terminology through a regression example.

Galton’s experimental setup

The concept of regression comes from genetics and was popularized by Sir Francis Galton during the late 19th century. Galton observed that extreme characteristics (e.g., height) in parents are not passed on completely to their offspring. Rather, the characteristics in the offspring regress towards a mediocre point (a point which has since been identified as the mean). By measuring the heights of hundreds of people, he was able to quantify regression to the mean and estimate the size of the effect.

Galton’s dataset

Linear regression

Let’s explore fundamental machine learning terminology through Galton’s dataset. The figure above plots each son’s height against his father’s height.

Labels

A label is the thing we’re predicting — the y variable in simple linear regression. In Galton’s dataset, Son’s height is the label. The label could be the future price of Apple stock, the kind of animal shown in a picture, the meaning of an audio clip, or just about anything.

Features

A feature is an input variable — the x variable in simple linear regression. In Galton’s dataset, Father’s height is the feature. A simple machine learning project might use a single feature, while a more sophisticated machine learning project could use millions of features.

Examples

An example is a particular instance of data, x. There are 460 examples in Galton’s dataset.
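
To make these three terms concrete, here is a minimal Python sketch; the height values are invented for illustration, not Galton’s actual measurements:

```python
import numpy as np

# Hypothetical heights in inches -- illustrative values, not Galton's data.
fathers_height = np.array([65.0, 63.3, 64.8, 70.2, 68.5])  # features (x)
sons_height = np.array([59.8, 63.2, 63.3, 69.2, 67.1])     # labels (y)

# Each (feature, label) pair is one example; Galton's full dataset has 460.
for x, y in zip(fathers_height, sons_height):
    print(f"example: father = {x} in, son = {y} in")
```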

Models

A model defines the relationship between features and the label. In Galton’s dataset, for example, you could write down the relationship using the equation for a line:

y = mx + b

where:

  • y is the son’s height — the value we’re trying to predict.
  • m is the slope of the line.
  • x is the father’s height — the value of our input feature.
  • b is the y-intercept.

By convention in machine learning, you’ll write the equation for a model slightly differently:

y′ = b + w₁x₁

where:

  • y′ is the predicted label (the desired output).
  • b is the bias (the y-intercept).
  • w₁ is the weight of feature 1. Weight is the same concept as the “slope” in the traditional equation of a line.
  • x₁ is a feature (a known input).

To infer (predict) the son’s height y′ for a new father’s height x₁, just substitute the x₁ value into this model.
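
As a concrete illustration, here is a minimal Python sketch of that substitution; the bias and weight values below are assumptions for the example, not parameters fitted to Galton’s data:

```python
# Assumed (hypothetical) model parameters -- not fitted to Galton's data.
b = 35.0   # bias (y-intercept), in inches
w1 = 0.5   # weight on the father's-height feature

def predict_sons_height(x1: float) -> float:
    """Apply y' = b + w1 * x1 to a new father's height x1."""
    return b + w1 * x1

print(predict_sons_height(70.0))  # 35.0 + 0.5 * 70.0 = 70.0 inches
```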

Although this model uses only one feature, a more sophisticated model might rely on multiple features, each having a separate weight (w₁, w₂, etc.). For example, a model that relies on three features might look as follows:

y′ = b + w₁x₁ + w₂x₂ + w₃x₃

Let’s highlight two phases of a model’s life:

  • Training means creating or learning the model. That is, you show the model labeled examples and enable the model to gradually learn the relationships between features and the label.
  • Inference means applying the trained model to unlabeled examples. That is, you use the trained model to make useful predictions (y′). For example, during inference, you can predict Son’s height for new unlabeled examples, as the sketch below illustrates.
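
To make the two phases concrete, here is a minimal sketch using NumPy’s least-squares polynomial fit; the training data are made-up heights, not Galton’s measurements:

```python
import numpy as np

# Training data: hypothetical labeled examples (heights in inches).
x_train = np.array([65.0, 63.3, 64.8, 70.2, 68.5, 72.0])  # fathers (features)
y_train = np.array([59.8, 63.2, 63.3, 69.2, 67.1, 70.1])  # sons (labels)

# Training: learn the weight and bias from the labeled examples.
# np.polyfit with deg=1 returns [slope, intercept].
w1, b = np.polyfit(x_train, y_train, deg=1)
print(f"learned model: y' = {b:.2f} + {w1:.2f} * x1")

# Inference: apply the trained model to new, unlabeled fathers' heights.
x_new = np.array([66.0, 71.5])
print(b + w1 * x_new)  # predicted sons' heights
```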

Regression vs. classification

A regression model predicts continuous values. For example, regression models make predictions that answer questions like the following:

  • What is the value of a house in California?
  • What is the probability that a user will click on this ad?

A classification model predicts discrete values. For example, classification models make predictions that answer questions like the following:

  • Is a given email message spam or not spam?
  • Is this an image of a dog, a cat, or a hamster?
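
As a quick contrast in code, here is a sketch using scikit-learn (my choice of library; the post itself names none) with invented toy data: LinearRegression predicts continuous values, while LogisticRegression predicts discrete classes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy, made-up data: one feature per example.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Regression: continuous labels (e.g., house values in $1000s).
y_continuous = np.array([150.0, 210.0, 250.0, 320.0, 360.0, 410.0])
reg = LinearRegression().fit(X, y_continuous)
print(reg.predict([[3.5]]))  # a continuous value

# Classification: discrete labels (e.g., 0 = not spam, 1 = spam).
y_discrete = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_discrete)
print(clf.predict([[3.5]]))  # a discrete class (0 or 1)
```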
