[ML Crash Course] Key Machine Learning Terminology

A Ydobon
4 min read · Aug 31, 2020

In this post, I will explain some fundamental machine learning terminology through a regression example.

Galton’s experimental setup

The concept of regression comes from genetics and was popularized by Sir Francis Galton during the late 19th century. Galton observed that extreme characteristics (e.g., height) in parents are not passed on completely to their offspring. Rather, the characteristics in the offspring regress towards a mediocre point (a point which has since been identified as the mean). By measuring the heights of hundreds of people, he was able to quantify regression to the mean and estimate the size of the effect.

Galton’s dataset

Linear regression

Let’s explore fundamental machine learning terminology through Galton’s dataset. The figure above plots each son’s height against his father’s height.

Labels

A label is the thing we’re predicting — the y variable in simple linear regression. In Galton’s dataset, Son’s height is the label. The label could be the future price of Apple stock, the kind of animal shown in a picture, the meaning of an audio clip, or just about anything.

Features

A feature is an input variable — the x variable in simple linear regression. In Galton’s dataset, Father’s height is the feature. A simple machine learning project might use a single feature, while a more sophisticated machine learning project could use millions of features.

Examples

An example is a particular instance of data, x. There are 460 examples in Galton’s dataset.
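
To make these three terms concrete, here is a minimal Python sketch; the height values are invented for illustration, not Galton’s actual measurements:

```python
import numpy as np

# Hypothetical heights in inches -- illustrative values, not Galton's data.
fathers_height = np.array([65.0, 63.3, 64.8, 70.2, 68.5])  # features (x)
sons_height = np.array([59.8, 63.2, 63.3, 69.2, 67.1])     # labels (y)

# Each (feature, label) pair is one example; Galton's full dataset has 460.
for x, y in zip(fathers_height, sons_height):
    print(f"example: father = {x} in, son = {y} in")
```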

Models

A model defines the relationship between features and the label. In Galton’s dataset, for example, you could write down the relationship using the equation for a line:

y = mx + b

where:

  • y is the son’s height — the value we’re trying to predict.
  • m is the slope of the line.
  • x is the father’s height — the value of our input feature.
  • b is the y-intercept.

By convention in machine learning, you’ll write the equation for a model slightly differently:

y′ = b + w₁x₁

where:

  • y′ is the predicted label (the desired output).
  • b is the bias (the y-intercept).
  • w₁ is the weight of feature 1. Weight is the same concept as the “slope” in the traditional equation of a line.
  • x₁ is a feature (a known input).

To infer (predict) the son’s height y′ for a new father’s height x₁, just substitute the x₁ value into this model.
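
As a concrete illustration, here is a minimal Python sketch of that substitution; the bias and weight values below are assumptions for the example, not parameters fitted to Galton’s data:

```python
# Assumed (hypothetical) model parameters -- not fitted to Galton's data.
b = 35.0   # bias (y-intercept), in inches
w1 = 0.5   # weight on the father's-height feature

def predict_sons_height(x1: float) -> float:
    """Apply y' = b + w1 * x1 to a new father's height x1."""
    return b + w1 * x1

print(predict_sons_height(70.0))  # 35.0 + 0.5 * 70.0 = 70.0 inches
```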

Although this model uses only one feature, a more sophisticated model might rely on multiple features, each having a separate weight (w₁, w₂, etc.). For example, a model that relies on three features might look as follows:

y′ = b + w₁x₁ + w₂x₂ + w₃x₃

Let’s highlight two phases of a model’s life:

  • Training means creating or learning the model. That is, you show the model labeled examples and enable the model to gradually learn the relationships between features and the label.
  • Inference means applying the trained model to unlabeled examples. That is, you use the trained model to make useful predictions (y′). For example, during inference, you can predict Son’s height for new unlabeled examples, as the sketch below illustrates.
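
To make the two phases concrete, here is a minimal sketch using NumPy’s least-squares polynomial fit; the training data are made-up heights, not Galton’s measurements:

```python
import numpy as np

# Training data: hypothetical labeled examples (heights in inches).
x_train = np.array([65.0, 63.3, 64.8, 70.2, 68.5, 72.0])  # fathers (features)
y_train = np.array([59.8, 63.2, 63.3, 69.2, 67.1, 70.1])  # sons (labels)

# Training: learn the weight and bias from the labeled examples.
# np.polyfit with deg=1 returns [slope, intercept].
w1, b = np.polyfit(x_train, y_train, deg=1)
print(f"learned model: y' = {b:.2f} + {w1:.2f} * x1")

# Inference: apply the trained model to new, unlabeled fathers' heights.
x_new = np.array([66.0, 71.5])
print(b + w1 * x_new)  # predicted sons' heights
```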

Regression vs. classification

A regression model predicts continuous values. For example, regression models make predictions that answer questions like the following:

  • What is the value of a house in California?
  • What is the probability that a user will click on this ad?

A classification model predicts discrete values. For example, classification models make predictions that answer questions like the following:

  • Is a given email message spam or not spam?
  • Is this an image of a dog, a cat, or a hamster?
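
As a quick contrast in code, here is a sketch using scikit-learn (my choice of library; the post itself names none) with invented toy data: LinearRegression predicts continuous values, while LogisticRegression predicts discrete classes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy, made-up data: one feature per example.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Regression: continuous labels (e.g., house values in $1000s).
y_continuous = np.array([150.0, 210.0, 250.0, 320.0, 360.0, 410.0])
reg = LinearRegression().fit(X, y_continuous)
print(reg.predict([[3.5]]))  # a continuous value

# Classification: discrete labels (e.g., 0 = not spam, 1 = spam).
y_discrete = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_discrete)
print(clf.predict([[3.5]]))  # a discrete class (0 or 1)
```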
