Some parts have been revised and elaborated for better understanding; however, I acknowledge that the following post is based on the TensorFlow tutorial provided in:
For more detailed explanations and background knowledge regarding the code and the dataset, you can always consult the link.
I. Introduction
Hello, everyone! How have you been? I apologize for taking so long to come back. This post will be a remarkable step up for all of us.
We will be moving on from supervised learning to ‘UN’supervised learning.
1. Unsupervised learning vs. supervised learning
In supervised learning, we had data (e.g., an image of a daisy) and the corresponding label (i.e., “daisy”), and the goal was to learn a function that maps data to labels efficiently.
In unsupervised learning, which we now face, we only have the data (e.g., images of handwritten digits). Although humans can tell the label of each image, TensorFlow does not have that information. So what the machine learning model does is learn the underlying structure of the data.
The autoencoder is one of the well-known models in unsupervised learning. It is also described as feature learning, in the sense that it tries to learn the core features of the input data so that it can generate something not too far from the input. The Generative Adversarial Network and the Variational Autoencoder, abbreviated GAN and VAE respectively, are famous examples of unsupervised learning.
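To make the idea of “learning core features” concrete, here is a minimal sketch of a plain autoencoder in Keras. The layer sizes and the 32-dimensional code are my own illustrative choices, not part of the tutorial: the encoder squeezes a 28x28 image into a small code, and the decoder has to rebuild the image from that code alone.
import tensorflow as tf

# Illustrative only: a plain autoencoder with a 32-dimensional bottleneck.
encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32),  # the learned "core features"
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(28 * 28, activation='sigmoid'),
    tf.keras.layers.Reshape((28, 28, 1)),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
Training this model to reproduce its own input forces the 32 numbers in the bottleneck to carry the essential information about the digit.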
2. Concept of Generative Model
Have you heard about the allegory of the cave by Plato? I expect you have learned about it at some point in your life. Please watch the video and come back to the reading. 👇
Let’s forget about the last half of the allegory, which is related to education in our society. Now, the prisoners are so smart that they do not need information from the outside world or any education. They are able to extract the core information from the shadows alone. In the end, if they were allowed to pick up or create the objects they had seen as shadows, they would be able to generate those objects accurately.
Coming back to TensorFlow, we can think of x, the observed variable (the handwritten image), as the real object behind the prisoners, and z, the latent variable, as the features of the shadows. Our objective is to generate a reconstructed image that is not too far from x, or in the analogy, to create something similar to the real object. The label data, which does not appear here, plays the role of the education that is no longer necessary.
3. VAE
The first step of training in each iteration is to create a lower-dimensional representation z of the input x (that is, to extract the core information of x). Then, based on the sampled latent vector z, we want to reconstruct the input as x’. The first step is called encoding (the inference network) and the second is called decoding (the generative network).
During training, our aim is to obtain the mean and (diagonal) variance of q(z|x) in the encoding step and then the parameters of p(x|z) in the decoding step. However, since the true posterior p(z|x) is intractable, we use q(z|x) as an approximation of it.
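As a preview of what the two networks can look like in code (a sketch following the tutorial’s convolutional architecture; the exact layers will be walked through in the next post): the inference network outputs a mean and log-variance for q(z|x), we sample z with the reparameterization trick, and the generative network maps z back to pixel logits.
import tensorflow as tf

latent_dim = 2  # illustrative choice

# Inference (encoding) network: x -> (mean, logvar) of q(z|x)
inference_net = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu'),
    tf.keras.layers.Conv2D(64, 3, strides=2, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(latent_dim + latent_dim),  # mean and logvar, concatenated
])

# Generative (decoding) network: z -> logits of the reconstruction x'
generative_net = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(latent_dim,)),
    tf.keras.layers.Dense(7 * 7 * 32, activation='relu'),
    tf.keras.layers.Reshape((7, 7, 32)),
    tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),
    tf.keras.layers.Conv2DTranspose(1, 3, strides=1, padding='same'),
])

def reparameterize(mean, logvar):
    # Sample z = mean + sigma * eps with eps ~ N(0, I), keeping the sampling differentiable
    eps = tf.random.normal(shape=tf.shape(mean))
    return eps * tf.exp(logvar * .5) + mean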
4. Loss function and Optimizer
We will be using the ELBO (evidence lower bound) as the loss function. The ELBO lets us work with the approximation q(z|x). Also, it is known that maximizing the ELBO minimizes the Kullback–Leibler divergence, which measures how far the approximate q(z|x) deviates from the true posterior p(z|x).
However, in this tutorial, for simplicity we will not compute the KL divergence term analytically. Rather, we will optimize a single-sample Monte Carlo estimate of
log p(x|z) + log p(z) − log q(z|x).
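In code, that single-sample Monte Carlo estimate can be sketched as follows. This assumes the inference_net, generative_net, and reparameterize from the sketch above; the real training loop is covered in the next post.
import numpy as np
import tensorflow as tf

def log_normal_pdf(sample, mean, logvar):
    # Log density of a diagonal Gaussian, summed over the latent dimensions
    log2pi = tf.math.log(2. * np.pi)
    return tf.reduce_sum(
        -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi), axis=1)

def compute_loss(x):
    mean, logvar = tf.split(inference_net(x), num_or_size_splits=2, axis=1)
    z = reparameterize(mean, logvar)
    x_logit = generative_net(z)

    # log p(x|z): Bernoulli likelihood of the binarized pixels
    cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
    logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])

    # log p(z) and log q(z|x): Gaussian log densities
    logpz = log_normal_pdf(z, 0., 0.)
    logqz_x = log_normal_pdf(z, mean, logvar)

    # Negative of the Monte Carlo ELBO estimate (we minimize this)
    return -tf.reduce_mean(logpx_z + logpz - logqz_x)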
II. Technical Setup
# to generate gifs
!pip install -q imageio

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import os
import time
import numpy as np
import glob
import matplotlib.pyplot as plt
import PIL
import imageio
from IPython import display
III. Load the MNIST Dataset
(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')
# Normalize the pixel values to the range [0, 1]
train_images /= 255.
test_images /= 255.

# Binarize: pixels >= 0.5 become 1, the rest become 0
train_images[train_images >= .5] = 1.
train_images[train_images < .5] = 0.
test_images[test_images >= .5] = 1.
test_images[test_images < .5] = 0.

TRAIN_BUF = 60000
BATCH_SIZE = 100
TEST_BUF = 10000

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TEST_BUF).batch(BATCH_SIZE)
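As a quick sanity check (not part of the original tutorial), we can peek at one batch to confirm the shape and that the pixels really are binarized:
for batch in train_dataset.take(1):
    print(batch.shape)               # expect (100, 28, 28, 1)
    print(np.unique(batch.numpy()))  # expect [0. 1.]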
Up to now, we have prepared our MNIST dataset, ready to face the VAE.
In the following post, we will go line by line to see how the VAE works.
Thank you, and I hope this was a helpful intro to the autoencoder.
Have a good day 💃!