[TensorFlow 2.0] Variational Autoencoder (VAE) Part I

Nov 7, 2019

Some parts have been revised and expanded for easier understanding, but I acknowledge that this post is based on the TensorFlow tutorial provided at:

Convolutional Variational Autoencoder | TensorFlow Core

This notebook demonstrates how to generate images of handwritten digits by training a Variational Autoencoder ( 1, 2)…

www.tensorflow.org

For more detailed explanations and background on the code and the dataset, you can always consult the link.

I. Introduction

Hello, everyone! How have you been? I apologize for taking so long to come back. This post will be a remarkable step up for all of us.
We will be moving on from supervised learning to ‘UN’supervised learning.

Unsupervised learning vs. supervised learning

In supervised learning, we had data (e.g., an image of a daisy) and the corresponding label (e.g., "daisy"), and the goal was to learn a function that maps data to labels accurately.

In unsupervised learning, which is new to us, we only have the data (e.g., images of handwritten digits). Although humans can tell the label of each image, TensorFlow does not have that information. So, what the model does instead is learn the underlying structure of the data.

The autoencoder is one of the best-known models in unsupervised learning. It is a form of feature learning, in the sense that it tries to learn the core features of the input data in order to generate something not too far from that input. Generative Adversarial Networks and Variational Autoencoders, abbreviated GAN and VAE respectively, are well-known examples of unsupervised learning.

2. Concept of Generative Model

Have you heard of Plato's allegory of the cave? I expect you learned about it at some point in your life. Please watch the video and then come back to keep reading. 👇

Let’s forget about the second half of the allegory, which is about education in our society. Suppose the prisoners are now sooo smart that they do not need information from the outside world, or any education. They are able to extract the core information from the shadows alone. In the end, if they were allowed to pick up or create the object whose shadow they saw, they would be able to generate that object accurately.

Coming back to TensorFlow, we can think of x, the observed variable (a handwritten digit image), as the real object behind the prisoners, and z, the latent variable, as the features of the shadows. Our objective is to generate a reconstructed image that is not too far from x; in the analogy, to create something similar to the real object. The label data, which does not appear here, corresponds to the education that is no longer necessary.

3. VAE

In each training iteration, the first step is to create a lower-dimensional representation z of the input x (that is, to extract the core information of x). Then, based on the sampled latent vector z, we reconstruct the input as x’. The first step is called encoding and is performed by the inference network; the second is called decoding and is performed by the generative network.
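To make the two steps concrete, here is a minimal NumPy sketch of the data flow only: encode, sample z, decode. It is not the tutorial's model. The weights are random and untrained, and the names (`W_enc`, `W_dec`, `encode`, `sample_z`, `decode`) and the latent size of 2 are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_DIM = 28 * 28   # a flattened 28x28 image
LATENT_DIM = 2        # hypothetical size of z, chosen only for illustration

# Hypothetical, untrained weights standing in for the two networks.
W_enc = rng.normal(scale=0.01, size=(INPUT_DIM, 2 * LATENT_DIM))
W_dec = rng.normal(scale=0.01, size=(LATENT_DIM, INPUT_DIM))


def encode(x):
    """Inference network: map x to the mean and log-variance of z."""
    stats = x @ W_enc
    mean, logvar = np.split(stats, 2, axis=-1)
    return mean, logvar


def sample_z(mean, logvar):
    """Draw z = mean + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * logvar) * eps


def decode(z):
    """Generative network: map z to per-pixel probabilities for x'."""
    logits = z @ W_dec
    return 1.0 / (1.0 + np.exp(-logits))


# Four fake binary "images", flattened to vectors.
x = rng.integers(0, 2, size=(4, INPUT_DIM)).astype("float32")
mean, logvar = encode(x)
z = sample_z(mean, logvar)
x_reconstructed = decode(z)
print(z.shape, x_reconstructed.shape)
```

The point is just the shapes: each 784-pixel image is squeezed into a tiny z and then expanded back to 784 values. A real VAE would replace the random matrices with trained networks.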

During training, our aim is to obtain the mean and variance of q(z|x) in the encoding step, and then the parameters of p(x|z) in the decoding step. Since the true posterior p(z|x) is intractable, q(z|x) serves as an approximation of it.

4. Loss function and Optimizer

We will use the ELBO (evidence lower bound) as our objective, with its negative serving as the loss. The ELBO lets us work with the approximation q(z|x) instead of the intractable true posterior. It is also known that maximizing the ELBO minimizes the Kullback–Leibler divergence, which measures how far the approximation q(z|x) is from the true posterior p(z|x).
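For readers who want to see why this holds, the standard decomposition is sketched below, writing q(z|x) for the approximation and p(z|x) for the true posterior. Since log p(x) does not depend on q and the KL term is never negative, pushing the ELBO up necessarily pushes the KL term down.

```latex
\begin{aligned}
\log p(x) &= \underbrace{\mathbb{E}_{q(z|x)}\big[\log p(x, z) - \log q(z|x)\big]}_{\text{ELBO}}
             + \mathrm{KL}\big(q(z|x)\,\|\,p(z|x)\big) \\
\text{ELBO} &= \mathbb{E}_{q(z|x)}\big[\log p(x|z) + \log p(z) - \log q(z|x)\big]
\end{aligned}
```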

Evidence lower bound

Yang, Xitong. "Understanding the Variational Lower Bound" (PDF). Institute for Advanced Computer Studies. University of…

en.wikipedia.org

However, in this tutorial, for simplicity, we will not compute the KL divergence term analytically. Rather, we will use a Monte Carlo estimate of
log p(x|z) + log p(z) - log q(z|x).
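As a rough illustration of what such an estimate looks like, here is a NumPy sketch of the single-sample version. It assumes a diagonal Gaussian q(z|x), a standard normal prior p(z), and Bernoulli pixels for p(x|z). The helper names and the `decoder` argument are my own placeholders, not the tutorial's API.

```python
import numpy as np


def log_normal_pdf(sample, mean, logvar):
    """Log density of a diagonal Gaussian, summed over the latent dimensions."""
    log2pi = np.log(2.0 * np.pi)
    return np.sum(
        -0.5 * ((sample - mean) ** 2 * np.exp(-logvar) + logvar + log2pi),
        axis=-1,
    )


def log_bernoulli(x, logits):
    """Log p(x|z) for binary pixels, summed over pixels (numerically stable)."""
    # Equivalent to x*log(sigmoid(l)) + (1-x)*log(1-sigmoid(l)).
    return np.sum(x * logits - np.logaddexp(0.0, logits), axis=-1)


def elbo_single_sample(x, mean, logvar, decoder, rng):
    """One-sample Monte Carlo estimate of log p(x|z) + log p(z) - log q(z|x)."""
    eps = rng.standard_normal(mean.shape)
    z = mean + np.exp(0.5 * logvar) * eps        # z ~ q(z|x)
    logpx_z = log_bernoulli(x, decoder(z))       # log p(x|z)
    logpz = log_normal_pdf(z, 0.0, 0.0)          # log p(z), standard normal prior
    logqz_x = log_normal_pdf(z, mean, logvar)    # log q(z|x)
    return logpx_z + logpz - logqz_x


# Usage with a hypothetical decoder that ignores z and returns zero logits.
rng = np.random.default_rng(0)
x = np.ones((5, 3))
mean = np.zeros((5, 2))
logvar = np.zeros((5, 2))
estimate = elbo_single_sample(x, mean, logvar, lambda z: np.zeros((z.shape[0], 3)), rng)
print(estimate.shape)
```

The estimate is noisy for any single z, but averaged over many samples it approaches the reconstruction term minus the KL term, which is why it can stand in for the analytic version.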

II. Technical Setup

# to generate gifs
!pip install -q imageio

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf

import os
import time
import numpy as np
import glob
import matplotlib.pyplot as plt
import PIL
import imageio

from IPython import display

III. Load the MNIST Dataset

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')

train_images /= 255.
test_images /= 255.

train_images[train_images >= .5] = 1.
train_images[train_images < .5] = 0.
test_images[test_images >= .5] = 1.
test_images[test_images < .5] = 0.

TRAIN_BUF = 60000
BATCH_SIZE = 100

TEST_BUF = 10000

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TEST_BUF).batch(BATCH_SIZE)
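If you want to check what the preprocessing above does without downloading MNIST, the following NumPy sketch applies the same reshape, scaling, and 0.5 threshold to fake data. The `binarize` helper and the fake array are assumptions for illustration; they are not part of the tutorial.

```python
import numpy as np


def binarize(images, threshold=0.5):
    """Reshape to (N, 28, 28, 1), scale to [0, 1], then threshold to {0, 1}."""
    images = images.reshape(images.shape[0], 28, 28, 1).astype("float32") / 255.0
    return np.where(images >= threshold, 1.0, 0.0).astype("float32")


# Fake uint8 "images" standing in for MNIST, so nothing is downloaded.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(250, 28, 28), dtype=np.uint8)

binary = binarize(fake_images)

# batch(100) keeps the partial last batch by default, so 250 images give 3 batches.
num_batches = -(-len(binary) // 100)  # ceiling division

print(binary.shape, np.unique(binary), num_batches)
```

Binarizing the pixels is what makes a Bernoulli likelihood a natural fit for p(x|z): every pixel is either 0 or 1.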

So far, we have prepared our MNIST dataset so that it is ready to face the VAE.
In the following post, we will go line by line to see how the VAE works.

Thank you, and I hope this was a helpful intro to autoencoders.

¡Buen día 💃!