[TensorFlow 2.0] Variational Auto encoder (VAE) Part II

A Ydobon

5 min read · Nov 17, 2019

Convolutional Variational Autoencoder | TensorFlow Core

This notebook demonstrates how to generate images of handwritten digits by training a Variational Autoencoder ( 1, 2)…

www.tensorflow.org

Let’s get started!

1. Load the dependent libraries

In the previous post, we finished two things. First, we loaded the dependent libraries into our workspace:

from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf

import os
import time
import numpy as np
import glob
import matplotlib.pyplot as plt
import PIL
import imageio

from IPython import display

2. Load the MNIST dataset

Second, we loaded the MNIST dataset that we will work on:

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

As the code above shows, we discard the labels from the MNIST dataset (the “_” placeholders). Unlike in a typical classification setup, we keep only the input images and leave the labels behind.

3. Preprocess the dataset

Next, we will start to preprocess the MNIST dataset. The preprocessing steps are the following:

Step 1.

Reshape each 28x28 image to shape (28, 28, 1), adding the single channel dimension that the convolutional layers expect. We also convert every value to a 32-bit float by appending “astype(‘float32’)” to each line.

train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')
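
To see what the reshape does without downloading anything, here is a small NumPy-only sketch. The array “fake_images” is a hypothetical stand-in for “train_images” (random pixel values, not real MNIST data), chosen to have the same layout as the arrays returned by “load_data()”.

```python
import numpy as np

# Hypothetical stand-in for train_images: 5 grayscale images of 28x28 pixels,
# stored as unsigned 8-bit integers.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add a trailing channel axis and cast to float32, as in Step 1.
reshaped = fake_images.reshape(fake_images.shape[0], 28, 28, 1).astype('float32')

print(fake_images.shape, fake_images.dtype)  # (5, 28, 28) uint8
print(reshaped.shape, reshaped.dtype)        # (5, 28, 28, 1) float32
```

The pixel values themselves are unchanged; only the shape and the data type differ.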

Step 2.

We normalize each value into the range [0., 1.] by dividing by 255. (Each raw value lies between 0 and 255 and represents the intensity of a pixel.)

train_images /= 255.
test_images /= 255.

Step 3.

We binarize the dataset by assigning “1.” to every value greater than or equal to “0.5” and “0.” to every value below “0.5”. In other words, “0.5” is our binarization threshold. If you like, you can change this threshold and check how it affects performance later on.

train_images[train_images >= .5] = 1.
train_images[train_images < .5] = 0.
test_images[test_images >= .5] = 1.
test_images[test_images < .5] = 0.
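
As a quick check of Steps 2 and 3, the sketch below applies the same normalization and thresholding to five hand-picked pixel values. The array “pixels” and the use of “np.where” are illustrative choices, not part of the original notebook; “np.where” does in one pass what the two masked assignments above do in two.

```python
import numpy as np

# Hypothetical pixel intensities in [0, 255], already cast to float32.
pixels = np.array([0., 64., 127., 128., 255.], dtype=np.float32)

# Step 2: scale to [0, 1].
pixels /= 255.

# Step 3: threshold at 0.5 (values >= 0.5 become 1, the rest become 0).
threshold = 0.5
binary = np.where(pixels >= threshold, 1., 0.).astype(np.float32)

print(binary)  # [0. 0. 0. 1. 1.]
```

Note that 127 falls just below the threshold after scaling (127/255 is about 0.498), while 128 falls just above it.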

Lastly, we set the parameters we need to create our dataset before we jump into the tf.data part.

TRAIN_BUF = 60000
BATCH_SIZE = 100

TEST_BUF = 10000

4. Create batches and shuffle the dataset using “tf.data”

In machine learning, it is far more efficient to train a model in batches, that is, on a group of examples at each step, rather than on one example at a time.

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TEST_BUF).batch(BATCH_SIZE)

Here we shuffle the training and test datasets using buffer sizes of “TRAIN_BUF” and “TEST_BUF”, respectively. Because each buffer is as large as its dataset, the whole dataset is shuffled. We then group the shuffled data into batches of size “BATCH_SIZE”.
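
The idea of “shuffle, then batch” can be sketched in plain Python. This is only an illustration of the concept, not how tf.data is implemented; the helper “shuffle_and_batch” and the toy dataset are hypothetical.

```python
import random

def shuffle_and_batch(items, batch_size, seed=0):
    """Shuffle a list, then split it into consecutive batches."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

# Hypothetical toy dataset: the integers 0..9 stand in for images.
batches = shuffle_and_batch(range(10), batch_size=4)

print(len(batches))               # 3
print([len(b) for b in batches])  # [4, 4, 2]
```

With our real settings, the 60,000 training images and a batch size of 100 give 600 batches per epoch, and the 10,000 test images give 100 batches.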

5. Build our Convolutional Variational Autoencoder model, wiring up the generative and inference network

We will create a class containing every essential component of the autoencoder: the inference network, the generative network, and the sampling, encoding, decoding, and reparameterizing functions.

class CVAE(tf.keras.Model):
  def __init__(self, latent_dim):
    super(CVAE, self).__init__()
    self.latent_dim = latent_dim
    self.inference_net = tf.keras.Sequential(
      [
          tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
          tf.keras.layers.Conv2D(
              filters=32, kernel_size=3, strides=(2, 2), activation='relu'),
          tf.keras.layers.Conv2D(
              filters=64, kernel_size=3, strides=(2, 2), activation='relu'),
          tf.keras.layers.Flatten(),
          # No activation
          tf.keras.layers.Dense(latent_dim + latent_dim),
      ]
    )

    self.generative_net = tf.keras.Sequential(
        [
          tf.keras.layers.InputLayer(input_shape=(latent_dim,)),
          tf.keras.layers.Dense(units=7*7*32, activation=tf.nn.relu),
          tf.keras.layers.Reshape(target_shape=(7, 7, 32)),
          tf.keras.layers.Conv2DTranspose(
              filters=64,
              kernel_size=3,
              strides=(2, 2),
              padding="SAME",
              activation='relu'),
          tf.keras.layers.Conv2DTranspose(
              filters=32,
              kernel_size=3,
              strides=(2, 2),
              padding="SAME",
              activation='relu'),
          # No activation
          tf.keras.layers.Conv2DTranspose(
              filters=1, kernel_size=3, strides=(1, 1), padding="SAME"),
        ]
    )

  @tf.function
  def sample(self, eps=None):
    if eps is None:
      eps = tf.random.normal(shape=(100, self.latent_dim))
    return self.decode(eps, apply_sigmoid=True)

  def encode(self, x):
    mean, logvar = tf.split(self.inference_net(x), num_or_size_splits=2, axis=1)
    return mean, logvar

  def reparameterize(self, mean, logvar):
    eps = tf.random.normal(shape=mean.shape)
    return eps * tf.exp(logvar * .5) + mean

  def decode(self, z, apply_sigmoid=False):
    logits = self.generative_net(z)
    if apply_sigmoid:
      probs = tf.sigmoid(logits)
      return probs

    return logits

First, let’s look at the “def __init__()” part. It creates the attributes the class needs. Every attribute assignment starts with “self.”, which is how Python attaches a value to the instance so that the other methods can use it. Here we store the “latent_dim” value and build “inference_net” and “generative_net”.

Let’s start with the inference network block, which begins with “self.inference_net”. First, “InputLayer” takes the preprocessed input image, whose shape is “(28, 28, 1)”. Then a 3x3 kernel (“kernel_size=3”) convolves over the input image with a stride of 2 (“strides=(2, 2)”), producing 32 feature maps (“filters=32”).

Likewise, these 32 feature maps go through a second “Conv2D” layer, which outputs 64 feature maps.

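Lastly, “Flatten()” flattens the output into a one-dimensional vector, and the final “Dense” layer, which has no activation, maps it to “latent_dim + latent_dim” values that “encode” later splits into a mean and a log-variance.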
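
To follow the shapes through the inference network, the arithmetic can be sketched as below. It assumes Keras’ default “padding='valid'”, which the “Conv2D” layers above do not override; the helper name “conv_output_size” is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Output length along one axis for a convolution with 'valid' padding."""
    return (input_size - kernel_size) // stride + 1

size = 28  # height and width of the input image
for filters in (32, 64):
    size = conv_output_size(size, kernel_size=3, stride=2)
    print(f"after Conv2D with {filters} filters: {size} x {size} x {filters}")

# Flatten() turns the last feature maps into one vector per image.
flattened = size * size * 64
print(flattened)  # 2304
```

So the spatial size shrinks from 28 to 13 and then to 6, and the flattened vector that enters the “Dense” layer has 6 * 6 * 64 = 2304 entries.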

Next, the generative network. In simple terms, this network works in the opposite direction to the inference network. We can spot some contrasting code in this block, such as “Conv2DTranspose” as opposed to the ordinary “Conv2D” layer. It acts like a deconvolution of its input, so the generative network and the inference network roughly mirror each other in the overall architecture.
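
The generative network’s shapes can be traced in the same way. With “padding="SAME"”, a transposed convolution multiplies the spatial size by its stride, so a rough sketch of the three “Conv2DTranspose” layers looks like this (the helper name is hypothetical):

```python
def conv_transpose_same_output_size(input_size, stride):
    """Output length along one axis for a transposed convolution with 'same' padding."""
    return input_size * stride

size = 7  # spatial size after Reshape(target_shape=(7, 7, 32))
sizes = [size]
for stride in (2, 2, 1):  # strides of the three Conv2DTranspose layers
    size = conv_transpose_same_output_size(size, stride)
    sizes.append(size)

print(sizes)  # [7, 14, 28, 28]
```

The last layer has a single filter, so the output has shape (28, 28, 1), the same as the input image.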

The generative network’s input is a latent encoding vector z drawn from a unit Gaussian prior p(z). The network outputs the logits that define the conditional distribution p(x|z). When generating new images, the “sample” function above draws this input with “tf.random.normal”.

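In “sample”, the 100 in “shape=(100, self.latent_dim)” is the number of latent vectors drawn at once (it matches the “BATCH_SIZE” we set earlier, although it is hard-coded here), and “self.latent_dim” is the number of columns of the input matrix.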
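
The class also defines “encode” and “reparameterize”, which turn the inference network’s output into a latent vector during training. Here is a rough NumPy sketch of those two steps (not the TensorFlow code itself). The array “encoder_output” is a hypothetical stand-in for what “inference_net” returns, and “latent_dim = 2” is only an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, latent_dim = 100, 2  # latent_dim=2 is an illustrative choice

# Hypothetical stand-in for inference_net(x): shape (batch, 2 * latent_dim).
encoder_output = rng.normal(size=(batch_size, 2 * latent_dim))

# encode(): first half of the columns is the mean, second half the log-variance.
mean, logvar = np.split(encoder_output, 2, axis=1)

# reparameterize(): z = mean + eps * sigma, where sigma = exp(logvar / 2).
eps = rng.normal(size=mean.shape)
z = eps * np.exp(logvar * 0.5) + mean

print(mean.shape, logvar.shape, z.shape)  # (100, 2) (100, 2) (100, 2)
```

Because the randomness lives only in “eps”, z remains a deterministic function of “mean” and “logvar”, which is what lets gradients flow back into the inference network.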

We have now built the major components of our convolutional autoencoder. What is left is to put each of them in the right place to train the model and learn its parameters.

Part III will cover defining the loss function and structuring the code to train the model on the preprocessed dataset.

:) !!