[TensorFlow 2.0] Variational Autoencoder (VAE) Part II

A Ydobon
5 min read · Nov 17, 2019

Let’s get started!

1. Load the dependent libraries

In the previous post, we finished two things. First, we loaded the dependent libraries into our workspace:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import os
import time
import numpy as np
import glob
import matplotlib.pyplot as plt
import PIL
import imageio
from IPython import display

2. Load the MNIST dataset

Second, we loaded the MNIST dataset that we will work on:

(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()

As you can see from the code above, we did not load the labels from the MNIST dataset. Unlike in a typical classification setup, we keep only the input images and discard the labels (that is what the underscores in the assignment do).

3. Preprocess the dataset

Next, we will start to preprocess the MNIST dataset. The preprocessing steps are the following:

Step 1.

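“load_data()” returns each image as a 28x28 array of integers, so we reshape the images to add a trailing channel dimension, giving each image the shape (28, 28, 1) that the convolutional layers expect. We also set each value’s data type to 32-bit float by adding “astype(‘float32’)” at the end of each line.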

train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')

Step 2.

We will normalize each value to the range [0., 1.] by dividing it by 255. (Each pixel value lies between 0 and 255 and represents the intensity of the pixel.)

train_images /= 255.
test_images /= 255.

Step 3.

We will binarize the dataset by assigning “1.” to every value greater than or equal to “0.5” and “0.” to every value lower than “0.5”. In other words, “0.5” is the threshold we use to binarize the input. If you would like, you can change this threshold and check how it affects performance later on.

train_images[train_images >= .5] = 1.
train_images[train_images < .5] = 0.
test_images[test_images >= .5] = 1.
test_images[test_images < .5] = 0.
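The three steps above can be tried out on a small synthetic batch without downloading MNIST. This is only a sketch: “fake_images” and the “preprocess” helper are names made up for this illustration, and “np.where” is used in place of the two in-place assignments, which should give the same result.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 random 28x28 images with values 0..255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

def preprocess(images, threshold=0.5):
    """Add a channel axis, scale to [0, 1], then binarize at `threshold`."""
    x = images.reshape(images.shape[0], 28, 28, 1).astype('float32')
    x /= 255.
    # Values >= threshold become 1., everything else becomes 0.
    return np.where(x >= threshold, 1., 0.).astype('float32')

processed = preprocess(fake_images)
print(processed.shape, processed.dtype, np.unique(processed))
```

Passing a different “threshold” to the helper is an easy way to experiment with the binarization level mentioned in Step 3.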

And lastly, we will set a few constants that we need to build our dataset before we jump into the tf.data part.

TRAIN_BUF = 60000
BATCH_SIZE = 100
TEST_BUF = 10000

4. Create batches and shuffle the dataset using “tf.data”

In machine learning, it is usually far more efficient to train a model in batches, that is, on a group of examples at each step, rather than on one example at a time.

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TEST_BUF).batch(BATCH_SIZE)

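So, we shuffle the training and test datasets using “TRAIN_BUF” and “TEST_BUF” as the shuffle buffer sizes. Because each buffer size equals the number of images in its dataset (60,000 and 10,000), the whole dataset gets shuffled. After the shuffle, we group the shuffled data into batches of size “BATCH_SIZE”.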
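To get a feel for what shuffling and then batching does, here is a rough NumPy-only imitation. It is not tf.data itself: “shuffle_and_batch” is a hypothetical helper, and it assumes the shuffle buffer covers the whole dataset, as it does with our constants.

```python
import numpy as np

def shuffle_and_batch(data, batch_size, seed=None):
    """Permute the rows, then yield consecutive batches (the last may be smaller)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        yield data[order[start:start + batch_size]]

# Toy dataset of 10 "examples" so the batch sizes are easy to read.
toy = np.arange(10).reshape(10, 1)
batches = list(shuffle_and_batch(toy, batch_size=4, seed=0))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [4, 4, 2]
```

With 60,000 training images and a batch size of 100 there is no leftover batch, so every training batch holds exactly 100 images.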

5. Build our Convolutional Variational Autoencoder model, wiring up the generative and inference network

We will create a class containing every essential component of the autoencoder: the inference network, the generative network, and the sampling, encoding, reparameterizing, and decoding functions.

class CVAE(tf.keras.Model):
    def __init__(self, latent_dim):
        super(CVAE, self).__init__()
        self.latent_dim = latent_dim
        self.inference_net = tf.keras.Sequential(
            [
                tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
                tf.keras.layers.Conv2D(
                    filters=32, kernel_size=3, strides=(2, 2), activation='relu'),
                tf.keras.layers.Conv2D(
                    filters=64, kernel_size=3, strides=(2, 2), activation='relu'),
                tf.keras.layers.Flatten(),
                # No activation
                tf.keras.layers.Dense(latent_dim + latent_dim),
            ]
        )
        self.generative_net = tf.keras.Sequential(
            [
                tf.keras.layers.InputLayer(input_shape=(latent_dim,)),
                tf.keras.layers.Dense(units=7*7*32, activation=tf.nn.relu),
                tf.keras.layers.Reshape(target_shape=(7, 7, 32)),
                tf.keras.layers.Conv2DTranspose(
                    filters=64,
                    kernel_size=3,
                    strides=(2, 2),
                    padding="SAME",
                    activation='relu'),
                tf.keras.layers.Conv2DTranspose(
                    filters=32,
                    kernel_size=3,
                    strides=(2, 2),
                    padding="SAME",
                    activation='relu'),
                # No activation
                tf.keras.layers.Conv2DTranspose(
                    filters=1, kernel_size=3, strides=(1, 1), padding="SAME"),
            ]
        )

    @tf.function
    def sample(self, eps=None):
        if eps is None:
            eps = tf.random.normal(shape=(100, self.latent_dim))
        return self.decode(eps, apply_sigmoid=True)

    def encode(self, x):
        mean, logvar = tf.split(self.inference_net(x), num_or_size_splits=2, axis=1)
        return mean, logvar

    def reparameterize(self, mean, logvar):
        eps = tf.random.normal(shape=mean.shape)
        return eps * tf.exp(logvar * .5) + mean

    def decode(self, z, apply_sigmoid=False):
        logits = self.generative_net(z)
        if apply_sigmoid:
            probs = tf.sigmoid(logits)
            return probs
        return logits

First, let’s look at the “def __init__()” part. This part creates the variables that the class needs. Every one of them is assigned with the “self.” prefix, which is how Python attaches a value to the object so that the other methods can use it. Here we store the ‘latent_dim’ value, “inference_net” and “generative_net”.

Inference network

Let’s start with the inference network block, which starts with “self.inference_net”. First, “InputLayer” takes the preprocessed input image, with its shape specified as “(28, 28, 1)”. After that, a 3x3 kernel (“kernel_size = 3”) convolves over the input image with a stride of 2 (“strides = (2, 2)”), and the output has 32 feature maps (“filters = 32”).

Likewise, these 32 feature maps go through a second “Conv2D” layer, which outputs 64 feature maps.

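Next, “Flatten()” flattens the output into a one-dimensional vector. Lastly, a “Dense” layer with no activation outputs “latent_dim + latent_dim” values, which the “encode” function splits into a mean and a log-variance.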
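To see how the image size shrinks through the inference network, we can do the arithmetic by hand. This small sketch assumes the Keras default of no padding (“valid”), since the “Conv2D” layers above do not pass a “padding” argument; “conv_output_size” is a helper name made up for this illustration.

```python
def conv_output_size(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

after_conv1 = conv_output_size(28, kernel=3, stride=2)           # first Conv2D
after_conv2 = conv_output_size(after_conv1, kernel=3, stride=2)  # second Conv2D
flattened = after_conv2 * after_conv2 * 64                       # Flatten over 64 feature maps
print(after_conv1, after_conv2, flattened)  # 13 6 2304
```

So the final “Dense” layer maps a vector of 2304 values down to “latent_dim + latent_dim” values.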

Generative network

Next, the generative network. In simple terms, this network works in the opposite direction to the inference network. We can spot some contrasting code in this block, such as “Conv2DTranspose” as opposed to the normal “Conv2D” layer. It is like de-convolving the input, so that the generative network and the inference network mirror each other in the overall architecture.
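The same kind of arithmetic shows how the generative network grows a 7x7 feature map back into a 28x28 image. With padding="SAME", a transposed convolution multiplies the spatial size by its stride; the helper name below is made up for this sketch.

```python
def transpose_conv_same_size(size, stride):
    """Output length of a transposed convolution with padding="SAME"."""
    return size * stride

size = 7
shapes = [(size, size, 32)]  # shape right after the Reshape layer
# (filters, stride) of the three Conv2DTranspose layers, in order.
for filters, stride in [(64, 2), (32, 2), (1, 1)]:
    size = transpose_conv_same_size(size, stride)
    shapes.append((size, size, filters))
print(shapes)  # [(7, 7, 32), (14, 14, 64), (28, 28, 32), (28, 28, 1)]
```

The last shape, (28, 28, 1), matches the shape of the preprocessed input images.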

This generative network’s input data is a latent encoding vector that comes from a unit Gaussian distribution p(z). The output of the network is a conditional distribution p(x|z). When we generate new images, this input vector is produced in the “sample” function of the class above, by the line “eps = tf.random.normal(shape=(100, self.latent_dim))”.

In that line, 100 is the number of latent vectors to draw, which matches the batch size we set before, and “self.latent_dim” is the number of columns of our input matrix.
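The “encode” and “reparameterize” functions of the class use the same kind of standard normal noise during training. Here is a NumPy sketch of what they compute. The random “encoder_output” array only stands in for the output of the inference network, and “latent_dim = 2” is a value picked just for this illustration.

```python
import numpy as np

latent_dim = 2  # hypothetical value, chosen only for this sketch
batch = 3

rng = np.random.default_rng(0)
# Stand-in for inference_net(x): one row of 2 * latent_dim numbers per image.
encoder_output = rng.normal(size=(batch, 2 * latent_dim))

# Same idea as tf.split(..., num_or_size_splits=2, axis=1) in encode().
mean, logvar = np.split(encoder_output, 2, axis=1)

def reparameterize(mean, logvar, rng):
    """z = mean + eps * std, with eps ~ N(0, I) and std = exp(logvar / 2)."""
    eps = rng.standard_normal(size=mean.shape)
    return eps * np.exp(logvar * .5) + mean

z = reparameterize(mean, logvar, rng)
print(z.shape)  # (3, 2)

# With a very negative log-variance the noise vanishes and z collapses to the mean.
z_det = reparameterize(mean, np.full_like(mean, -100.), rng)
```

Writing z this way keeps the randomness in “eps”, so z is still a smooth function of the mean and the log-variance.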

We have built the major components of our convolutional variational autoencoder, and what is left is to put each of them in the right place to train the model and learn its parameters.

Part III will cover defining the loss function and structuring the code to train the model with the preprocessed dataset.

:) !!
