GAN

yueyuan
6 min read · Oct 27, 2020

(1) Discriminative models: classifiers

Features -> Class, P(Y|X)

Tries to model the probability of class Y given a set of features X, i.e. to predict the category Y.

For example, we have cats and dogs.

Features (wet nose, purrs, etc.) -> class probabilities over [dog, cat], e.g. [0.9, 0.1]
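
To make this concrete, here is a minimal sketch of such a classifier, assuming PyTorch and made-up feature names (none of this is from the original post):

```python
import torch
import torch.nn as nn

# Hypothetical feature vector: [wet_nose, purrs, tongue_out]
features = torch.tensor([[1.0, 1.0, 0.0]])

# Tiny discriminative model: features X -> P(Y | X) over [dog, cat]
classifier = nn.Sequential(
    nn.Linear(3, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
    nn.Softmax(dim=1),
)

probs = classifier(features)  # a (1, 2) tensor of class probabilities, e.g. [0.9, 0.1] after training
print(probs)
```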

(2) Generative model:

Noise (a vector of random values such as 3, -5, 2.6), Class -> Features

epsilon, Y -> X (wet nose, tongue out)

The noise is a random set of values fed into the model so that what’s generated isn’t the same dog each time.

Tries to make a realistic representation of some class.

It takes some random input, represented by the noise, along with the class label.

P(X|Y)
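
As a sketch (again assuming PyTorch, with made-up dimensions), a conditional generator takes the noise vector plus a class label and outputs features:

```python
import torch
import torch.nn as nn

noise_dim, num_classes, feature_dim = 8, 2, 3  # assumed sizes

# Conditional generator: noise epsilon + class Y -> features X, a sample from P(X | Y)
class ConditionalGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(num_classes, num_classes)  # turn the class label into a vector
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, 32),
            nn.ReLU(),
            nn.Linear(32, feature_dim),
        )

    def forward(self, noise, y):
        return self.net(torch.cat([noise, self.embed(y)], dim=1))

g = ConditionalGenerator()
x_fake = g(torch.randn(1, noise_dim), torch.tensor([0]))  # new noise each call -> a different "dog"
```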

VAE vs GAN

VAE: variational autoencoders: encoder (input image, output vector), decoder (input vector, output image)

GAN: generative adversarial networks: generator (input vector, output image, similar to the decoder), discriminator (decides whether an image is real or fake). The two models compete with and learn from each other, each improving over time, until they reach a point where the second model is no longer needed and the generator can take any random noise and produce a realistic image.
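
A bare-bones sketch of the two GAN components, assuming PyTorch and MNIST-sized flattened images (the sizes are illustrative, not from the post):

```python
import torch
import torch.nn as nn

z_dim, img_size = 64, 28 * 28  # assumed sizes (flattened MNIST-scale images)

# Generator: noise vector -> image (plays a role similar to the VAE decoder)
generator = nn.Sequential(
    nn.Linear(z_dim, 128), nn.ReLU(),
    nn.Linear(128, img_size), nn.Tanh(),
)

# Discriminator: image -> single score, "how real does this look?"
discriminator = nn.Sequential(
    nn.Linear(img_size, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

fake_img = generator(torch.randn(1, z_dim))
score = discriminator(fake_img)   # near 1 = looks real, near 0 = looks fake
```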

For example: producing human faces, cats, etc. (mimicking the distribution of the training data); translating from one domain to another (horse to zebra, photo to animation); generating 3D objects; generating artificial medical data; next-gen Photoshop; text generation; data augmentation; and more. Snapchat and TikTok use GANs to create new image filters, and Disney uses them for super-resolution.

If you don’t have enough data of a certain class or of a certain type of image, you can generate additional training data for classification.

Intuition & Competition

GANs are powerful models that learn to produce realistic objects that are difficult to distinguish from existing real ones, for instance, human faces.

The generator and the discriminator compete with each other.

The generator learns to generate fakes that look real, to fool the discriminator. (paint forger)

The discriminator learns to distinguish between what’s real and what’s fake. (art inspector)

To start this game, all you need is a collection of real images, like some famous paintings. If you want the generator to paint famous paintings, note that at the beginning the generator isn’t very sophisticated: it doesn’t know how to produce real-looking artwork, and it isn’t allowed to see the real images, so it doesn’t know what real paintings should look like. The discriminator also doesn’t know at first how to decide what’s real or fake. After it makes a guess, you tell it whether the image was actually real or fake, so the discriminator does get told what’s right or wrong for those two classes. When the generator produces a batch of paintings, it learns in which direction to improve by looking at the scores the discriminator assigns to its work. After several rounds, the generator produces images that are harder and harder to distinguish, until the images it produces are able to fool the discriminator.

Discriminator

It is a type of classifier that determines how fake an image is: it models the probability of an example being fake given a set of input features X. These probabilities are the feedback for the generator.

X(features), Y(labels) → Discriminator (parameters theta) → Y^ (output)

Cost function: computes how close Y^ is to Y.

From the cost function, you can update the parameters, which are attributes of the nodes (the structure) in that neural network, according to the gradient of this cost function.

It models the probability of class Y given input features X.
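
A sketch of one discriminator update under these assumptions (PyTorch, random tensors standing in for real images, toy sizes):

```python
import torch
import torch.nn as nn

img_size, z_dim, batch = 28 * 28, 64, 16  # assumed sizes

disc = nn.Sequential(nn.Linear(img_size, 128), nn.LeakyReLU(0.2),
                     nn.Linear(128, 1), nn.Sigmoid())
gen = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                    nn.Linear(128, img_size), nn.Tanh())

criterion = nn.BCELoss()
d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)

real = torch.rand(batch, img_size)               # stand-in for a batch of real images
fake = gen(torch.randn(batch, z_dim)).detach()   # detach: don't update the generator here

# Y = 1 for real, Y = 0 for fake; the cost measures how close Y^ is to Y
d_loss = criterion(disc(real), torch.ones(batch, 1)) + \
         criterion(disc(fake), torch.zeros(batch, 1))

d_opt.zero_grad()
d_loss.backward()   # gradient of the cost w.r.t. the discriminator's parameters theta
d_opt.step()        # update theta
```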

Generator

Generates examples of the class. Each time it takes as input a different set of random values (a noise vector) and produces fake data.

Noise -> Generator -> Features X^ -> Discriminator -> Output Y^d -> Cost (compares Y^d with the “real” label) -> use the cost to update the generator’s parameters

Once you get a generator whose output looks good to the discriminator, you can save the generator’s parameters theta somewhere. You can later load them up and sample from this saved generator.

P(X|Y): it learns the probability of features X given class Y, i.e. it models the features X conditioned on the class Y.
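
A sketch of one generator update and of saving the trained parameters, under the same illustrative assumptions (PyTorch, toy sizes):

```python
import torch
import torch.nn as nn

img_size, z_dim, batch = 28 * 28, 64, 16  # assumed sizes
gen = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                    nn.Linear(128, img_size), nn.Tanh())
disc = nn.Sequential(nn.Linear(img_size, 128), nn.LeakyReLU(0.2),
                     nn.Linear(128, 1), nn.Sigmoid())

criterion = nn.BCELoss()
g_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)   # only the generator's parameters

noise = torch.randn(batch, z_dim)
x_hat = gen(noise)                  # Noise -> Generator -> fake features X^
y_d = disc(x_hat)                   # discriminator's output Y^d for the fakes

# The generator wants the discriminator to call its fakes real (label 1)
g_loss = criterion(y_d, torch.ones(batch, 1))

g_opt.zero_grad()
g_loss.backward()                   # feedback flows back through the discriminator...
g_opt.step()                        # ...but only the generator's theta gets updated

torch.save(gen.state_dict(), "generator.pt")   # save theta; load and sample from it later
```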

BCE Cost Function

Binary Cross Entropy

m: the number of examples; the cost is averaged over them

h: predictions made by the model

y: label of class

x: the features passed into the prediction h

theta: parameters
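
With these symbols, the cost can be written in the standard BCE form (the post’s own formula image isn’t reproduced here, so this is the usual textbook version):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)} \log h\big(x^{(i)}, \theta\big) + \big(1 - y^{(i)}\big)\log\big(1 - h\big(x^{(i)}, \theta\big)\big) \Big]$$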

The BCE cost function has two parts (one relevant for each class)

Close to zero when the label and the prediction are similar

Approaches infinity when the label and the prediction are different
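
A quick numeric check of those two properties, using a single-example BCE helper (plain Python, no framework):

```python
import math

def bce(y, y_hat):
    # single-example BCE: -[y*log(y_hat) + (1-y)*log(1-y_hat)]
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

print(bce(1, 0.99))   # label real, prediction ~real -> ~0.01  (close to zero)
print(bce(1, 0.01))   # label real, prediction ~fake -> ~4.61  (growing toward infinity)
print(bce(0, 0.999))  # label fake, prediction ~real -> ~6.91
```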

How does the discriminator learn over time? Getting feedback on if its classification was correct.

How does the generator learn over time? Using feedback from the discriminator.

The discriminator looks at real and fake images over time, makes guesses, and gets feedback on whether its guess was right or wrong.

Over time, it learns to discern real from fake better, but note that since the generator is also learning, the fake images get more realistic and harder to discern. This cat and mouse game enables both models to learn in tandem.
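
Putting the two updates together, a toy alternating training loop might look like this (PyTorch, with random tensors standing in for a real data loader):

```python
import torch
import torch.nn as nn

z_dim, img_size, batch = 64, 28 * 28, 16  # assumed sizes
gen = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, img_size), nn.Tanh())
disc = nn.Sequential(nn.Linear(img_size, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
bce = nn.BCELoss()
d_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
g_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.rand(batch, img_size)   # stand-in for a batch of real images

    # Discriminator's turn: real labelled 1, fresh fakes labelled 0
    fake = gen(torch.randn(batch, z_dim)).detach()
    d_loss = bce(disc(real), torch.ones(batch, 1)) + bce(disc(fake), torch.zeros(batch, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator's turn: try to make the just-updated discriminator say "real" for its fakes
    fake = gen(torch.randn(batch, z_dim))
    g_loss = bce(disc(fake), torch.ones(batch, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```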

If one model is significantly better than the other, it doesn’t help the other learn, because the feedback isn’t useful. Imagine you’re a beginning artist and you show your work to an art expert, asking whether your painting looks like a famous piece, and all they say is ‘no’. Because they have a very discerning eye, they know your image is not right, but they won’t be able to tell you how close you are.

GANs try to make the real and generated distributions look similar. When the discriminator improves too much, the function approximated by BCE loss will contain flat regions.

Normally the discriminator learns faster than the generator, and at the beginning the fake data the generator produces is obviously fake, so the discriminator very quickly learns a highly accurate classifier for real vs. fake images. At that point it can no longer give informative feedback to the generator, and the generator can’t learn anything.

Flat regions on the cost function = vanishing gradients

BCE Loss problem:

The discriminator does not output useful gradients (feedback) for the generator when the real/fake distributions are far apart. This is also called the vanishing gradient problem because the gradients approach 0 when the distributions are far apart.
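
A tiny illustration of why the sigmoid’s flat tails cause this (PyTorch autograd on the sigmoid alone; the numbers are just the sigmoid’s derivative at each score):

```python
import torch

# Once the discriminator's pre-sigmoid score is large in magnitude, its output
# barely changes, so almost no gradient flows back toward the generator.
for score in [0.0, 2.0, 5.0, 10.0]:
    s = torch.tensor(score, requires_grad=True)
    torch.sigmoid(s).backward()
    print(score, s.grad.item())   # 0.25, 0.105, 0.0066, 0.000045 -> shrinking toward 0
```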

Earth Mover’s Distance

Drop the sigmoid transformation: with a sigmoid, the output quickly becomes 0 as you move to the left and quickly becomes 1 as you move to the right, so early on the discriminator learns a classifier whose outputs are entirely 0 or 1 and carry no useful information, because it picks out the fake photos right away. What we actually need is a discriminator that outputs more values between 0 and 1 for fake photos, so the generator can learn which fakes look closer to real.

Earth mover’s distance is a measure of how different two distributions are by estimating the effort it takes to make the generated distribution equal to the real one.
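
A quick illustration with SciPy’s one-dimensional `wasserstein_distance` (toy Gaussians, not GAN outputs): the farther the fake distribution drifts from the real one, the larger the reported distance, and it keeps growing instead of saturating.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=10_000)

# More "effort" is needed to move the generated distribution's mass onto the
# real one as the two drift apart, and the distance reports that smoothly.
for shift in [0.5, 2.0, 8.0]:
    fake = rng.normal(loc=shift, scale=1.0, size=10_000)
    print(shift, wasserstein_distance(real, fake))   # grows roughly in step with the shift
```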

Wasserstein Loss

BCE loss measures how badly, on average, observations are being classified by the discriminator as fake or real.
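
The notes end here; for reference, the Wasserstein loss itself (not written out in the original post) replaces the sigmoid-and-BCE pair with an unbounded critic c and compares its average score on real and generated examples:

$$\min_{g}\,\max_{c}\; \mathbb{E}\big[c(x)\big] \;-\; \mathbb{E}\big[c\big(g(z)\big)\big]$$

The critic (constrained to be roughly 1-Lipschitz) tries to widen the gap between its average score on real images and on generated ones, while the generator tries to close it; because the critic’s output isn’t squeezed through a sigmoid, the gradient doesn’t flatten out when the two distributions are far apart.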
