11 Essential Neural Network Architectures, Visualized & Explained

posted Jul 4, 2020, 12:21 PM by Chris G   [ updated Jul 4, 2020, 12:22 PM ]
From: https://towardsdatascience.com/11-essential-neural-network-architectures-visualized-explained-7fc7da3486d8


Standard, Recurrent, Convolutional, & Autoencoder Networks

With the rapid development of deep learning, an entire host of neural network architectures have been created to address a wide variety of tasks and problems. Although there are countless neural network architectures, here are eleven that are essential for any deep learning engineer to understand, split into four general categories: the standard networks, the recurrent networks, the convolutional networks, and the autoencoders.

The Standard Networks

1 | The Perceptron

The perceptron is the most basic of all neural networks and a fundamental building block of more complex ones. It simply connects input cells to an output cell: the inputs are weighted, summed, and passed through an activation function.
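As a rough illustration, a single perceptron can be sketched in a few lines of NumPy. The step activation and the hand-picked weights (chosen so the unit behaves like a logical AND) are assumptions made for this example, not something the article specifies.

```python
import numpy as np

def perceptron(x, w, b):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    return int(np.dot(w, x) + b > 0)

# Hypothetical hand-picked parameters that make the perceptron act like a logical AND.
w = np.array([1.0, 1.0])
b = -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron(np.array(x, dtype=float), w, b) for x in inputs]
print(outputs)  # [0, 0, 0, 1]
```

In practice the weights and bias would be learned from data rather than set by hand.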

2 | The Feed-Forward Network

The feed-forward network is a collection of perceptrons arranged in three fundamental types of layers: input layers, hidden layers, and output layers. At each connection, the signal from the previous layer is multiplied by a weight, added to a bias, and passed through an activation function. Feed-forward networks are trained with backpropagation, which iteratively updates the parameters until the network reaches the desired performance.
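The "multiply by a weight, add a bias, apply an activation" step can be sketched as a forward pass. This is a minimal sketch only: the layer sizes, the ReLU activation, and the random weights are assumptions for illustration, and training by backpropagation is left out.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Pass x through each (W, b) pair: multiply by weights, add bias, apply activation."""
    a = x
    for W, b in layers[:-1]:
        a = relu(W @ a + b)
    W, b = layers[-1]
    return W @ a + b  # output layer kept linear in this sketch

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # input (3 values) -> hidden (4 units)
    (rng.normal(size=(2, 4)), np.zeros(2)),  # hidden (4 units) -> output (2 values)
]
y = forward(np.array([0.5, -1.0, 2.0]), layers)
print(y.shape)  # (2,)
```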

3 | Residual Networks (ResNet)

One issue with deep feed-forward neural networks is the vanishing gradient problem, which occurs when a network is too deep for useful information to be backpropagated all the way through it. As the signal that updates the parameters travels backward through the network, it gradually diminishes until the weights in the earliest layers are barely changed at all.

To address this problem, a Residual Network employs skip connections, which carry signals past one or more ‘jumped’ layers. This reduces the vanishing gradient problem by adding paths that are less vulnerable to it. Over time, the network learns to restore the skipped layers as it learns the feature space, but training is more efficient because the network is less vulnerable to vanishing gradients and needs to explore less of the feature space.
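A skip connection can be sketched as adding a block's input back onto its output. The two dense layers and ReLU activations below are assumptions for illustration; published ResNets use convolutional layers and normalization, which are omitted here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Compute relu(F(x) + x), where F is two small dense layers."""
    h = relu(W1 @ x + b1)
    f = W2 @ h + b2
    return relu(f + x)  # the skip connection adds the input back

x = np.array([1.0, 2.0, 3.0])
# With all-zero weights F(x) = 0, so the input still reaches the output via the skip path.
zeros = np.zeros((3, 3))
out = residual_block(x, zeros, np.zeros(3), zeros, np.zeros(3))
print(out)  # [1. 2. 3.]
```

Because the identity path bypasses the weights, gradients can flow back through the addition even when the jumped layers contribute little.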

The Recurrent Networks

4 | The Recurrent Neural Network (RNN)

A recurrent neural network is a specialized type of network that contains loops and recurs over itself, hence the name “recurrent”. Because this allows information to be stored in the network, RNNs can use what they have seen at earlier steps of a sequence to make better, more informed decisions about upcoming events. To do this, they use their previous states and predictions as ‘context signals’. Because of this, RNNs are commonly used for sequential tasks, such as generating text letter by letter or predicting time series data (for example, stock prices). They can also handle input sequences of any length.

Two RNN visualization methods.
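The loop described above can be sketched as a hidden state that is fed back in at every step. The tanh activation, the weight names, and the sizes are assumptions for this example.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a simple RNN over a sequence, returning the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        # The previous hidden state h acts as the context signal for the current input.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(4, 2))  # input (2) -> hidden (4)
W_hh = rng.normal(scale=0.5, size=(4, 4))  # hidden -> hidden (the recurrent loop)
b_h = np.zeros(4)

# The same weights handle sequences of different lengths.
short_run = rnn_forward(rng.normal(size=(3, 2)), W_xh, W_hh, b_h)
long_run = rnn_forward(rng.normal(size=(10, 2)), W_xh, W_hh, b_h)
print(short_run.shape, long_run.shape)  # (3, 4) (10, 4)
```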

5 | The Long Short Term Memory Network (LSTM)

RNNs are problematic in that the range of contextual information they can use is, in practice, very limited. The influence (backpropagated error) of a given input on the hidden layer, and hence on the network’s output, either blows up exponentially or decays into nothing as it is cycled around the network’s recurrent connections. One solution to this vanishing gradient problem is the Long Short-Term Memory network, or LSTM.

This RNN architecture is specifically designed to address the vanishing gradient problem by fitting the structure with memory blocks. These blocks can be thought of as memory chips in a computer: each one contains several recurrently connected memory cells and three gates (input, output, and forget, the equivalents of write, read, and reset). The network can only interact with the cells through the gates, so the gates learn to open and close in a way that prevents exploding or vanishing gradients, propagates useful information through “constant error carousels”, and discards irrelevant memory content.
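One step of a memory cell with its three gates can be sketched as follows. This follows a commonly used LSTM formulation; the stacked weight matrix `W`, the sizes, and the random initialization are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x] to four stacked pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:n])            # input gate (write)
    f = sigmoid(z[n:2 * n])        # forget gate (reset)
    o = sigmoid(z[2 * n:3 * n])    # output gate (read)
    g = np.tanh(z[3 * n:4 * n])    # candidate cell content
    c = f * c_prev + i * g         # cell state: keep some old memory, write some new
    h = o * np.tanh(c)             # expose part of the cell state as the output
    return h, c

rng = np.random.default_rng(0)
n_hidden, n_in = 3, 2
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_in))
b = np.zeros(4 * n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

The additive update of `c` is what lets error flow across many time steps without being repeatedly squashed.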

Where standard RNNs fail to learn when there are time lags of more than five to ten time steps between input events and target signals, an LSTM is not affected and can learn to bridge time lags of even 1,000 time steps by enforcing a useful constant error flow.

6 | Echo State Networks (ESN)

An echo state network is a variant of a recurrent neural network with a very sparsely connected hidden layer (typically around one percent connectivity). The connectivity and weights of the hidden neurons are randomly assigned and then left fixed, without regard to any layer structure, so connections may skip across what would otherwise be layers. Only the weights of the output neurons are learned, such that the network can produce and reproduce specific temporal patterns. The appeal of this network is that, although its behavior is nonlinear, the only weights modified during training are those of the synapses connecting the hidden neurons to the output neurons, so the error function is quadratic in the trained parameters and differentiating it yields a linear system.
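The idea of a fixed random reservoir with a trained linear readout can be sketched as below. The sine-wave task, the reservoir size, the spectral-radius rescaling to 0.9, the washout length, and the ridge term are all assumptions chosen for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, T = 100, 300

# Fixed, random, sparse reservoir: about 1% of the connections are non-zero.
W = rng.normal(size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.01)
radius = np.max(np.abs(np.linalg.eigvals(W)))
if radius > 0:
    W *= 0.9 / radius  # rescale so the largest |eigenvalue| is 0.9 (a common heuristic)
W_in = rng.uniform(-0.5, 0.5, size=n_res)  # fixed random input weights

# Toy task: predict the next value of a sine wave.
u = np.sin(0.2 * np.arange(T + 1))
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W @ x + W_in * u[t])  # reservoir weights are never updated
    states[t] = x

# Only the readout is trained: ridge regression, solved as a single linear system.
washout, ridge = 50, 1e-6
S, y = states[washout:], u[washout + 1:T + 1]
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)
mse = np.mean((S @ W_out - y) ** 2)
print(mse)
```

No gradient descent is involved: because only `W_out` is learned, fitting it reduces to solving one set of linear equations.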

The Convolutional Networks

7 | The Convolutional Neural Network (CNN)

Images have a very high dimensionality, so training a standard feed-forward network to recognize images would require hundreds of thousands of input neurons. Besides the blatantly high computational cost, this can cause many problems associated with the curse of dimensionality. The Convolutional Neural Network (CNN) addresses this by using convolutional and pooling layers to reduce the dimensionality of an image. Because convolutional layers are trainable but have significantly fewer parameters than a standard hidden layer, they can highlight the important parts of the image and pass each of them forward. Traditionally, the last few layers of a CNN are fully connected hidden layers, which process the ‘condensed image information’.

Convolutional Neural Networks perform well on image-based tasks, such as classifying an image as a dog or a cat.
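How a convolutional layer followed by a pooling layer shrinks an image can be sketched with plain loops. The 6x6 input, the 2x2 filter values, and the pooling size are assumptions for illustration; real CNNs learn many filters and use optimized library routines.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (what CNN libraries call convolution)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # hypothetical filter: only 4 shared weights
features = max_pool(conv2d(image, kernel))
print(image.shape, "->", features.shape)  # (6, 6) -> (2, 2)
```

The same four weights are reused at every position, which is why a convolutional layer needs far fewer parameters than a dense layer over the same image.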

8 | The Deconvolutional Neural Network (DNN)

A Deconvolutional Neural Network, as its name suggests, performs the opposite of a convolutional neural network. Instead of performing convolutions to reduce the dimensionality of an image, a DNN uses deconvolutions to create an image, usually from noise. This is an inherently difficult task: if a CNN’s task is to write a three-sentence summary of Orwell’s 1984, a DNN’s task is to write the complete book from a three-sentence summary.
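One way to picture the upsampling step is a transposed convolution, in which each input value "stamps" a scaled copy of a kernel onto a larger output. Treating "deconvolution" as a transposed convolution is an assumption of this sketch, as are the stride, the all-ones kernel, and the 2x2 noise input.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=2):
    """Transposed convolution: each input value adds a scaled copy of the kernel to the output."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - 1) * stride + kh
    out_w = (x.shape[1] - 1) * stride + kw
    out = np.zeros((out_h, out_w))
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

rng = np.random.default_rng(0)
noise = rng.normal(size=(2, 2))   # small random input
kernel = np.ones((2, 2))          # hypothetical kernel; a real network would learn it
image = conv_transpose2d(noise, kernel)
print(noise.shape, "->", image.shape)  # (2, 2) -> (4, 4)
```

Stacking several such layers with learned kernels grows a small noise tensor into a full-sized image.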

9 | Generative Adversarial Network (GAN)

A Generative Adversarial Network is a specialized type of network designed specifically to generate images, and is composed of two networks: a discriminator and a generator. The discriminator’s task is to tell whether an image was pulled from the dataset or generated by the generator, and the generator’s task is to generate images convincing enough that the discriminator cannot tell whether they are real.

Over time, with careful regulation, these two adversaries compete with each other, and each one’s drive to succeed improves the other. The end result is a well-trained generator that can produce realistic-looking images. The discriminator is a convolutional neural network whose goal is to maximize its accuracy in identifying real and fake images, whereas the generator is a deconvolutional neural network whose goal is to minimize the discriminator’s performance.

Generator diagram.
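The opposing goals of the two networks can be sketched as a pair of loss functions. The binary cross-entropy form and the "non-saturating" generator loss are one common choice and an assumption here; the discriminator scores below are made-up numbers, not results.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: low when real images score near 1 and generated ones near 0."""
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: low when generated images score near 1."""
    return -np.mean(np.log(d_fake))

# Hypothetical discriminator outputs (estimated probability that an image is real).
d_real = np.array([0.9, 0.8, 0.95])
d_fake_weak = np.array([0.1, 0.2, 0.05])    # generator fools nobody
d_fake_strong = np.array([0.6, 0.7, 0.5])   # generator starts to fool the discriminator

print(discriminator_loss(d_real, d_fake_weak), generator_loss(d_fake_weak))
print(discriminator_loss(d_real, d_fake_strong), generator_loss(d_fake_strong))
```

As the generator improves, its loss falls while the discriminator's rises, which is the adversarial tension described above.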


The Auto-Encoders

10 | The Auto Encoder (AE)

The fundamental idea of an autoencoder is to take in original, high-dimensional data, ‘compress’ it into a highly informative, low-dimensional representation, and then project the compressed form into a new space. There are many applications of autoencoders, including dimensionality reduction, image compression, denoising data, feature extraction, image generation, and recommendation systems. An autoencoder can be used as either an unsupervised or a supervised method, and can be very insightful as to the nature of the data.

Hidden cells can be replaced with convolutional layers to accommodate processing images.
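The compress-then-reconstruct idea can be sketched with a tiny linear autoencoder trained by gradient descent. The toy data, the absence of nonlinearities, the learning rate, and the step count are all assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 points in 5 dimensions that really only vary along 2 directions.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5))
X /= X.std()  # rescale so a fixed learning rate behaves sensibly

W_enc = rng.normal(scale=0.1, size=(5, 2))  # encoder: 5 -> 2 (the compressed code)
W_dec = rng.normal(scale=0.1, size=(2, 5))  # decoder: 2 -> 5 (the reconstruction)

def loss(X, W_enc, W_dec):
    """Mean squared reconstruction error."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

initial = loss(X, W_enc, W_dec)
lr = 0.05
for _ in range(2000):
    code = X @ W_enc
    err = code @ W_dec - X                       # reconstruction error
    # Gradients of the squared error (up to a constant factor).
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec
final = loss(X, W_enc, W_dec)
print(initial, final)
```

No labels are used: the input itself is the training target, which is why this setup is usually described as unsupervised.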

11 | The Variational Auto Encoder (VAE)

An autoencoder learns a compressed representation of an input (images or text sequences, for example) by compressing the input and then decompressing it to match the original. A variational autoencoder (VAE) instead learns the parameters of a probability distribution representing the data. Rather than just learning a function representing the data, it gains a more detailed and nuanced view of the data, and new data samples can be generated by sampling from the learned distribution. In this sense, it is more of a purely ‘generative’ model, like a GAN.

A VAE uses a probabilistic hidden cell, which applies a radial basis function to the difference between the test case and the cell’s mean.
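Sampling from learned distribution parameters can be sketched as follows. The Gaussian latent distribution, the reparameterization trick, and the KL term are the usual VAE formulation but are not spelled out in the text above, so treat them as assumptions; the `mu` and `log_var` values stand in for hypothetical encoder outputs.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, exp(log_var)) as mu + sigma * eps (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

rng = np.random.default_rng(0)
# Hypothetical encoder outputs for one input: a mean and a log-variance per latent dimension.
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, -2.0])

z = sample_latent(mu, log_var, rng)       # a different z on every call
kl = kl_to_standard_normal(mu, log_var)   # regularizer added to the reconstruction loss
print(z, kl)
```

A decoder would then map `z` back to data space; feeding it fresh samples of `z` is how new data points are generated.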