Artificial Intelligence

Architect vs Engineer

posted Sep 20, 2020, 2:57 PM by Chris G   [ updated Sep 20, 2020, 2:58 PM ]

The Cool Way to Search Text

posted Aug 9, 2020, 10:21 AM by Chris G   [ updated Aug 9, 2020, 10:21 AM ]

Google announced a significant upgrade to their search algorithm in late 2019. The technique is super cool, inspired by the human brain, redefining how we search for text, images, music, and more.

Here’s one example from Google’s post. Let’s say we’re wondering whether we can pick up medicine for a friend who may be sick and cannot enter a pharmacy due to COVID-19 restrictions. We Google it:


Before this upgrade, Google would show us information about prescriptions on the left. Old Google would see high overlap between our keywords “get, pharmacy, medicine” and the top blog post. Statistically, these keywords are the meaty, distinguishing topic of our inquiry. The rest of the words, such as “can,” “you,” and “for,” are quite common, and articles that only include those wouldn’t do us much good. Google would seek keyword overlap to find the most authoritative and relevant answer. This is how search has worked for decades. We call it syntactic search because it is based on the written syntax of human language.
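To make the keyword-overlap idea concrete, here is a minimal sketch in Python. The stop-word list, the query string, and the two sample documents are made up for illustration, and real search engines use far richer statistics than this; the point is only to show how common words drop out and the remaining keywords drive the ranking.

```python
# Hypothetical stop-word list: the "quite common" words that carry little meaning.
STOP_WORDS = {"can", "you", "for", "a", "the", "at", "to", "your", "someone"}

def keywords(text):
    """Lowercase the text, strip simple punctuation, and drop common words."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}

def overlap_score(query, document):
    """Fraction of the query's keywords that also appear in the document."""
    query_keywords = keywords(query)
    if not query_keywords:
        return 0.0
    return len(query_keywords & keywords(document)) / len(query_keywords)

# Illustrative query and documents (assumptions, not real search results).
query = "can you get medicine for someone pharmacy"
docs = [
    "How to get your prescription medicine filled at a pharmacy",
    "Can a friend pick up a prescription for you?",
]

# Rank documents by keyword overlap, highest first.
ranked = sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)
```

Notice that the second document is the one that actually answers the question, yet it scores zero because it shares no keywords with the query. That gap is what keyword matching alone cannot close.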

11 Essential Neural Network Architectures, Visualized & Explained

posted Jul 4, 2020, 12:21 PM by Chris G   [ updated Jul 4, 2020, 12:22 PM ]


Source: Pixabay

Standard, Recurrent, Convolutional, & Autoencoder Networks

With the rapid development of deep learning, an entire host of neural network architectures have been created to address a wide variety of tasks and problems. Although there are countless neural network architectures, here are eleven that are essential for any deep learning engineer to understand, split into four general categories: the standard networks, the recurrent networks, the convolutional networks, and the autoencoders.

The Standard Networks

1 | The Perceptron

The perceptron is the most basic of all neural networks, being a fundamental building block of more complex neural networks. It simply connects an input cell and an output cell.
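As a rough sketch, a perceptron can be written as a weighted sum followed by a threshold. The weights and bias below are hand-picked (not learned) so that the cell behaves like a logical AND; they are assumptions chosen purely for illustration.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a step function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hand-picked parameters that make the perceptron act as a logical AND gate.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

outputs = [perceptron([a, b], AND_WEIGHTS, AND_BIAS) for a in (0, 1) for b in (0, 1)]
```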

2 | The Feed-Forward Network

The feed-forward network is a collection of perceptrons, in which there are three fundamental types of layers: input layers, hidden layers, and output layers. During each connection, the signal from the previous layer is multiplied by a weight, added to a bias, and passed through an activation function. Feed-forward networks use backpropagation to iteratively update their parameters until they achieve a desirable performance.
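The "multiply by a weight, add a bias, apply an activation" step can be sketched in a few lines of plain Python. This shows only the forward pass (no backpropagation), the sigmoid activation is one common choice rather than the only one, and the layer sizes and numbers are arbitrary assumptions.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation=sigmoid):
    """One layer: weights[j] holds the incoming weights of output neuron j."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the signal through each (weights, biases) pair in turn."""
    signal = inputs
    for weights, biases in layers:
        signal = dense_layer(signal, weights, biases)
    return signal

# Arbitrary example network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 2.0], layers)
```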

3 | Residual Networks (ResNet)

One issue with deep feed-forward neural networks is the vanishing gradient problem, in which a network is too deep for useful information to be backpropagated all the way through it. As the signal that updates the parameters travels backward through the network, it gradually diminishes until the weights at the front of the network are barely changed at all.

To address this problem, a Residual Network employs skip connections, which propagate signals across a ‘jumped’ layer. This reduces the vanishing gradient problem by employing connections that are less vulnerable to it. Over time, the network learns to restore skipped layers as it learns the feature space, but is more efficient in training as it is less vulnerable to vanishing gradients and needs to explore less of the feature space.
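A skip connection is easy to sketch: the block's output is its input plus whatever the layer computes. The ReLU layer and the square (same size in, same size out) shape below are simplifying assumptions so that the addition lines up.

```python
def relu_layer(inputs, weights, biases):
    """A plain layer: weighted sum plus bias, followed by ReLU."""
    return [
        max(0.0, sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def residual_block(inputs, weights, biases):
    """Add the layer's output to its own input (the skip connection)."""
    transformed = relu_layer(inputs, weights, biases)
    return [x + t for x, t in zip(inputs, transformed)]

# With all-zero weights the layer contributes nothing, so the block simply
# passes its input through unchanged.
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_biases = [0.0, 0.0]
passthrough = residual_block([1.0, -2.0], zero_weights, zero_biases)
```

Because the input always has a direct path to the output, the signal (and its gradient) can still flow even when the layer itself contributes very little.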

The Recurrent Networks

4 | The Recurrent Neural Network (RNN)

A recurrent neural network is a specialized type of network that contains loops and recurs over itself, hence the name “recurrent”. Because information can be stored in the network, RNNs use what they have seen earlier in a sequence to make better, more informed decisions about upcoming events. To do this, they use their previous predictions as ‘context signals’. Because of this, RNNs are commonly used to handle sequential tasks, such as generating text letter-by-letter or predicting time series data (for example, stock prices). They can also handle inputs of any size.

Two RNN visualization methods.
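The loop can be sketched with a single recurrent cell whose hidden state is fed back in at every step. The scalar state, the tanh activation, and the weight values are assumptions made to keep the example tiny.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec, bias):
    """New hidden state from the current input and the previous hidden state."""
    return math.tanh(w_in * x + w_rec * h_prev + bias)

def run_rnn(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    """Apply the same cell to every element of a sequence of any length."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states

# A single pulse followed by silence: the hidden state keeps an "echo" of the
# first input that fades a little with every step.
states = run_rnn([1.0, 0.0, 0.0, 0.0])
```

The same two functions work for a sequence of four items or four thousand, which is what lets RNNs handle inputs of any size.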

5 | The Long Short Term Memory Network (LSTM)

RNNs are problematic in that the range of contextual information is, in practice, very limited. The influence (backpropagated error) of a given input on the hidden layer (and hence on the network’s output) either blows up exponentially or decays into nothing as it is cycled around the network’s connections. The solution to this vanishing gradient problem is a Long Short-Term Memory Network, or an LSTM.

This RNN architecture is specifically designed to address the vanishing gradient problem by fitting the structure with memory blocks. These blocks can be thought of as memory chips in a computer: each one contains several recurrently connected memory cells and three gates (input, output, and forget, the equivalents of write, read, and reset). The network can only interact with the cells through the gates, and hence the gates learn to open and close intelligently, preventing exploding or vanishing gradients while propagating useful information through “constant error carousels” and discarding irrelevant memory content.

Where standard RNNs fail to learn in the presence of time lags larger than five to ten time steps between input events and target signals, LSTM is not affected and can learn to bridge time lags of even 1,000 time steps by enforcing a useful constant error flow.
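Here is a minimal sketch of a single LSTM memory cell with scalar state, using the commonly taught gate equations. The parameter layout (a dict of `(input weight, recurrent weight, bias)` tuples) and the numbers in `KEEP_PARAMS` are assumptions for illustration; they are hand-set, not learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def _pre(x, h_prev, params):
    """Pre-activation: input weight * x + recurrent weight * h_prev + bias."""
    w, u, b = params
    return w * x + u * h_prev + b

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single LSTM cell (scalar state for simplicity)."""
    f = sigmoid(_pre(x, h_prev, p["forget"]))        # reset: how much memory to keep
    i = sigmoid(_pre(x, h_prev, p["input"]))         # write: how much new content to store
    o = sigmoid(_pre(x, h_prev, p["output"]))        # read: how much memory to expose
    g = math.tanh(_pre(x, h_prev, p["candidate"]))   # the new content itself
    c = f * c_prev + i * g                           # updated cell memory
    h = o * math.tanh(c)                             # updated hidden state
    return h, c

# Hand-set parameters: forget gate held open, input gate held shut.
KEEP_PARAMS = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (0.0, 0.0, 0.0),
}

# Store 1.0 in the cell, then run 100 steps without any new input.
h, c = 0.0, 1.0
for _ in range(100):
    h, c = lstm_step(0.0, h, c, KEEP_PARAMS)
```

With the forget gate open and the input gate shut, the cell memory `c` is carried almost unchanged across all 100 steps, which is the "constant error carousel" idea in miniature.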

6 | Echo State Networks (ESN)

An echo state network is a variant of a recurrent neural network with a very sparsely connected hidden layer (typically, one-percent connectivity). The connectivity and weights of the hidden neurons are randomly assigned and then left fixed, and they ignore layer and neuron discrepancies (skip connections). Only the weights of the output neurons are learned, such that the network can produce and reproduce specific temporal patterns. The reasoning behind this network is that, although its behavior is nonlinear, the only weights modified during training are the connections to the output neurons, and hence the error function can be differentiated into a linear system.

The Convolutional Networks

7 | The Convolutional Neural Network (CNN)

Images have a very high dimensionality, and hence training a standard feed-forward network to recognize images would require hundreds of thousands of input neurons, which, besides a blatantly high computational bill, can cause many problems associated with the Curse of Dimensionality in neural networks. The Convolutional Neural Network (CNN) provides a solution to this by utilizing convolutional and pooling layers to help reduce the dimensionality of an image. Because convolutional layers are trainable but have significantly fewer parameters than a standard hidden layer, they are able to highlight important parts of the image and pass each of them forward. Traditionally in CNNs, the last few layers are standard hidden layers, which process the ‘condensed image information’.

Convolutional Neural Networks perform well on image-based tasks, such as classifying an image as a dog or a cat.
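To see how convolution and pooling shrink an image while highlighting its important parts, here is a small pure-Python sketch. The 5x5 "image" and the hand-written edge kernel are assumptions for illustration; in a real CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# A 5x5 image that is dark (0) on the left and bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written kernel that responds where brightness increases left to right.
edge_kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, edge_kernel)  # 4x4 map, non-zero only at the edge
pooled = max_pool(features)            # 2x2 summary of where the edge is
```

The kernel has just four parameters no matter how large the image is, and the 25-pixel input ends up as a 4-value summary that still records where the edge sits.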

8 | The Deconvolutional Neural Network (DNN)

A Deconvolutional Neural Network, as its name suggests, performs the opposite of a convolutional neural network. Instead of performing convolutions to reduce the dimensionality of an image, a DNN utilizes deconvolutions to create an image, usually from noise. This is an inherently difficult task: if a CNN’s task is to write a three-sentence summary of Orwell’s 1984, a DNN’s task is to write the complete book from a three-sentence summary.
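In practice, "deconvolution" layers are commonly implemented as transposed convolutions, in which each input value stamps a scaled copy of the kernel onto a larger output. The sketch below assumes that interpretation; the kernel and stride are arbitrary choices.

```python
def transposed_conv2d(feature_map, kernel, stride=2):
    """Upsample by stamping a scaled copy of the kernel for every input value."""
    kh, kw = len(kernel), len(kernel[0])
    in_h, in_w = len(feature_map), len(feature_map[0])
    out_h = (in_h - 1) * stride + kh
    out_w = (in_w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(in_h):
        for c in range(in_w):
            for i in range(kh):
                for j in range(kw):
                    # Overlapping stamps (when stride < kernel size) add up.
                    out[r * stride + i][c * stride + j] += feature_map[r][c] * kernel[i][j]
    return out

# A 2x2 input grows into a 4x4 output.
small = [[1, 2], [3, 4]]
upsampled = transposed_conv2d(small, kernel=[[1, 1], [1, 1]], stride=2)
```

With this all-ones kernel the result is plain blocky upsampling; a trained network would instead learn kernel values that fill in plausible detail.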

9 | Generative Adversarial Network (GAN)

A Generative Adversarial Network is a specialized type of network designed specifically to generate images, and is composed of two networks: a discriminator and a generator. The discriminator’s task is to tell whether an image is pulled from the dataset or has been generated by the generator, and the generator’s task is to generate images convincing enough that the discriminator cannot tell whether they are real or not.

Over time, with careful regulation, these two adversaries compete with each other, and each one’s drive to succeed improves the other. The end result is a well-trained generator that can spit out a realistic-looking image. The discriminator is a convolutional neural network whose goal is to maximize its accuracy in identifying real/fake images, whereas the generator is a deconvolutional neural network whose goal is to minimize the discriminator’s performance.

Generator diagram.
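The tug-of-war between the two networks can be sketched through their loss functions. The functions below take the discriminator's outputs (its probability that each image is real) and are an assumption about one common setup: binary cross-entropy for the discriminator and the so-called non-saturating loss for the generator. The networks themselves are left out.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring fakes near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A sharp discriminator (real -> 0.9, fake -> 0.1) versus a confused one (0.5 everywhere).
sharp = discriminator_loss([0.9], [0.1])
confused = discriminator_loss([0.5], [0.5])

# The generator is happier when its fakes score 0.9 than when they score 0.1.
fooled = generator_loss([0.9])
caught = generator_loss([0.1])
```

The same scores for generated images push the two losses in opposite directions, which is why training one network makes life harder for the other.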

You can read more about generative adversarial networks here:

The Auto-Encoders

10 | The Auto Encoder (AE)

The fundamental idea of an autoencoder is to take in original, high-dimensionality data, ‘compress’ it into a highly informative, low-dimensional representation, and then project the compressed form into a new space. There are many applications of autoencoders, including dimensionality reduction, image compression, denoising data, feature extraction, image generation, and recommendation systems. An autoencoder can be used as either an unsupervised or a supervised method, and it can be very insightful as to the nature of the data.

Hidden cells can be replaced with convolutional layers to accommodate processing images.
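The compress-then-project idea can be shown with a toy example that squeezes 2-D points through a 1-D bottleneck. The encoder and decoder weights here are hand-picked assumptions, not learned; a real autoencoder would learn them by minimizing the reconstruction error.

```python
def encode(point, encoder_weights):
    """Compress a 2-D point into a single number (the bottleneck code)."""
    return sum(x * w for x, w in zip(point, encoder_weights))

def decode(code, decoder_weights):
    """Project the 1-D code back into 2-D space."""
    return [code * w for w in decoder_weights]

def reconstruction_error(point, reconstruction):
    """Squared error between the original point and its reconstruction."""
    return sum((x - r) ** 2 for x, r in zip(point, reconstruction))

# Hand-picked (not learned) weights that suit data lying along the line y = x.
ENCODER = [0.5, 0.5]
DECODER = [1.0, 1.0]

def autoencode(point):
    return decode(encode(point, ENCODER), DECODER)

on_line = autoencode([3.0, 3.0])    # fits the pattern, survives compression
off_line = autoencode([1.0, 3.0])   # does not fit, loses information
```

Points that match the structure the bottleneck captures come back intact, while points that do not are reconstructed with error. That error is what makes autoencoders useful for tasks like denoising and spotting unusual data.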

11 | The Variational Auto Encoder (VAE)

An autoencoder learns a compressed representation of an input (which could be images or text sequences, for example) by compressing the input and then decompressing it back to match the original. A variational autoencoder (VAE), by contrast, learns the parameters of a probability distribution representing the data. Instead of just learning a function representing the data, it gains a more detailed and nuanced view of the data, sampling from the distribution and generating new input data samples. In this sense, it is more of a purely ‘generative’ model, like a GAN.

A VAE uses a probabilistic hidden cell, which applies a radial basis function to the difference between the test case and the cell’s mean.
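One common way to implement the probabilistic hidden cell is to have the encoder output a mean and a log-variance, then sample from that Gaussian using the so-called reparameterization trick. The sketch below assumes that standard formulation with a one-dimensional latent variable; the function names are illustrative.

```python
import math
import random

def sample_latent(mean, log_var, rng):
    """Draw z = mean + std * noise, with noise from a standard normal."""
    std = math.exp(0.5 * log_var)
    return mean + std * rng.gauss(0.0, 1.0)

def kl_to_standard_normal(mean, log_var):
    """KL divergence between N(mean, exp(log_var)) and N(0, 1)."""
    return 0.5 * (math.exp(log_var) + mean ** 2 - 1.0 - log_var)

rng = random.Random(0)

# Several draws from the same learned distribution give different latent codes,
# which is what lets a VAE generate new samples rather than copy its input.
samples = [sample_latent(0.0, 0.0, rng) for _ in range(5)]

# The KL term is the penalty that keeps the learned distribution close to N(0, 1).
no_penalty = kl_to_standard_normal(0.0, 0.0)
shifted_penalty = kl_to_standard_normal(1.0, 0.0)
```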

No-cost online AWS training pathway for researchers and research IT

posted Jun 20, 2020, 1:28 PM by Chris G   [ updated Jun 20, 2020, 1:29 PM ]

To help researchers learn about cloud computing, Amazon Web Services (AWS) curated a list of no-cost, on-demand online courses tailored to researchers’ needs. AWS helps researchers process complex workloads by providing the cost-effective, scalable, and secure compute, storage, and database capabilities needed to accelerate time-to-science. Scientists can quickly analyze massive data pipelines, store petabytes of data, and share their results with collaborators around the world.

The AWS research team selected this list of courses from hundreds of available courses, specifically for researchers and research IT professionals who want to learn foundational cloud services. These online courses are available at any time to help users learn new cloud skills and services.

Research Learning Pathway: Foundational Services

The Research Learning Pathway: Foundational Services is designed for researchers and research IT professionals who want to become more proficient in optimizing research on AWS. Learn how to use the right storage medium, remove heavy lifting with managed services, and reproduce research with containers and software-defined infrastructure. This learning pathway can be completed in just over seven hours, and courses range in length from five minutes to three hours each. We recommend you complete the courses you need in the sequence outlined here.

Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing

posted May 30, 2020, 7:30 AM by Chris G   [ updated May 30, 2020, 7:31 AM ]

One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models see benefits from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these datasets from scratch.

This week, we open sourced a new technique for NLP pre-training called Bidirectional Encoder Representations from Transformers, or BERT. With this release, anyone in the world can train their own state-of-the-art question answering system (or a variety of other models) in about 30 minutes on a single Cloud TPU, or in a few hours using a single GPU. The release includes source code built on top of TensorFlow and a number of pre-trained language representation models. In our associated paper, we demonstrate state-of-the-art results on 11 NLP tasks, including the very competitive Stanford Question Answering Dataset (SQuAD v1.1).

What Makes BERT Different?
BERT builds upon recent work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus (in this case, Wikipedia).

Forecasting big time series: theory and practice

posted May 5, 2020, 1:45 PM by Chris G   [ updated May 5, 2020, 1:46 PM ]

View recording of tutorial presented at The Web Conference 2020.

During The Web Conference in April, Amazon scientists and scholars joined external researchers, policy makers, developers and others for an all-virtual conference to discuss the evolution of the Web, the standardization of its associated technologies, and the impact of those technologies on society and culture.

One component of the event was a tutorial on time series forecasting, a key ingredient in the automation and optimization of business processes.

From raw data to machine learning model, no coding required

posted Apr 14, 2020, 1:37 PM by Chris G   [ updated Apr 14, 2020, 1:40 PM ]

Machine learning was once the domain of specialized researchers, with complex models and proprietary code required to build a solution. But Cloud AutoML has made machine learning more accessible than ever before. By automating the model building process, users can create highly performant models with minimal machine learning expertise (and time).

However, many AutoML tutorials and how-to guides assume that a well-curated dataset is already in place. In reality, though, the steps required to pre-process the data and perform feature engineering can be just as complicated as building the model. The goal of this post is to show you how to connect all the dots, starting with real-world raw data and ending with a trained model.


posted Apr 13, 2020, 9:17 AM by Chris G

Inside Kaggle you’ll find all the code & data you need to do your data science work. Use over 19,000 public datasets and 200,000 public notebooks to conquer any analysis in no time.

Start with more than a blinking cursor

Kaggle offers a no-setup, customizable, Jupyter Notebooks environment. Access free GPUs and a huge repository of community published data & code.

Take a micro-course and start applying your new skills immediately

Join a competition to solve real-world machine learning problems

Awesome Computer Vision Resources

posted Sep 1, 2019, 8:35 AM by Chris G   [ updated Sep 1, 2019, 8:36 AM ]

Awesome Computer Vision

A curated list of awesome computer vision resources:

First-person Hyperlapse Videos (Technical)

posted Jan 17, 2019, 7:20 AM by Chris G   [ updated Jan 17, 2019, 7:20 AM ]
