Artificial Intelligence

Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing

posted May 30, 2020, 7:30 AM by Chris G   [ updated May 30, 2020, 7:31 AM ]

One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models see benefits from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these datasets from scratch.
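The pre-train-then-fine-tune recipe can be illustrated in miniature. The sketch below is a toy stand-in, not BERT or any real pre-training API: "pre-training" is represented by word vectors built from co-occurrence counts over unlabeled text, and "fine-tuning" by a tiny perceptron head trained on a handful of labeled examples. All function names and data here are hypothetical.

```python
from collections import defaultdict

def pretrain(unlabeled_sentences, context_words):
    """Toy 'pre-training': build word vectors from co-occurrence
    counts with a few context words, using only unlabeled text."""
    vecs = defaultdict(lambda: [0.0] * len(context_words))
    for sent in unlabeled_sentences:
        words = sent.lower().split()
        for w in words:
            for j, ctx in enumerate(context_words):
                if ctx in words and ctx != w:
                    vecs[w][j] += 1.0
    return dict(vecs)

def featurize(sentence, vecs, dim):
    """Represent a sentence as the average of its word vectors."""
    words = [w for w in sentence.lower().split() if w in vecs]
    feats = [0.0] * dim
    for w in words:
        for j, v in enumerate(vecs[w]):
            feats[j] += v
    return [f / len(words) for f in feats] if words else feats

def predict(sentence, vecs, dim, w, b):
    x = featurize(sentence, vecs, dim)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def fine_tune(labeled_examples, vecs, dim, epochs=20, lr=0.1):
    """Toy 'fine-tuning': train a perceptron head on top of the
    frozen pre-trained features, from a few labeled examples."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for sent, y in labeled_examples:
            err = y - predict(sent, vecs, dim, w, b)
            if err:
                x = featurize(sent, vecs, dim)
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b
```

The point of the sketch is the division of labor: the expensive representation-building step sees only unannotated text, while the task-specific step needs only a small labeled dataset.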

This week, we open sourced a new technique for NLP pre-training called Bidirectional Encoder Representations from Transformers, or BERT. With this release, anyone in the world can train their own state-of-the-art question answering system (or a variety of other models) in about 30 minutes on a single Cloud TPU, or in a few hours using a single GPU. The release includes source code built on top of TensorFlow and a number of pre-trained language representation models. In our associated paper, we demonstrate state-of-the-art results on 11 NLP tasks, including the very competitive Stanford Question Answering Dataset (SQuAD v1.1).

What Makes BERT Different?
BERT builds upon recent work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus (in this case, Wikipedia).
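The paper achieves this bidirectionality with a masked language model objective: a fraction of the input tokens is hidden, and the model must predict them from both the left and right context. A rough sketch of the masking step follows; the function name and parameters are illustrative, not the released TensorFlow code, though the 15% rate and 80/10/10 split match the paper's description.

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", rate=0.15, rng=None):
    """BERT-style masking: ~15% of positions become prediction
    targets; of those, 80% are replaced with [MASK], 10% with a
    random vocabulary word, and 10% are left unchanged."""
    rng = rng or random.Random()
    out = list(tokens)
    labels = [None] * len(tokens)  # original token where a prediction is required
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            labels[i] = tok
            roll = rng.random()
            if roll < 0.8:
                out[i] = mask_token
            elif roll < 0.9:
                out[i] = rng.choice(vocab)
            # else: keep the original token at a masked position
    return out, labels
```

Because some targets are left unchanged or randomized, the model cannot simply learn to copy, and it must build a representation of every token from its full two-sided context.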

Forecasting big time series: theory and practice

posted May 5, 2020, 1:45 PM by Chris G   [ updated May 5, 2020, 1:46 PM ]

View recording of tutorial presented at The Web Conference 2020.

During The Web Conference in April, Amazon scientists and scholars joined external researchers, policy makers, developers and others for an all-virtual conference to discuss the evolution of the Web, the standardization of its associated technologies, and the impact of those technologies on society and culture.

One component of the event was a tutorial on time series forecasting, a key ingredient in the automation and optimization of business processes.
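To give a flavor of the baselines such a tutorial typically starts from (a hypothetical illustration, not material from the tutorial itself), a seasonal-naive forecaster simply repeats the most recent observed season:

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast by repeating the most recent full season.
    A standard baseline that stronger models must beat."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]
```

For example, with four quarters of history per season, the forecast for next year is just this year's quarterly values repeated, which is surprisingly hard to beat on strongly seasonal business data.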

From raw data to machine learning model, no coding required

posted Apr 14, 2020, 1:37 PM by Chris G   [ updated Apr 14, 2020, 1:40 PM ]

Machine learning was once the domain of specialized researchers, with complex models and proprietary code required to build a solution. But, Cloud AutoML has made machine learning more accessible than ever before. By automating the model building process, users can create highly performant models with minimal machine learning expertise (and time).

However, many AutoML tutorials and how-to guides assume that a well-curated dataset is already in place. In reality, though, the steps required to pre-process the data and perform feature engineering can be just as complicated as building the model. The goal of this post is to show you how to connect all the dots, starting with real-world raw data and ending with a trained model.
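As a minimal illustration of the kind of pre-processing the post refers to (hypothetical helper functions, not part of Cloud AutoML), raw tabular data usually needs categorical columns one-hot encoded and numeric columns scaled before any model building:

```python
def one_hot(values):
    """One-hot encode a categorical column; returns the encoded
    rows plus the discovered category order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1.0 if index[v] == i else 0.0 for i in range(len(categories))]
            for v in values]
    return rows, categories

def min_max_scale(values):
    """Scale a numeric column into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Steps like these, plus handling missing values and joining data sources, are the "connect all the dots" work that sits between raw data and an AutoML-ready dataset.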


Kaggle

posted Apr 13, 2020, 9:17 AM by Chris G

Inside Kaggle you’ll find all the code & data you need to do your data science work. Use over 19,000 public datasets and 200,000 public notebooks to conquer any analysis in no time.

Start with more than a blinking cursor

Kaggle offers a no-setup, customizable Jupyter Notebooks environment. Access free GPUs and a huge repository of community-published data and code.

Take a micro-course and start applying your new skills immediately

Join a competition to solve real-world machine learning problems

Awesome Computer Vision Resources

posted Sep 1, 2019, 8:35 AM by Chris G   [ updated Sep 1, 2019, 8:36 AM ]

Awesome Computer Vision

A curated list of awesome computer vision resources.

First-person Hyperlapse Videos (Technical)

posted Jan 17, 2019, 7:20 AM by Chris G   [ updated Jan 17, 2019, 7:20 AM ]

Star Wars: Building a Galaxy With Code

posted Jan 17, 2019, 7:18 AM by Chris G   [ updated Jan 17, 2019, 7:19 AM ]

Learn to program droids, and create your own Star Wars game in a galaxy far, far away.

Quantum Physics and Universal Beauty - with Frank Wilczek - YouTube

posted Jan 17, 2019, 7:15 AM by Chris G   [ updated Jan 17, 2019, 7:16 AM ]

Remove Image Background

posted Dec 19, 2018, 8:23 AM by Chris G   [ updated Dec 19, 2018, 8:23 AM ]

100% automatically – in 5 seconds – without a single click

Image upscaling supercharged

posted Dec 1, 2017, 7:13 PM by Chris G   [ updated Dec 1, 2017, 7:13 PM ]

Let's Enhance uses state-of-the-art neural networks to help you upscale and enhance your images.
