Posted on: October 25, 2020 by Chau Pham
Last updated: February 15, 2022 - Added References [21] and [22], along with some figures from them in the Attention section.
Created by Google AI Language in late 2018 (https://arxiv.org/pdf/1810.04805.pdf)
Stands for Bidirectional Encoder Representations from Transformers
Neural network-based technique for natural language processing (NLP)
Pre-trained on large unlabelled text ⇒ we can fine-tune BERT for our own tasks (see the sketch after this list)
State-of-the-art results on many natural language processing tasks
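The "pre-train, then fine-tune" idea is the core of how BERT is used in practice. Below is a minimal illustrative sketch (my own example, not from the post or the original repo) of fine-tuning a pre-trained BERT checkpoint for a two-class task, using the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint as assumed choices:

```python
# Minimal fine-tuning sketch: load a pre-trained BERT checkpoint and train
# a classification head on one labelled example. A real task would loop over
# a full dataset and multiple epochs.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenize one example sentence and give it a label (1 = positive, say).
batch = tokenizer(["the movie was great"], return_tensors="pt", padding=True)
labels = torch.tensor([1])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss  # cross-entropy from the classification head
loss.backward()
optimizer.step()
```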
Some tasks we can do with BERT
"We see billions of searches every day, and 15 percent of those queries are ones we haven’t seen before. BERT will help Search better understand one in 10 searches in the U.S" - Google [2]
Why only 1 in 10? BERT cannot help much with brand names or short keyword queries; it needs reasonably long sentences to understand the context.
BERT helps Search understand the query better: it attends to words like "for someone" rather than just matching keywords (src: [2])
“One of the biggest challenges in natural language processing is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labelled training examples." – Google AI [3]
⇒ BERT comes to the rescue
Open-sourced on GitHub [11] ⇒ we can download the pre-trained BERT models
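As a quick illustration of what a downloaded pre-trained model gives you, here is a small sketch that extracts contextual token embeddings. It uses the Hugging Face `transformers` package rather than the original TensorFlow code in the GitHub repo [11] (an assumed, but equivalent, way to load the released checkpoints):

```python
# Load a pre-trained BERT checkpoint and get contextual embeddings for a sentence.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT reads the whole sentence at once.", return_tensors="pt")
outputs = model(**inputs)

# One 768-dimensional vector per token for bert-base: shape (1, num_tokens, 768).
print(outputs.last_hidden_state.shape)
```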