How to Make a Chatbot - Intro to Deep Learning #12

Hello, world, it's Siraj, and let's build

a chat bot that can answer questions

about any text you give it.

Be it an article, or even a book, using Keras.

Just imagine the boost in productivity all of us

will have once we have access to expert systems for any given

topic.

Instead of sifting through all the jargon

in a scientific paper, you just give it the paper

and ask it the relevant questions.

Entire textbooks, libraries, videos,

images, whatever. You just feed it some data

and it would become an expert at it.

(Siraj (voiceover)) All 7 billion people on earth

would have the capability of learning anything much faster.

The web democratized information and this next evolution

will democratize something just as important, guidance.

The ideal chat bot can talk intelligently about any domain.

That's the holy grail, but domain-specific chatbots

are definitely possible.

The technical term for this is a question answering system.

Surprisingly, we've been able to do this since way back

in the '70s.

(Siraj (voiceover)) Lunar was one of the first.

It was, as you might have guessed, rule based,

so it allowed geologists to ask questions about moon rocks

from the Apollo missions.

A later improvement to rule-based Q&A systems, called

Artificial Intelligence Markup Language, or AIML, allowed

programmers to encode patterns into their bot.

That meant less code for the same results.

But yeah, don't use AIML.

It's so old it makes Numa Numa look new.

Now with deep learning, we can do this

without hard coded responses and have much better results.

(Siraj (voiceover)) The generic case

is that you give it some text as input,

and then ask it a question.

It'll give you the right answer after logically reasoning

about it.

The input could also be that everybody is happy.

And then the question could be what's the sentiment?

The answer would be positive.

Other possible questions are what's the entity?

What are the part of speech tags?

What's the translation to French?

We need a common model for all of these questions.

This is what the AI community is

trying to figure out how to do.

Facebook research made some great progress

with this just two years ago, when

they released a paper introducing this really cool

idea called a memory network.

LSTM networks proved to be a useful tool in tasks like text

summarization, but their memory, encoded by hidden states

and weights, is too small for very, very long sequences

of data.

Be that a book or a movie.

A way around this for language translation,

for example, was to store multiple LSTM states,

and use an attention mechanism to choose between them.

But they developed another strategy

that outperformed LSTMs for Q&A systems.

(Siraj (voiceover)) The idea was to allow a neural network

to use an external data structure as memory storage.

It learns where to retrieve the required memory from the memory

bank in a supervised way.
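
To make the memory bank idea concrete, here's a minimal NumPy sketch. It's not the model from the paper: the function name `retrieve_memory` and the toy vectors are made up for illustration, and the plain dot-product scoring stands in for a scoring function that the real network learns from labeled examples.

```python
import numpy as np

def retrieve_memory(memory_bank, question_vec):
    """Return the index and vector of the memory slot that best matches the question."""
    scores = memory_bank @ question_vec      # one dot-product score per memory slot
    best = int(np.argmax(scores))            # pick the highest scoring slot
    return best, memory_bank[best]

# Toy memory bank: three stored sentence vectors (made-up numbers).
memory_bank = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
question_vec = np.array([0.1, 0.9, 0.2])
best_index, best_memory = retrieve_memory(memory_bank, question_vec)
```

The point is just that the memory lives outside the network's hidden state, so it can grow with the text instead of being squeezed into one vector.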

When it came to answering questions

from toy data that was generated,

that info was pretty easy to come by.

But in real world data, it is not that easy.

Most recently, there was a four month long Kaggle contest

that a startup called MetaMind placed in the top 5% for.

To do this they built a new state of the art model

called a dynamic memory network that built

on Facebook's initial idea.

That's the one we'll focus on, so let's build it

programmatically using Keras.

(Siraj (voiceover)) This data set is pretty well organized.

It was created by Facebook AI Research

for the specific goal of improving textual reasoning.

It's grouped into 20 different tasks.

Each task tests a different aspect of reasoning.

So, overall it provides a good overview

of all the different capabilities of your learning

model.

There are 1,000 questions for training,

and 1,000 for testing per task.

Each question is paired with a statement,

or series of statements, as well as an answer.

The goal is to have one model that can succeed in all tasks.

We'll use pre-trained GloVe vectors

to help create a sequence of word vectors

from our input sentences.

And these vectors will act as inputs to the model.

The DMN architecture defines two types of memory:

semantic and episodic.

These input vectors are considered the semantic memory,

whereas episodic memory might contain other knowledge as

well.

And we'll talk about that in a second.

We can fetch our bAbI data set from the web,

and split it into training and testing data.
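
Here's a rough sketch of what reading that data could look like. As far as I know, the task files are plain text where every line starts with a number that resets to 1 for each new story, and question lines carry the answer and the supporting line numbers after tabs. The function name `parse_stories` and the sample lines are just for illustration.

```python
def parse_stories(lines):
    """Turn bAbI-style lines into (story_sentences, question, answer) triples."""
    examples, story = [], []
    for line in lines:
        line_id, text = line.strip().split(" ", 1)
        if int(line_id) == 1:
            story = []                       # an ID of 1 starts a new story
        if "\t" in text:                     # question lines are tab separated
            question, answer, _supporting = text.split("\t")
            examples.append((list(story), question.strip(), answer))
        else:
            story.append(text)               # plain statements extend the story
    return examples

sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
    "4 Daniel went back to the hallway.",
    "5 Sandra moved to the garden.",
    "6 Where is Daniel? \thallway\t4",
]
examples = parse_stories(sample)
# Toy split for illustration only; a real run would use far more examples.
train, test = examples[:1], examples[1:]
```

Each example keeps every statement seen so far in the story, because the answer may depend on a sentence from several lines back.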

GloVe will help convert our words to vectors,

so they're ready to be fed into our model.
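
A small sketch of that word-to-vector step, assuming the usual GloVe text layout of one word per line followed by its numbers. The three-dimensional vectors below are invented so the example stays tiny, and `load_vectors` and `embed` are hypothetical helper names, not part of any library.

```python
import numpy as np

def load_vectors(lines):
    """Parse GloVe-style text lines: a word followed by its vector components."""
    table = {}
    for line in lines:
        word, *values = line.split()
        table[word] = np.array(values, dtype=float)
    return table

def embed(sentence, table, dim):
    """Map a sentence to a (num_words, dim) array; unknown words become zeros."""
    tokens = sentence.lower().replace(".", " ").replace("?", " ").split()
    return np.stack([table.get(tok, np.zeros(dim)) for tok in tokens])

toy_glove = [
    "mary 0.1 0.2 0.3",
    "moved 0.4 0.5 0.6",
    "bathroom 0.7 0.8 0.9",
]
table = load_vectors(toy_glove)
vectors = embed("Mary moved to the bathroom.", table, dim=3)
```

Mapping unknown words to zeros is only one possible choice here; a real pipeline might use a dedicated unknown-word vector instead.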

The first module, the input module,

is a GRU, or gated recurrent unit,

that runs on a sequence of word vectors.

A GRU cell is kind of like an LSTM cell,

but it's more computationally efficient since it only

has two gates, and it doesn't use a memory unit.

The two gates control when its content is updated,

and when it's erased.

And the hidden state of the input module

represents the input processed so far as a vector.

It outputs hidden states after every sentence,

and these outputs are called facts in the paper,

because they represent the essence of what is fed.

Given a word vector and the previous timestep vector,

we'll compute the current timestep vector.

The update gate is a single layer neural network.

We sum up the matrix multiplications,

and add a bias term.

And then the sigmoid squashes it to a list

of values between 0 and 1, the output vector.

We do this twice with different sets of weights,

then we use a reset gate that will

learn to ignore the past timesteps when necessary.

For example, if the next sentence

has nothing to do with those that came before it.

The update gate is similar in that it

can learn to ignore the current timestep entirely.

Maybe the current sentence has nothing to do with the answer.

Whereas, previous ones did.
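
The gate math described above can be sketched in NumPy like this. It's a minimal single GRU step with random, untrained weights, written in the form where an update gate near 0 keeps the previous state. The names `init_gru` and `gru_step` and the sizes are assumptions for illustration, not the code from the video.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(input_dim, hidden_dim, rng):
    """Random weights for the update gate (z), reset gate (r) and candidate state (h)."""
    params = {}
    for name in ("z", "r", "h"):
        params["W" + name] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        params["U" + name] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        params["b" + name] = np.zeros(hidden_dim)
    return params

def gru_step(x, h_prev, p):
    """Compute the current timestep vector from a word vector and the previous one."""
    # Update gate: sum of matrix multiplications plus bias, squashed into (0, 1).
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])
    # Reset gate: same recipe with a different set of weights.
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])
    # Candidate state: r near 0 drops the contribution of past timesteps.
    h_cand = np.tanh(p["Wh"] @ x + r * (p["Uh"] @ h_prev) + p["bh"])
    # z near 0 keeps the old state, so the current input is ignored.
    return z * h_cand + (1.0 - z) * h_prev

rng = np.random.default_rng(0)
params = init_gru(input_dim=3, hidden_dim=4, rng=rng)
h = np.zeros(4)
for word_vec in rng.normal(size=(5, 3)):     # five made-up word vectors
    h = gru_step(word_vec, h, params)
```

In the input module you would read off `h` at the end of each sentence to get that sentence's fact vector.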

(Siraj (voiceover)) Then, there's the question module.

It processes the question word by word,

and outputs a vector using the same GRU as the input module,

and the same weights.

We can encode both of them by creating

embedding layers for both.

Then we'll create an episodic memory representation for both.

The motivation for this in the paper,

came from the hippocampus function in our brain.

It's able to retrieve temporal states that

are triggered by some response, like a sight or a sound.

(Siraj (voiceover)) Both the fact and question

vectors that are extracted from the input

enter the episodic memory module.

It's composed of two nested GRUs.

The inner GRU generates what are called episodes.

It does this by passing over the facts from the input module.

When updating its inner state, it takes into account

the output of an attention function on the current fact.

The attention function gives a score between 0 and 1

to each fact.

And so, the GRU ignores facts with low scores.

After each full pass on all the facts,

the inner GRU outputs an episode which

is then fed to the outer GRU.
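
Here's a simplified sketch of that nested loop. To keep it short it makes a few assumptions: a plain tanh recurrent update (`simple_gru`) stands in for a full GRU cell, the attention function is a single linear layer over a reduced set of similarity features, and all weights are random rather than trained. Every name in it is made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simple_gru(x, h_prev, W, U):
    """A stripped-down recurrent update standing in for a full GRU cell."""
    return np.tanh(W @ x + U @ h_prev)

def attention_score(fact, question, memory, w):
    """Score one fact between 0 and 1 from its similarity to the question and memory."""
    features = np.concatenate([fact * question, fact * memory,
                               np.abs(fact - question), np.abs(fact - memory)])
    return sigmoid(w @ features)

def episodic_memory(facts, question, params, num_passes=3):
    memory = question.copy()                   # the memory starts out as the question
    for _ in range(num_passes):
        h = np.zeros_like(question)
        for fact in facts:                     # inner pass over every fact
            g = attention_score(fact, question, memory, params["w_att"])
            h_new = simple_gru(fact, h, params["W_in"], params["U_in"])
            h = g * h_new + (1.0 - g) * h      # low score: keep the old state, skip the fact
        episode = h                            # the result of one full pass
        memory = simple_gru(episode, memory, params["W_out"], params["U_out"])  # outer update
    return memory

rng = np.random.default_rng(1)
dim = 4
params = {
    "w_att": rng.normal(size=4 * dim),
    "W_in": rng.normal(scale=0.5, size=(dim, dim)),
    "U_in": rng.normal(scale=0.5, size=(dim, dim)),
    "W_out": rng.normal(scale=0.5, size=(dim, dim)),
    "U_out": rng.normal(scale=0.5, size=(dim, dim)),
}
facts = rng.normal(size=(3, dim))              # three made-up fact vectors
question = rng.normal(size=dim)
memory = episodic_memory(facts, question, params)
```

Because the attention score depends on the current memory, each new pass can weight the same facts differently from the pass before it.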

The reason we need multiple episodes

is so our model can learn what part of a sentence

it should pay attention to after realizing, after one pass,

that something else is important.

With multiple passes, we can gather increasingly relevant

information.

We can initialize our model and set its loss function

to categorical cross entropy, with RMSProp,

a variant of stochastic gradient descent, as the optimizer.

Then train it on the given data using the fit function.
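
In Keras those choices are typically passed to `compile` as `loss='categorical_crossentropy'` and `optimizer='rmsprop'` before calling `fit`. To show what they mean under the hood, here's a NumPy sketch that trains a single softmax layer on made-up data. The toy data, the `train` function and the hyperparameter values are all assumptions for illustration, not the model from the video.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(probs, one_hot):
    return -np.mean(np.sum(one_hot * np.log(probs + 1e-9), axis=1))

def train(x, one_hot, steps=200, lr=0.05, rho=0.9, eps=1e-7):
    """Fit a single softmax layer with RMSProp; returns weights and the loss history."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(x.shape[1], one_hot.shape[1]))
    avg_sq_grad = np.zeros_like(W)
    losses = []
    for _ in range(steps):
        probs = softmax(x @ W)
        losses.append(categorical_cross_entropy(probs, one_hot))
        grad = x.T @ (probs - one_hot) / len(x)             # gradient of the loss w.r.t. W
        # RMSProp keeps a running average of squared gradients per weight...
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
        # ...and divides each step by its square root.
        W -= lr * grad / (np.sqrt(avg_sq_grad) + eps)
    return W, losses

# Toy data: four "encoded stories", each belonging to one of two answer words.
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
one_hot = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
W, losses = train(x, one_hot)
predictions = softmax(x @ W).argmax(axis=1)
```

The answer is treated as one class out of a fixed vocabulary, which is why a categorical loss fits a dataset whose answers are single words.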

We can test this code in the browser

without waiting for it to train, because luckily for us,

this researcher uploaded a web app with a fully trained model

of this code.

We can generate a story, which is

a collection of sentences, each describing

an event in sequential order.

Then we'll ask a question.

Pretty high accuracy response.

Let's generate another story, and ask it another question.

Hero status.

Let's go over the three key facts we've learned.

GRUs control the flow of data like LSTM cells,

but are more computationally efficient,

using just two gates: update and reset.

Dynamic memory networks offer state of the art performance

in question answering systems.

And they do this, by using both semantic and episodic memory,

inspired by the hippocampus.

Drum roll please.

No, never mind.

Nemanja Tomic is the coding challenge winner

from last week.

(Siraj (voiceover)) He implemented

his own neural machine translator

by training it on movie subtitles in both English

and German.

You can see all the results in his IPython notebook.

Amazing work.

Wizard of the week.

And the runner up is Vishal Batchu.

Despite the massive amount of training time

NMT requires, Vishal was able to achieve some great results.

I bow to both of you.

This week's challenge is to make your own Q&A chat bot.

All the details are in the readme.

GitHub links go in the comments,

and I'll announce the winner a week from today.

Please subscribe for more programming videos.

Check out this related video.

And for now, I've got to ask the right questions.

So, thanks for watching.