Hello, world, it's Siraj, and let's build
a chat bot that can answer questions
about any text you give it.
Be it an article, or even a book, using Keras.
Just imagine the boost in productivity all of us
will have once we have access to expert systems for any given domain.
Instead of sifting through all the jargon
in a scientific paper, you just give it the paper
and ask it the relevant questions.
Entire textbooks, libraries, videos,
images, whatever. You just feed it some data,
and it becomes an expert on it.
(Siraj (voiceover)) All 7 billion people on earth
would have the capability of learning anything much faster.
The web democratized information, and this next evolution
will democratize something just as important, guidance.
The ideal chat bot can talk intelligently about any domain.
That's the holy grail, but domain-specific chat bots
are definitely possible.
The technical term for this is a question answering system.
Surprisingly, we've been able to do this since way back
in the '70s.
(Siraj (voiceover)) LUNAR was one of the first.
It was, as you might have guessed, rule based,
so it allowed geologists to ask questions about moon rocks
from the Apollo missions.
A later improvement to rule-based Q&A systems allowed
programmers to encode patterns into their bots
using artificial intelligence markup language, or AIML.
That meant less code for the same results.
But yeah, don't use AIML.
It's so old it makes Numa Numa look new.
Now with deep learning, we can do this
without hard coded responses and have much better results.
(Siraj (voiceover)) The generic case
is that you give it some text as input,
and then ask it a question.
It'll give you the right answer after logically reasoning about it.
The input could also be "Everybody is happy."
And then the question could be what's the sentiment?
The answer would be positive.
Other possible questions are what's the entity?
What are the part of speech tags?
What's the translation to French?
We need a common model for all of these questions.
This is what the AI community is
trying to figure out how to do.
Facebook research made some great progress
with this just two years ago, when
they released a paper introducing this really cool
idea called a memory network.
LSTM networks proved to be a useful tool in tasks like text
summarization, but their memory, encoded by hidden states
and weights, is too small for very, very long sequences,
be it a book or a movie.
A way around this for language translation,
for example, was to store multiple LSTM states,
and use an attention mechanism to choose between them.
But they developed another strategy
that outperformed LSTMs for Q&A systems.
(Siraj (voiceover)) The idea was to allow a neural network
to use an external data structure as memory storage.
It learns where to retrieve the required memory from the memory
bank in a supervised way.
When it came to answering questions
from toy data that was generated,
that info was pretty easy to come by.
But in real-world data, it is not that easy.
Most recently, there was a four-month-long Kaggle contest
that a startup called MetaMind placed in the top 5% of.
To do this, they created a new state-of-the-art model,
called a dynamic memory network, which built
on Facebook's initial idea.
That's the one we'll focus on, so let's build it
programmatically using Keras.
(Siraj (voiceover)) This data set is pretty well organized.
It was created by Facebook AI research
for the specific goal of improving textual reasoning.
It's grouped into 20 different tasks.
Each task tests a different aspect of reasoning.
So, overall, it provides a good overview
of all the different capabilities of a learning model.
There are 1,000 questions for training,
and 1,000 for testing per task.
Each question is paired with a statement,
or series of statements, as well as an answer.
The goal is to have one model that can succeed in all tasks.
We'll use pre-trained GloVe vectors
to help create a sequence of word vectors
from our input sentences.
And these vectors will act as inputs to the model.
The DMN architecture defines two types of memory:
semantic, and episodic.
These input vectors are considered the semantic memory,
whereas episodic memory might contain other knowledge as well.
And we'll talk about that in a second.
We can fetch the bAbI data set from the web,
and split it into training and testing data.
GloVe will help convert our words to vectors,
so they're ready to be fed into our model.
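As a rough sketch of what that preprocessing step can look like, here's a simplified, hypothetical parser for the dataset's numbered-line format (this is not the exact code from the video, just an illustration of turning raw lines into story, question, answer triples):

```python
# A minimal sketch of parsing bAbI-style task lines into
# (story, question, answer) triples. Statement lines are numbered,
# and question lines contain a tab-separated question, answer,
# and supporting-fact id.

def parse_babi(lines):
    data, story = [], []
    for line in lines:
        nid, text = line.split(' ', 1)
        if int(nid) == 1:           # a new story starts at id 1
            story = []
        if '\t' in text:            # question lines contain tabs
            question, answer, _ = text.split('\t')
            data.append((list(story), question.strip(), answer.strip()))
        else:
            story.append(text.strip())
    return data

sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary?\tbathroom\t1",
]
print(parse_babi(sample))
```

Each parsed triple can then be mapped through GloVe lookups to become the sequences of word vectors the model consumes.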
The first module, the input module,
is a GRU, or gated recurrent unit,
that runs on a sequence of word vectors.
A GRU cell is kind of like an LSTM cell,
but it's more computationally efficient since it only
has two gates, and it doesn't use a memory unit.
The two gates control when its content is updated,
and when it's erased.
And the hidden state of the input module
represents the input processed so far, as a vector.
It outputs hidden states after every sentence,
and these outputs are called facts in the paper,
because they represent the essence of what's been fed in.
Given a word vector and the previous timestep vector,
we'll compute the current timestep vector.
The update gate is a single layer neural network.
We sum up the matrix multiplications,
and add a bias term.
And then the sigmoid squashes it to a list
of values between 0 and 1, the output vector.
We do this twice with different sets of weights,
then we use a reset gate that will
learn to ignore the past timesteps when necessary.
For example, if the next sentence
has nothing to do with those that came before it.
The update gate is similar in that it
can learn to ignore the current timestep entirely.
Maybe the current sentence has nothing to do with the answer,
whereas previous ones did.
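To make the gate arithmetic concrete, here is a toy GRU step written out in NumPy. The weight names, shapes, and random initialization are illustrative, not Keras internals:

```python
import numpy as np

# A toy GRU cell forward pass: an update gate z, a reset gate r,
# and a candidate state blended into the running hidden state.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)   # update gate: keep vs. replace
    r = sigmoid(Wr @ x + Ur @ h_prev + br)   # reset gate: ignore the past
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate state
    return (1 - z) * h_prev + z * h_tilde    # blend old and new content

rng = np.random.default_rng(0)
dim_x, dim_h = 4, 3
params = [rng.normal(size=s) for s in
          [(dim_h, dim_x), (dim_h, dim_h), (dim_h,)] * 3]

h = np.zeros(dim_h)
for x in rng.normal(size=(5, dim_x)):  # run over a 5-step sequence
    h = gru_step(x, h, params)
print(h.shape)  # (3,)
```

The sigmoid keeps both gate vectors between 0 and 1, so each gate acts as a soft per-dimension switch.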
(Siraj (voiceover)) Then, there's the question module.
It processes the question word by word,
and outputs a vector using the same GRU as the input module,
and the same weights.
We can encode both of them by creating
embedding layers for each.
Then we'll create an episodic memory representation from both.
The motivation for this in the paper
came from the hippocampus's function in our brain.
It's able to retrieve temporal states that
are triggered by some response, like a sight or a sound.
(Siraj (voiceover)) Both the fact and question
vectors that are extracted from the input
enter the episodic memory module.
It's composed of two nested GRUs.
The inner GRU generates what are called episodes.
It does this by passing over the facts from the input module.
When updating its inner state, it takes into account
the output of an attention function on the current fact.
The attention function gives a score between 0 and 1
to each fact.
And so, the GRU ignores facts with low scores.
After each full pass on all the facts,
the inner GRU outputs an episode which
is then fed to the outer GRU.
The reason we need multiple episodes
is so our model can learn what part of a sentence
it should pay attention to after realizing, from one pass,
that something else is important.
With multiple passes, we can gather increasingly relevant information.
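A simplified sketch of that attention-and-pass loop, in NumPy: the real DMN uses a learned gating function over a richer feature set, so the dot-product scoring below is just an illustrative stand-in.

```python
import numpy as np

# Score each fact against the question, squash scores to (0, 1),
# and blend the facts into an episode vector. A second pass is
# conditioned on the first episode, mimicking multiple passes.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def episode(facts, question):
    scores = sigmoid(facts @ question)  # one score in (0, 1) per fact
    weights = scores / scores.sum()     # normalize across facts
    return weights @ facts              # weighted blend of fact vectors

rng = np.random.default_rng(2)
facts = rng.normal(size=(4, 8))  # 4 fact vectors from the input module
q = rng.normal(size=8)           # question vector

e1 = episode(facts, q)           # first pass over the facts
e2 = episode(facts, q + e1)      # second pass, informed by the first
print(e1.shape, e2.shape)        # (8,) (8,)
```

Because the second pass conditions on the first episode, facts that only matter in light of earlier evidence can receive higher weight.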
We can initialize our model, and set its loss function
to categorical cross-entropy, with the stochastic
gradient descent variant RMSProp as the optimizer.
Then train it on the given data using the fit function.
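Those two pieces can be written out by hand to see what they compute. The hand-rolled versions below use illustrative hyperparameters, not the exact settings from the video:

```python
import numpy as np

# Categorical cross-entropy penalizes low predicted probability on
# the true class; RMSProp scales each update by a running average
# of squared gradients.

def categorical_crossentropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred + 1e-9))

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    cache = decay * cache + (1 - decay) * grad ** 2  # running avg of squared grads
    w = w - lr * grad / (np.sqrt(cache) + eps)       # per-parameter scaled update
    return w, cache

y_true = np.array([0.0, 1.0, 0.0])  # one-hot encoded answer word
y_pred = np.array([0.1, 0.8, 0.1])  # model's softmax output
loss = categorical_crossentropy(y_true, y_pred)

w, cache = rmsprop_step(np.array([1.0]), np.array([0.5]), np.array([0.0]))
print(round(loss, 3))  # ~0.223, i.e. -log(0.8)
```

In Keras, both pieces are picked simply by name when compiling the model, so you never write them by hand in practice.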
We can test this code in the browser
without waiting for it to train, because luckily for us,
this researcher uploaded a web app with a fully trained model
of this code.
We can generate a story, which is
a collection of sentences, each describing
an event in sequential order.
Then we'll ask a question.
Pretty high accuracy response.
Let's generate another story, and ask it another question.
Let's go over the three key facts we've learned.
GRUs control the flow of data like LSTM cells,
but are more computationally efficient,
using just two gates: update and reset.
Dynamic memory networks offer state of the art performance
in question answering systems.
And they do this by using both semantic and episodic memory,
inspired by the hippocampus.
Drum roll please.
No, never mind.
Nemanja Tomic is the coding challenge winner
from last week.
(Siraj (voiceover)) He implemented
his own neural machine translator
by training it on bilingual movie subtitles.
You can see all the results in his IPython notebook.
Wizard of the week.
And the runner up is Vishal Batchu.
Despite the massive amount of training time
NMT requires, Vishal was able to achieve some great results.
I bow to both of you.
This week's challenge is to make your own Q&A chat bot.
All the details are in the readme.
GitHub links go in the comments,
and I'll announce the winner a week from today.
Please subscribe for more programming videos.
Check out this related video.
And for now, I've got to ask the right questions.
So, thanks for watching.