Natural Language Processing with Deep Learning | Stanford University

Chris Manning and Richard Socher are giving lectures on “Natural Language Processing with Deep Learning CS224N/Ling284” at Stanford University.

Natural language processing (NLP) is the area of artificial intelligence concerned with understanding complex human language communication. It is one of the most important technologies of the information age.

Understanding complex language utterances is also a vital part of artificial intelligence. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, radiology reports, etc. A large variety of underlying tasks and machine learning models power NLP applications. Recently, deep learning approaches have obtained very high performance across many different NLP tasks. These approaches can often be trained as a single end-to-end model and do not require traditional, task-specific feature engineering.

This lecture series provides a thorough introduction to the cutting-edge research in deep learning applied to NLP, an approach that has recently obtained very high performance across many different NLP tasks including question answering and machine translation. It emphasizes how to implement, train, debug, visualize, and design neural network models, covering the main technologies of word vectors, feed-forward models, recurrent neural networks, recursive neural networks, convolutional neural networks, and recent models involving a memory component.

Lecture 1: Natural Language Processing with Deep Learning
Lecture 1 introduces the concept of Natural Language Processing (NLP) and the problems NLP faces today. The concept of representing words as numeric vectors is then introduced, and popular approaches to designing word vectors are discussed.

Lecture 2: Word Vector Representations: word2vec
Lecture 2 continues the discussion on the concept of representing words as numeric vectors and popular approaches to designing word vectors.

Lecture 3: GloVe: Global Vectors for Word Representation
Lecture 3 introduces the GloVe model for training word vectors. Then it extends our discussion of word vectors (interchangeably called word embeddings) by seeing how they can be evaluated intrinsically and extrinsically. As we proceed, we discuss the example of word analogies as an intrinsic evaluation technique and how it can be used to tune word embedding techniques. We then discuss training model weights/parameters and word vectors for extrinsic tasks. Lastly, we motivate artificial neural networks as a class of models for natural language processing tasks.
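The word-analogy evaluation mentioned above can be sketched in a few lines. This is a minimal illustration, not the course's evaluation code: the two-dimensional vectors are made up by hand rather than trained, and the names VECTORS, cosine, and solve_analogy are hypothetical. The idea is to answer "a is to b as c is to ?" by finding the word whose vector is closest, by cosine similarity, to b - a + c.

```python
import math

# Toy 2-d vectors, invented for illustration (NOT trained embeddings).
VECTORS = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
    "apple": [-0.7, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors=VECTORS):
    """Answer 'a is to b as c is to ?' using the vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


answer = solve_analogy("man", "king", "woman")
print(answer)
```

With real embeddings, the fraction of analogy questions answered correctly gives an intrinsic score that can be compared across embedding settings such as dimensionality or window size.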

Lecture 4: Word Window Classification and Neural Networks
Lecture 4 introduces single-layer and multilayer neural networks and how they can be used for classification.

Lecture 5: Backpropagation and Project Advice
Lecture 5 discusses how neural networks can be trained with gradient descent, using backpropagation to compute the gradients.
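As a rough illustration of backpropagation, the sketch below applies the chain rule by hand to a deliberately tiny network with one tanh hidden unit and a squared-error loss, then checks the result against a numerical gradient. The network shape, the loss, and the function names (forward, backward, numerical_grad) are assumptions chosen for brevity, not taken from the lecture.

```python
import math


def forward(w1, w2, x, t):
    """Compute the loss for y = w2 * tanh(w1 * x) against target t."""
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - t) ** 2
    return loss, h, y


def backward(w1, w2, x, t):
    """Backpropagation: apply the chain rule from the loss back to each weight."""
    _, h, y = forward(w1, w2, x, t)
    d_y = y - t                  # dL/dy
    d_w2 = d_y * h               # dL/dw2
    d_h = d_y * w2               # dL/dh
    d_z = d_h * (1.0 - h * h)    # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x               # dL/dw1
    return d_w1, d_w2


def numerical_grad(w1, w2, x, t, eps=1e-6):
    """Central-difference estimate of the same gradients, used as a check."""
    d_w1 = (forward(w1 + eps, w2, x, t)[0]
            - forward(w1 - eps, w2, x, t)[0]) / (2 * eps)
    d_w2 = (forward(w1, w2 + eps, x, t)[0]
            - forward(w1, w2 - eps, x, t)[0]) / (2 * eps)
    return d_w1, d_w2


analytic = backward(0.5, -1.2, 0.8, 1.0)
numeric = numerical_grad(0.5, -1.2, 0.8, 1.0)
print(analytic, numeric)
```

A gradient descent step would then subtract a small multiple of each gradient from the corresponding weight. Comparing analytic and numerical gradients in this way is a common debugging aid when deriving gradients by hand.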

Lecture 6: Dependency Parsing
Lecture 6 covers dependency parsing, the task of analyzing the syntactic dependency structure of a given input sentence S. The output of a dependency parser is a dependency tree in which the words of the input sentence are connected by typed dependency relations.
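One simple way to picture a dependency tree in code is as a list of (word, head index, relation) triples, with index 0 reserved for an artificial ROOT node. The sketch below uses that representation, which is an assumption for illustration rather than the lecture's data structure; the example sentence, the relation labels, and the helper is_valid_tree are likewise hypothetical.

```python
ROOT = 0  # index 0 stands for the artificial ROOT node

# Words are numbered from 1; each entry is (word, head index, relation).
sentence = [
    ("She", 2, "nsubj"),   # "She" is the subject of "ate"
    ("ate", 0, "root"),    # "ate" attaches to ROOT
    ("fish", 2, "obj"),    # "fish" is the object of "ate"
]


def is_valid_tree(arcs):
    """Check for exactly one root and that every word reaches ROOT without a cycle."""
    n = len(arcs)
    heads = [head for _, head, _ in arcs]
    if any(h < 0 or h > n for h in heads):
        return False
    if heads.count(ROOT) != 1:
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != ROOT:
            if node in seen:      # revisiting a node means there is a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


print(is_valid_tree(sentence))
```

A parser's job is then to predict the head index and relation for every word so that the result passes a check of this kind.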

Lecture 7: Introduction to TensorFlow
Lecture 7 covers TensorFlow, an open-source software library for numerical computation using data flow graphs. It was originally developed by researchers and engineers on the Google Brain Team within Google’s Machine Intelligence research organization to conduct machine learning and deep neural network research.

Lecture 8: Recurrent Neural Networks and Language Models
Lecture 8 covers traditional language models, RNNs, and RNN language models. Also reviewed are important training problems and tricks, RNNs for other sequence tasks, and bidirectional and deep RNNs.

Lecture 9: Machine Translation and Advanced Recurrent LSTMs and GRUs
Lecture 9 recaps the most important concepts and equations covered so far, then turns to machine translation and the more advanced RNN models, LSTMs and GRUs, used to tackle it.

Review Session: Midterm Review
This midterm review session covers word vector representations, neural networks, and RNNs. Also reviewed are backpropagation, gradient calculation, and dependency parsing.

Lecture 10: Neural Machine Translation and Models with Attention
Lecture 10 introduces translation, machine translation, and neural machine translation. Google’s new NMT system is highlighted, followed by sequence models with attention and sequence model decoders.

Lecture 11: Gated Recurrent Units and Further Topics in NMT
Lecture 11 provides a final look at gated recurrent units such as GRUs and LSTMs, followed by machine translation evaluation, dealing with large output vocabularies, and sub-word and character-based models. It also includes the research highlight “Lip reading sentences in the wild.”

Lecture 12: End-to-End Models for Speech Processing
Lecture 12 looks at traditional speech recognition systems and the motivation for end-to-end models. Also covered are Connectionist Temporal Classification (CTC) and Listen, Attend and Spell (LAS), a sequence-to-sequence model for speech recognition.

Lecture 13: Convolutional Neural Networks
Lecture 13 provides a mini-tutorial on Azure and GPUs, followed by the research highlight “Character-Aware Neural Language Models.” Also covered are CNN Variants 1 and 2 and a comparison of sentence models: bag of vectors (BoV), RNNs, and CNNs.

Lecture 14: Tree Recursive Neural Networks and Constituency Parsing
Lecture 14 looks at compositionality and recursion, followed by structure prediction with a simple Tree RNN for parsing. Also covered are the research highlight “Deep Reinforcement Learning for Dialogue Generation” and backpropagation through structure.

Lecture 15: Coreference Resolution
Lecture 15 explains what coreference is through a worked example. It also includes the research highlight “Summarizing Source Code,” an introduction to coreference resolution, and neural coreference resolution.

Lecture 16: Dynamic Neural Networks for Question Answering
Lecture 16 addresses the question “Can all NLP tasks be seen as question answering problems?”

Lecture 17: Issues in NLP and Possible Architectures for NLP
Lecture 17 looks at solving language, the efficient tree-recursive model SPINN and the SNLI dataset, as well as the research highlight “Learning to compose for QA.” Also covered are pointer/copying models and sub-word and character-based models.

Lecture 18: Tackling the Limits of Deep Learning for NLP
Lecture 18 looks at tackling the limits of deep learning for NLP followed by a few presentations.