ACL 2012 + NAACL 2013 Tutorial: Deep Learning for NLP (without Magic)

Richard Socher, Chris Manning and Yoshua Bengio

In the spring quarter of 2015, I gave an entire class at Stanford on deep learning for natural language processing. If you're interested in all the details of these methods and applications, see


Updated Version of Tutorial at NAACL 2013



Machine learning is everywhere in today's NLP, but by and large machine learning amounts to numerical optimization of weights for human-designed representations and features. The goal of deep learning is to explore how computers can take advantage of data to develop features and representations appropriate for complex interpretation tasks. This tutorial aims to cover the basic motivation, ideas, models and learning algorithms in deep learning for natural language processing. Recently, these methods have been shown to perform very well on various NLP tasks such as language modeling, POS tagging, named entity recognition, sentiment analysis and paraphrase detection, among others. The most attractive quality of these techniques is that they can perform well without any external hand-designed resources or time-intensive feature engineering. Despite these advantages, many researchers in NLP are not familiar with these methods. Our focus is on insight and understanding, using graphical illustrations and simple, intuitive derivations. The goal of the tutorial is to make the inner workings of these techniques transparent and intuitive, and their results interpretable, rather than leaving them as black boxes labeled "magic here".

The first part of the tutorial presents the basics of neural networks, neural word vectors, several simple models based on local windows, and the math and algorithms of training via backpropagation. In this section, applications include language modeling and POS tagging.

In the second section we present recursive neural networks, which can learn structured tree outputs as well as vector representations for phrases and sentences. We cover both the equations and the applications. We show how training can be achieved by a modified version of the backpropagation algorithm introduced in the first part; the modifications allow the algorithm to work on tree structures. Applications include sentiment analysis and paraphrase detection. We also draw connections to recent work on semantic compositionality in vector spaces. The principal goal, again, is to make these methods appear intuitive and interpretable rather than mathematically confusing. By this point in the tutorial, audience members should have a clear understanding of how to build a deep learning system for word-, sentence- and document-level tasks.

The last part of the tutorial gives a general overview of the different applications of deep learning in NLP, including bag-of-words models. We will provide a discussion of NLP-oriented issues in modeling, interpretation, representational power, and optimization.
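As a rough illustration of how a recursive neural network can build vector representations for phrases and sentences, the sketch below composes two child vectors into a parent vector with p = tanh(W [c1; c2] + b) and applies that step bottom-up over a binary tree. This page does not give the equation, so the composition function, the names `compose` and `encode`, the toy dimension, the random weights and the example tree are all assumptions made for illustration, not the tutorial's own code. No training is shown.

```python
import math
import random


def compose(W, b, left, right):
    """Parent vector p = tanh(W [left; right] + b) (assumed composition function)."""
    children = left + right  # list concatenation: a vector of length 2d
    return [
        math.tanh(sum(w * c for w, c in zip(row, children)) + bias)
        for row, bias in zip(W, b)
    ]


def encode(tree, vectors, W, b):
    """Encode a binary tree given as nested 2-tuples with words at the leaves."""
    if isinstance(tree, str):
        return vectors[tree]  # leaf: look up the word vector
    left, right = tree
    return compose(W, b, encode(left, vectors, W, b), encode(right, vectors, W, b))


d = 4  # toy vector dimension (hypothetical)
rng = random.Random(0)

# Untrained parameters: W has shape d x 2d, b has length d.
W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d

# Random stand-ins for learned word vectors.
words = ["the", "movie", "was", "great"]
vectors = {w: [rng.uniform(-1.0, 1.0) for _ in range(d)] for w in words}

# Hypothetical parse: ((the movie) (was great))
tree = (("the", "movie"), ("was", "great"))
sentence_vector = encode(tree, vectors, W, b)
print(len(sentence_vector))
```

Because every phrase gets a vector of the same dimension as a word, the same `compose` step can be reused at every node of the tree, which is what lets a single set of parameters handle inputs of any length.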


  1. The Basics
    1. Motivations
    2. From logistic regression to neural networks
    3. Word representations
    4. Unsupervised word vector learning
    5. Backpropagation Training
    6. Learning word-level classifiers: POS and NER
    7. Sharing statistical strength
  2. Recursive Neural Networks
    1. Motivation
    2. Recursive Neural Networks for Parsing
    3. Theory: Backpropagation Through Structure
    4. Recursive Autoencoders
    5. Application to Sentiment Analysis and Paraphrase Detection
    6. Compositionality Through Recursive Matrix-Vector Spaces
    7. Relation classification
  3. Applications, Discussion, and Resources
    1. Neural language models
    2. Assorted other speech and NLP applications
    3. Resources (readings, code)
    4. Discussion
    5. Tricks of the trade
    6. Limitations, advantages, future directions


Further Information

For your comments, related questions or errata:


Wenbo, 04 December 2016, 20:34

Hi Richard,

I am a big fan of your CS224d class and am excited about your 2017 CS224n. I was wondering what to expect from CS224n and whether it will be recorded. Thanks!

Gebre, 21 August 2016, 18:24

Hi Richard, I am building an NER system for Tigrigna, an under-resourced Semitic language like Arabic. Can you guide me, please? I have run into problems related to Unicode, among others. Can you tell me where I can get Java code or a Python script for NER? Thank you!

Rachana Baldania, 22 May 2016, 09:48

Hi Richard, I am a big fan of yours. Right now I am working on deep learning for sentiment analysis and would like some guidance. Could you please provide it?

Rachana Baldania, 11 February 2016, 06:32

Hi Richard, I am a big fan of yours. Right now I am working on deep learning for sentiment analysis and would like some guidance. Could you please provide it?

Bin, 06 January 2016, 13:48

Could you please email me a copy of the program for your paper 'Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks'? I will appreciate your kind help.

fred, 18 June 2015, 18:44

Hi Richard, how can I get the code for your paper 'Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks'? Thank you!

Nacho, 27 October 2014, 08:24

Hi Richard, I see that you use MATLAB and Java. Is that better than using, for instance, Theano (which I see you also use)? I'm an ML scientist (NLP), and various ML concepts are clear to me (especially regularized machines and MLPs), although there is a huge amount still to learn. However, implementations and large experiments are missing from my PhD thesis (on learning semantic features), so I'd like to get started. I know MATLAB, C and Python (a little less), though Python seems to be better for several NLP tasks. In any case, based on your expertise, what would you recommend?