Dynamic Pooling And Unfolding Recursive Autoencoders For Paraphrase Detection
Paraphrase detection is the task of examining two sentences and determining
whether they have the same meaning. In order to obtain high accuracy on this
task, thorough syntactic and semantic analysis of the two statements is needed.
We introduce a method for paraphrase detection based on recursive autoencoders
(RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn
feature vectors for phrases in syntactic trees. These features are used to measure
the word- and phrase-wise similarity between two sentences. Since sentences may
be of arbitrary length, the resulting matrix of similarity measures is of variable
size. We introduce a novel dynamic pooling layer which computes a fixed-sized
representation from the variable-sized matrices. The pooled representation is then
used as input to a classifier. Our method outperforms other state-of-the-art approaches
on the challenging MSRP paraphrase corpus.
Figure: An overview of our paraphrase model. The recursive autoencoder learns phrase features for each node in a parse tree. The distances between all nodes then fill a similarity matrix whose size depends on the length of the sentences. Using a novel dynamic pooling layer, we can compare the variable-sized sentences and classify pairs as being paraphrases or not.
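The similarity matrix and pooling step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the released Matlab code. The function names `pairwise_distances` and `dynamic_min_pool`, the grid size `n_p`, the choice of Euclidean distance with min-pooling, and the random vectors standing in for learned RAE node features are all assumptions made for the example.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every row of `a` and every row of `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dynamic_min_pool(sim, n_p):
    """Map a variable-sized matrix to a fixed n_p x n_p matrix by min-pooling."""
    rows, cols = sim.shape
    # If a dimension is shorter than n_p, repeat its entries so that
    # every cell of the pooling grid covers at least one value.
    if rows < n_p:
        sim = np.repeat(sim, int(np.ceil(n_p / rows)), axis=0)
    if cols < n_p:
        sim = np.repeat(sim, int(np.ceil(n_p / cols)), axis=1)
    rows, cols = sim.shape
    # Split row and column indices into n_p contiguous, nearly equal chunks.
    row_chunks = np.array_split(np.arange(rows), n_p)
    col_chunks = np.array_split(np.arange(cols), n_p)
    pooled = np.empty((n_p, n_p))
    for i, r in enumerate(row_chunks):
        for j, c in enumerate(col_chunks):
            # Keep the smallest distance (closest match) in each region.
            pooled[i, j] = sim[np.ix_(r, c)].min()
    return pooled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical node vectors for two sentences of different length.
    nodes_1 = rng.normal(size=(7, 4))
    nodes_2 = rng.normal(size=(23, 4))
    sim = pairwise_distances(nodes_1, nodes_2)
    print(sim.shape, dynamic_min_pool(sim, n_p=5).shape)
```

Whatever the lengths of the two sentences, the pooled matrix has the same shape, so it can be flattened and passed to a standard classifier.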
Download Paper
Download Code
Full Paraphrase System
- This code classifies two sentences as either being paraphrases or not
- Download: classifyParaphrases.zip [160 MB]
- The code includes the word look-up table from Joseph Turian, pre-processed and in Matlab format.
- The code also includes the Stanford Parser.
Computing Compositional Vectors
- This smaller code package computes phrase vector representations based on a trained, unfolding recursive neural network, as described in the above paper. It is designed to be easy to use: put the phrases for which you want to compute a compositional vector into a text file, one phrase or sentence per line. The output is another text file containing the vectors.
- Download: codeRAEVectorsNIPS2011.zip [42 MB]
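The input file described above (one phrase or sentence per line) could be prepared with a few lines of Python. This is only a sketch. The helper name `write_phrases` and the file name `phrases.txt` are hypothetical and not part of the released package, so check the package itself for the file name it actually expects.

```python
from pathlib import Path


def write_phrases(phrases, path):
    """Write one phrase or sentence per line; return the number of lines written."""
    # Collapse internal whitespace and line breaks so each phrase stays on one line,
    # and skip entries that are empty.
    lines = [" ".join(p.split()) for p in phrases if p.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    # "phrases.txt" is a placeholder name chosen for this example.
    write_phrases(
        ["the cat sat on the mat", "a feline rested on the rug"],
        "phrases.txt",
    )
```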
Updated Related Work
- This page has a comprehensive list of papers on this task and dataset: http://aclweb.org/aclwiki/index.php?title=Paraphrase_Identification_%28State_of_the_art%29
Bibtex
- Please cite the following paper when you use the data set or code:
@incollection{SocherEtAl2011:PoolRAE,
title = {{Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection}},
author = {Richard Socher and Eric H. Huang and Jeffrey Pennington and Andrew Y. Ng and Christopher D. Manning},
booktitle = {{Advances in Neural Information Processing Systems 24}},
year = {2011}
}
