Dynamic Pooling And Unfolding Recursive Autoencoders For Paraphrase Detection
Paraphrase detection is the task of examining two sentences and determining
whether they have the same meaning. In order to obtain high accuracy on this
task, thorough syntactic and semantic analysis of the two statements is needed.
We introduce a method for paraphrase detection based on recursive autoencoders
(RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn
feature vectors for phrases in syntactic trees. These features are used to measure
the word- and phrase-wise similarity between two sentences. Since sentences may
be of arbitrary length, the resulting matrix of similarity measures is of variable
size. We introduce a novel dynamic pooling layer which computes a fixed-sized
representation from the variable-sized matrices. The pooled representation is then
used as input to a classifier. Our method outperforms other state-of-the-art approaches
on the challenging MSRP paraphrase corpus.
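The dynamic pooling step can be sketched in a few lines. The following is an illustrative NumPy version, not the released MATLAB code: it assumes min-pooling over a grid of roughly equal regions of the distance matrix (smaller distance meaning higher similarity), and the handling of matrices smaller than the target grid is simplified.

```python
import numpy as np

def dynamic_min_pool(dist, n_p=15):
    """Pool a variable-sized distance matrix into a fixed n_p x n_p grid
    by taking the minimum over each of n_p x n_p roughly equal regions."""
    rows, cols = dist.shape
    # If the matrix is smaller than the target grid, repeat rows/columns first.
    if rows < n_p:
        dist = np.repeat(dist, int(np.ceil(n_p / rows)), axis=0)
        rows = dist.shape[0]
    if cols < n_p:
        dist = np.repeat(dist, int(np.ceil(n_p / cols)), axis=1)
        cols = dist.shape[1]
    # Split row and column indices into n_p roughly equal chunks.
    row_chunks = np.array_split(np.arange(rows), n_p)
    col_chunks = np.array_split(np.arange(cols), n_p)
    pooled = np.empty((n_p, n_p))
    for i, r in enumerate(row_chunks):
        for j, c in enumerate(col_chunks):
            pooled[i, j] = dist[np.ix_(r, c)].min()
    return pooled

# Node vectors for two sentences of different lengths (toy random data).
rng = np.random.default_rng(0)
a, b = rng.normal(size=(9, 50)), rng.normal(size=(23, 50))
# Pairwise Euclidean distances between all tree-node vectors.
dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
print(dynamic_min_pool(dist, n_p=5).shape)  # (5, 5)
```

However long the two sentences are, the classifier always sees an n_p x n_p input.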
Figure: An overview of our paraphrase model. The recursive autoencoder learns phrase features for each node in a parse tree. The distances between all nodes then fill a similarity matrix whose size depends on the length of the sentences. Using a novel dynamic pooling layer we can compare the variable-sized sentences and classify pairs as being paraphrases or not.
Download Paper
Download Code
Full Paraphrase System
Computing Compositional Vectors
- This smaller code package computes phrase vector representations with a trained, unfolding recursive neural network as described in the paper above. It is designed to be easy to use: put the phrases for which you want compositional vectors into a text file, one phrase or sentence per line. The output is another text file containing the vectors.
- Download: codeRAEVectorsNIPS2011.zip [42 MB]
Updated Related Work
Bibtex
- Please cite the following paper when you use the data set or code:
@incollection{SocherEtAl2011:PoolRAE,
  title = {{Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection}},
  author = {Richard Socher and Eric H. Huang and Jeffrey Pennington and Andrew Y. Ng and Christopher D. Manning},
  booktitle = {Advances in Neural Information Processing Systems 24},
  year = {2011}
}
Comments
For remarks, criticism or other thoughts on the paper.
Save what you write before you post: type in the password and post (nothing will appear), then copy your text and post again. It's the only way to prevent spammers.
A few questions and thoughts as I try to implement and extend your results...
1. Can you explain exactly what is meant by ensuring each entry in the pooled similarity matrix has mean 0 and variance 1? Is this per matrix, or per element across the mini-batch?
2. Also, did you experiment with other approaches to classification besides simply flattening the similarity matrix and performing binary logistic regression?
3. Is there an updated version of this general approach that you think seems promising? E.g., with attention or more complex compositional functions (Tree-LSTMs, different weight matrices for different POS combinations, etc.)?
Thanks so much!
Hi Richard,
Is there a python implementation available?
AG — 23 September 2015, 11:45
Hello Richard, I am running the Full Paraphrase System code, but it throws an error while loading the savedParams/params.mat file. The error is "no such file, '/juice/scr21/scr/ehhuang/projects/deepRAE/code/norm1tanh.m'". I think this variable should be "code/norm1tanh.m". Can you please tell me how to resolve this error? Thanks!
I would also like to know how to retrain the RAE model.
Hi Richard, I would like to know whether I can retrain the RAE model.
Bests, Connie
Hi Connie,
Yes, you just need to adjust the training data and train it on two languages.
Best,
Richard
Hi Richard,
I would like to know whether this paraphrase system can be applied to bilingual data or not.
Bests,
Connie
@DMr,
There is a flag you can set to binarize trees; you can find it in the code of our sentiment model from EMNLP 2013 and the newest CoreNLP package.
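For readers without CoreNLP at hand, the idea of binarizing an n-ary constituency tree can be sketched as follows. This is a toy left-branching binarization, not the CoreNLP binarizer (which uses head-finding rules); trees are tuples of (label, child, ...), leaves are strings, and intermediate labels are prefixed with "@" by convention:

```python
# Minimal left-branching binarization of an n-ary constituency tree.
# Trees are (label, child, ...) tuples; leaves are plain strings.
def binarize(tree):
    if isinstance(tree, str):  # leaf: nothing to do
        return tree
    label, children = tree[0], [binarize(c) for c in tree[1:]]
    # Collapse n-ary nodes into a cascade of binary intermediate nodes.
    while len(children) > 2:
        children = [("@" + label, children[0], children[1])] + children[2:]
    return (label,) + tuple(children)

t = ("NP", "Hemmatabad", "Rural", "District")
print(binarize(t))  # ('NP', ('@NP', 'Hemmatabad', 'Rural'), 'District')
```

After this pass, every internal node has at most two children, which is the shape the RAE's composition function expects.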
@sriram bhargav
Yes, if you retrain the model you can use any set of word vectors you want, e.g. from our ACL 2012 paper, or a new set from Pennington et al 2014 (see front page for link in a few days) or word2vec.
You can probably get an improvement in performance by using a larger vocabulary.
Best,
Richard
DMr — 04 July 2014, 00:21
To clarify, in the following tree, there are two NP nodes with 3 child leaves:
(ROOT
(S
(NP (NNP Shisheh))
(VP (VBZ is)
(NP
(NP (DT a) (NN village))
(PP (IN in)
(NP (NNP Hemmatabad) (NNP Rural) (NNP District))))
(, ,)
(PP (IN in)
(NP
(NP (DT the) (NNP Central) (NNP District))
(PP (IN of)
(NP
(NP (NNP Borujerd) (NNP County))
(, ,)
(NP
(NP (NNP Lorestan) (NNP Province))
(, ,)
(NP (NNP Iran))))))))
(. .)))
Would you just qualify those children as named entities and average the leaves on the same level?
How would you have determined to concatenate the child vectors?
DMr — 03 July 2014, 23:50
Hi Richard,
Thanks for all the research and code; it's a great help to all the lowly data miners out there. I'm wondering how you opted to construct the binary parse trees for a given sentence during unsupervised training. Unless I'm missing something, the Stanford Parser won't restrict itself to binary (p -> c1, c2) structures.
Thanks!
Hi Socher,
I have a question. Are the values in params.mat independent of vars.normalized.100.mat? When I tried to use your code (codeRAEVectorsNIPS2011) for computing compositional vectors, I found that many words are detected as *UNKNOWN*. Is there any way to get 50- or 200-dimensional phrase vectors instead of 100?
Yes, we're only publishing the code for training the paraphrase classifier on top.
The other code is trained on a large corpus and hard to package up nicely.
In our most recent TACL paper we have a new RNN on dependency trees. I think it is even more powerful but haven't tried it on the paraphrasing task yet.
Very elegant work! My confusion about the variable-sized matrix is almost entirely resolved by your paper.
Mike — 06 February 2014, 06:57
Hi Richard, I too am interested in training an Unfolding RAE, but I am not sure what you meant by "You can use the training code from this website to see." in your last comment. There is no training code above. Are you saying you won't release it, and we should come up with our own training port to the pre-trained code above? Thanks, Mike.
Hi,
It takes at most 5h to train. You can use the training code from this website to see.
Best,
Richard
Bhanu — 23 December 2013, 11:02
Hi Socher,
I am implementing this model as part of my thesis work. Could you please tell me how much time it took to train the unfolding RAE?
Thanks in advance.
Hi Richard, recently I have been studying your code above. However, I can't find any details about the RAE training algorithm in it. Could I get your RAE training algorithm implementation for academic study purposes? Thanks.
Hi Tian,
It highly depends on your corpus. Are there many different words that are unknown in this vocabulary? Are the new pairs much longer or shorter?
For shorter phrases you may want to use a smaller similarity matrix.
Best,
Richard
Tian — 08 August 2013, 12:57
I followed README.txt to replace input.txt with a new corpus, and the performance is much lower than the results in your paper, so I guess we need to retrain the model on the new corpus.
How do we do the training? Could you help me?
tian — 08 August 2013, 12:35
Hi Richard,
I tried to use your data and code, and it works fine with your input.txt.
Now I have some input sentences of my own, and I want to test the performance on them.
I read your paper and found that you train the model on one part of the MSR corpus and then test on the rest; is that right?
Since I want to test new input, I need to train on my own corpus, is that right?
So how do I use your code to train the model?
Could you help me?