Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

Paraphrase detection is the task of examining two sentences and determining whether they have the same meaning. In order to obtain high accuracy on this task, thorough syntactic and semantic analysis of the two statements is needed. We introduce a method for paraphrase detection based on recursive autoencoders (RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn feature vectors for phrases in syntactic trees. These features are used to measure the word- and phrase-wise similarity between two sentences. Since sentences may be of arbitrary length, the resulting matrix of similarity measures is of variable size. We introduce a novel dynamic pooling layer which computes a fixed-sized representation from the variable-sized matrices. The pooled representation is then used as input to a classifier. Our method outperforms other state-of-the-art approaches on the challenging MSRP paraphrase corpus.
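
To make the unfolding objective concrete, below is a minimal NumPy sketch (not the released MATLAB implementation) of how a binary parse tree of word vectors is encoded bottom-up and then unfolded back down to its leaves. All names here (encode_tree, unfold_error, We, be, Wd, bd) are illustrative.

import numpy as np

def encode_tree(node, We, be):
    # Bottom-up RAE encoding. A node is either a leaf word vector
    # (np.ndarray) or a (left, right) pair of subtrees.
    if isinstance(node, np.ndarray):
        return node
    c1 = encode_tree(node[0], We, be)
    c2 = encode_tree(node[1], We, be)
    # Parent phrase vector: p = tanh(We [c1; c2] + be)
    return np.tanh(We @ np.concatenate([c1, c2]) + be)

def unfold_error(node, p, Wd, bd):
    # Unfolding reconstruction: decode p back down the entire subtree
    # and sum the squared error against the original leaf vectors.
    if isinstance(node, np.ndarray):
        return np.sum((p - node) ** 2)
    out = np.tanh(Wd @ p + bd)  # reconstruct [c1'; c2'] from the parent
    n = out.size // 2
    return (unfold_error(node[0], out[:n], Wd, bd)
            + unfold_error(node[1], out[n:], Wd, bd))

Training minimizes the sum of this error over all internal nodes; because the error is measured against all leaves of a subtree rather than only the immediate children, the objective cannot be satisfied by degenerate encodings of long phrases.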


An overview of our paraphrase model. The recursive autoencoder learns phrase features for each node in a parse tree. The distances between all nodes then fill a similarity matrix whose size depends on the length of the sentences. Using a novel dynamic pooling layer we can compare the variable-sized sentences and classify pairs as being paraphrases or not.
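
As a rough illustration of the two steps in the figure, the sketch below fills the pairwise Euclidean distance matrix from the node vectors of two trees and then min-pools it onto a fixed n_p x n_p grid. Function names are my own, and the handling of matrices smaller than the grid (simple repetition of rows and columns) is one reasonable assumption, not a description of the released code.

import numpy as np

def distance_matrix(nodes_a, nodes_b):
    # Pairwise Euclidean distances between all node vectors (words and
    # phrases) of the two parse trees; result has shape (m, n).
    A = np.stack(nodes_a)
    B = np.stack(nodes_b)
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def dynamic_min_pool(sim, n_p=15):
    # Map the variable-sized distance matrix onto a fixed n_p x n_p grid
    # by splitting rows and columns into roughly equal contiguous bins
    # and keeping the minimum (closest match) within each bin pair.
    if sim.shape[0] < n_p:  # repeat rows until the grid fits (assumption)
        sim = np.repeat(sim, -(-n_p // sim.shape[0]), axis=0)
    if sim.shape[1] < n_p:  # likewise for columns
        sim = np.repeat(sim, -(-n_p // sim.shape[1]), axis=1)
    row_bins = np.array_split(np.arange(sim.shape[0]), n_p)
    col_bins = np.array_split(np.arange(sim.shape[1]), n_p)
    pooled = np.empty((n_p, n_p))
    for i, rows in enumerate(row_bins):
        for j, cols in enumerate(col_bins):
            pooled[i, j] = sim[np.ix_(rows, cols)].min()
    return pooled

The fixed-size pooled matrix can then be flattened and fed to any standard classifier.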

Download Paper

Download Code

Full Paraphrase System

Computing Compositional Vectors

Updated Related Work

BibTeX

Comments

For remarks, criticism, or other thoughts on the paper. Save what you write before you post, then type in the password, post (nothing will appear to happen), then copy the text and re-post. It's the only way to prevent spammers.

RichardSocher, 24 July 2014, 09:25

@DMr: There is a flag you can set to binarize trees; you can find it in the code of our sentiment model from EMNLP 2013 and in the newest CoreNLP package.
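
To make the binarization concrete, here is a generic left-branching sketch in Python; it is illustrative only and merely mimics the '@'-labeled intermediate nodes that binarizers such as CoreNLP's produce, rather than reproducing that code.

def binarize(tree):
    # Left-branching binarization of an n-ary constituency tree.
    # Trees are (label, [children]) tuples; leaves are plain strings.
    label, children = tree
    children = [c if isinstance(c, str) else binarize(c)
                for c in children]
    while len(children) > 2:
        # Fold the two leftmost children under an intermediate node.
        children = [('@' + label, children[:2])] + children[2:]
    return (label, children)

# The ternary NP from the tree below, for example, becomes
# (NP (@NP (NNP Hemmatabad) (NNP Rural)) (NNP District)):
np3 = ('NP', [('NNP', ['Hemmatabad']),
              ('NNP', ['Rural']),
              ('NNP', ['District'])])
print(binarize(np3))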

RichardSocher, 24 July 2014, 09:24

@sriram bhargav Yes, if you retrain the model you can use any set of word vectors you want, e.g. from our ACL 2012 paper, a new set from Pennington et al. 2014 (see the front page for a link in a few days), or word2vec. You can probably get an improvement in performance by using a larger vocabulary. Best, Richard
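
As a small illustration of swapping in a different word vector set, the sketch below loads a plain-text embedding file and falls back to an unknown-word vector for out-of-vocabulary words. load_embeddings, lookup, and the handling of the *UNKNOWN* key are hypothetical; in particular, the sketch assumes the vector file provides an entry for the unknown token.

import numpy as np

def load_embeddings(path):
    # Parse 'word v1 v2 ...' lines (the plain-text format used by many
    # word vector releases) into a dict mapping word -> vector.
    vocab = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            word, *values = line.rstrip().split(' ')
            vocab[word] = np.asarray(values, dtype=np.float64)
    return vocab

def lookup(word, vocab, unk='*UNKNOWN*'):
    # Return the word's vector, falling back to the lowercased form and
    # then to the unknown-word vector (assumed to exist in the file).
    return vocab.get(word, vocab.get(word.lower(), vocab[unk]))

A larger vocabulary means fewer lookups fall through to *UNKNOWN*, which is where the performance improvement mentioned above would come from.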

DMr, 04 July 2014, 00:21

To clarify, in the following tree, there are two NP nodes with 3 child leaves:

(ROOT
  (S
    (NP (NNP Shisheh))
    (VP (VBZ is)
      (NP
        (NP (DT a) (NN village))
        (PP (IN in)
          (NP (NNP Hemmatabad) (NNP Rural) (NNP District))))
      (, ,)
      (PP (IN in)
        (NP
          (NP (DT the) (NNP Central) (NNP District))
          (PP (IN of)
            (NP
              (NP (NNP Borujerd) (NNP County))
              (, ,)
              (NP
                (NP (NNP Lorestan) (NNP Province))
                (, ,)
                (NP (NNP Iran))))))))
    (. .)))

Would you just qualify those children as named entities and average the leaves on the same level?

How would you have determined to concatenate the child vectors?

DMr, 03 July 2014, 23:50

Hi Richard,

Thanks for all the research and code; it's a great help to all the lowly data miners out there. I'm wondering how you opted to construct the binary parse trees for a given sentence during unsupervised training. Unless I'm missing something, the Stanford Parser won't restrict itself to binary (p -> c1, c2) structures.

Thanks!

sriram bhargav, 29 May 2014, 08:07

Hi Socher, I have a question. Are the values in params.mat independent of vars.normalized.100.mat? When I tried to use your code (codeRAEVectorsNIPS2011) for computing compositional vectors, I found that many words are detected as *UNKNOWN*. Is there any way to get a 50- or 200-dimensional phrase vector instead of 100?

RichardSocher, 13 April 2014, 23:51

Yes, we're only publishing the code for training the paraphrase classifier on top. The other code is trained on a large corpus and hard to package up nicely. In our most recent TACL paper we have a new RNN on dependency trees. I think it is even more powerful but haven't tried it on the paraphrasing task yet.

Yin Hang, 12 March 2014, 03:02

Very elegant work! My confusion about the variable-sized matrix has been almost entirely resolved by your paper.

Mike, 06 February 2014, 06:57

Hi Richard, I too am interested in training an Unfolding RAE, but I am not sure what you meant by "You can use the training code from this website to see" in your last comment. There is no training code above. Are you saying you won't release it, and we should come up with our own training port to the pre-trained code above? Thanks, Mike.

RichardSocher, 29 December 2013, 10:51

Hi, it takes at most 5 hours to train. You can use the training code from this website to see. Best, Richard

Bhanu, 23 December 2013, 11:02

Hi Socher,

I am implementing this model as part of my thesis work. Could you please tell me how much time it took to train the unfolding RAE?

Thanks in advance.

Wilsoncao, 06 October 2013, 10:14

Hi Richard, recently I have been studying your code above. However, I can't find any details about the RAE training algorithm in your code. Could I get your RAE training algorithm implementation for academic study purposes? Thanks.

RichardSocher, 20 August 2013, 05:32

Hi Tian, it highly depends on your corpus. Are there many different words that are unknown in this vocabulary? Are the new pairs much longer or shorter? For shorter phrases you may want to use a smaller similarity matrix. Best, Richard

Tian, 08 August 2013, 12:57

I followed README.txt to replace input.txt with a new corpus, and the performance is much lower than the results in your paper, so I guess we need to retrain the model on the new corpus. How do I run the training? Could you help me?

tian, 08 August 2013, 12:35

Hi Richard, I tried to use your data and code, and it works fine with your input.txt. I have some input sentences of my own and want to test the performance. From your paper I see that you train the model on part of the MSRP corpus and then test on the rest; is that right? Since I want to test new input, I need to train on my own corpus, right? So how do I use your code to train the model? Could you help me?