Paraphrase detection is the task of examining two sentences and determining
whether they have the same meaning. Achieving high accuracy on this
task requires thorough syntactic and semantic analysis of both sentences.
We introduce a method for paraphrase detection based on recursive autoencoders
(RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn
feature vectors for phrases in syntactic trees. These features are used to measure
the word- and phrase-wise similarity between two sentences. Since sentences may
be of arbitrary length, the resulting matrix of similarity measures is of variable
size. We introduce a novel dynamic pooling layer that computes a fixed-size
representation from the variable-sized matrices. The pooled representation is then
used as input to a classifier. Our method outperforms other state-of-the-art approaches
on the challenging Microsoft Research Paraphrase (MSRP) corpus.
Figure: An overview of our paraphrase model. The recursive autoencoder learns phrase features for each node in a parse tree. The distances between all nodes then fill a similarity matrix whose size depends on the length of the sentences. Using a novel dynamic pooling layer, we can compare the variable-sized sentences and classify each pair as a paraphrase or not.