Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Recursive structure is commonly found in the inputs of different modalities such as natural scene images or natural language sentences. Discovering this recursive structure helps us not only to identify the units that an image or sentence contains but also to understand how they interact to form a whole. We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure in both complex scene images and sentences. The same algorithm can be used both to provide a competitive syntactic parser for natural language sentences from the Penn Treebank and to outperform alternative approaches for semantic scene segmentation, annotation and classification. For segmentation and annotation our algorithm obtains a new level of state-of-the-art performance on the Stanford background dataset (78.1%). The features from the image parse tree outperform Gist descriptors for scene classification by 4%.
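To make the idea of recovering recursive structure concrete, here is a minimal sketch of greedy bottom-up parsing with a recursive neural network. It is an illustration under stated assumptions, not the released code: the function names (`merge`, `greedy_parse`, `leaves`), the tanh nonlinearity, the random untrained parameters and the toy chain adjacency are all hypothetical choices made for this example. It covers only the inference step; max-margin training of the parameters is not shown.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def merge(W, b, w_score, left, right):
    """Combine two child vectors into a parent vector and a scalar merge score."""
    children = left + right  # concatenation [left; right]
    parent = [math.tanh(z + bi) for z, bi in zip(matvec(W, children), b)]
    score = sum(ws * p for ws, p in zip(w_score, parent))
    return parent, score

def greedy_parse(features, adjacency, W, b, w_score):
    """Repeatedly merge the highest-scoring pair of adjacent nodes.

    features:  list of d-dimensional vectors, one per leaf (segment or word).
    adjacency: set of frozenset({i, j}) pairs of neighbouring leaves
               (assumed to form a connected graph).
    Returns (tree, total_score); the tree is nested tuples of leaf indices.
    """
    nodes = {i: (vec, i) for i, vec in enumerate(features)}  # id -> (vector, subtree)
    edges = set(adjacency)
    next_id, total = len(features), 0.0
    while len(nodes) > 1:
        best = None
        for edge in edges:
            i, j = tuple(edge)
            # Try both child orders and keep the best-scoring candidate.
            for a, c in ((i, j), (j, i)):
                parent, score = merge(W, b, w_score, nodes[a][0], nodes[c][0])
                if best is None or score > best[0]:
                    best = (score, a, c, parent)
        if best is None:
            raise ValueError("adjacency graph must be connected")
        score, a, c, parent = best
        total += score
        # The new node inherits the neighbours of both merged children.
        neighbours = {n for e in edges if e & {a, c} for n in e} - {a, c}
        edges = {e for e in edges if not e & {a, c}}
        edges |= {frozenset({next_id, n}) for n in neighbours}
        nodes[next_id] = (parent, (nodes[a][1], nodes[c][1]))
        del nodes[a], nodes[c]
        next_id += 1
    _, tree = next(iter(nodes.values()))
    return tree, total

def leaves(tree):
    """List the leaf indices of a nested-tuple tree."""
    return [tree] if isinstance(tree, int) else leaves(tree[0]) + leaves(tree[1])

# Toy usage: four leaves in a chain (as for a four-word sentence), random parameters.
random.seed(0)
d = 4
W = [[random.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d
w_score = [random.uniform(-0.5, 0.5) for _ in range(d)]
features = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(4)]
chain = {frozenset({i, i + 1}) for i in range(3)}
tree, total = greedy_parse(features, chain, W, b, w_score)
```

For an image, `adjacency` would instead hold pairs of neighbouring segments, so the same loop applies to both modalities. With trained parameters, the accumulated merge scores would play the role of the tree score that a max-margin objective compares across candidate trees.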
Video of Talk
Download Paper
Download Vision Code
- Download (code only): codeSocherICML2011.zip (100 kB)
- See included README.txt for more information.
Download Data Set
- We use the Stanford Vision library for computing segment features for the Stanford background dataset.
- The file below contains all the pre-processed data, which should work out of the box with the code provided above. It includes:
  - original images
  - superpixels (oversegmentation)
  - features for each superpixel
  - ground truth labels for pixels and superpixels
  - superpixel adjacency matrix
- Download code and dataset: codeAndDataSocherICML2011.zip (748 MB !)
- See included README.txt for more information.
- We thank Stephen Gould and Tianshi Gao for letting us re-distribute their dataset and helping us with the feature computation.
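As a rough illustration of what a superpixel adjacency matrix encodes, the sketch below derives one from a small label map. This is an assumption-laden example, not the format of the released files (see the included README.txt for that): the function name `superpixel_adjacency`, the nested-list label map, the zero-based ids and the 4-connectivity rule are all hypothetical choices.

```python
def superpixel_adjacency(label_map):
    """Return a symmetric 0/1 matrix A with A[p][q] = 1 if superpixels p and q touch.

    label_map: 2-D list of ints; label_map[r][c] is the superpixel id of pixel
    (r, c), with ids assumed to run from 0 to n - 1.
    """
    n = max(max(row) for row in label_map) + 1
    adj = [[0] * n for _ in range(n)]
    rows, cols = len(label_map), len(label_map[0])
    for r in range(rows):
        for c in range(cols):
            p = label_map[r][c]
            # Compare with the right and bottom neighbours (4-connectivity).
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    q = label_map[rr][cc]
                    if p != q:
                        adj[p][q] = adj[q][p] = 1
    return adj

# Toy label map: three vertical strips, so 0 touches 1 and 1 touches 2.
toy = [
    [0, 1, 2],
    [0, 1, 2],
]
adj = superpixel_adjacency(toy)
# adj == [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

A matrix of this kind tells a parser which pairs of segments are candidates for merging, since only neighbouring regions can form a larger contiguous region.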
Results
- Here are some scene image segmentation results (from the paper)
- Here are some more results (not in the paper)



Bibtex
- If you use the code, please cite:
@InProceedings{SocherEtAl2011:RNN,
  author = {Richard Socher and Cliff C. Lin and Andrew Y. Ng and Christopher D. Manning},
  title = {{Parsing Natural Scenes and Natural Language with Recursive Neural Networks}},
  booktitle = {Proceedings of the 28th International Conference on Machine Learning (ICML)},
  year = 2011
}
