Parsing Natural Scenes and Natural Language with Recursive Neural Networks

Recursive structure is commonly found in the inputs of different modalities such as natural scene images or natural language sentences. Discovering this recursive structure helps us not only to identify the units that an image or sentence contains but also to understand how they interact to form a whole. We introduce a max-margin structure prediction architecture based on recursive neural networks that can successfully recover such structure in both complex scene images and sentences. The same algorithm can be used both to provide a competitive syntactic parser for natural language sentences from the Penn Treebank and to outperform alternative approaches for semantic scene segmentation, annotation and classification. For segmentation and annotation our algorithm obtains a new level of state-of-the-art performance on the Stanford background dataset (78.1%). The features from the image parse tree outperform Gist descriptors for scene classification by 4%.
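To make the idea of recovering recursive structure concrete, here is a minimal sketch of greedy bottom-up parsing with a recursive neural network. It is an illustration under stated assumptions, not the released vision or language code: the function names (`make_params`, `compose`, `score`, `greedy_parse`), the tanh nonlinearity, and the random untrained weights are all hypothetical choices made for the example. It handles only the sentence-like case where units sit in a linear order (for images, neighbouring segments would form a graph instead), and it omits the max-margin training step entirely.

```python
import math
import random


def make_params(dim, seed=0):
    """Random, untrained parameters (illustrative only)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
    b = [0.0] * dim
    w_score = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return W, b, w_score


def compose(left, right, W, b):
    """Parent vector p = tanh(W [left; right] + b)."""
    children = left + right  # list concatenation of the two child vectors
    return [
        math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
        for row, bias in zip(W, b)
    ]


def score(parent, w_score):
    """Scalar plausibility of a merge: inner product with the parent vector."""
    return sum(w * x for w, x in zip(w_score, parent))


def greedy_parse(leaves, params):
    """Repeatedly merge the adjacent pair with the highest score.

    Returns (tree, total_score), where tree is a leaf index or a
    (left_subtree, right_subtree) tuple.
    """
    W, b, w_score = params
    nodes = [(vec, i) for i, vec in enumerate(leaves)]
    total = 0.0
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes) - 1):
            parent = compose(nodes[i][0], nodes[i + 1][0], W, b)
            s = score(parent, w_score)
            if best is None or s > best[0]:
                best = (s, i, parent)
        s, i, parent = best
        total += s
        # Replace the two merged children with their parent node.
        nodes[i:i + 2] = [(parent, (nodes[i][1], nodes[i + 1][1]))]
    return nodes[0][1], total


if __name__ == "__main__":
    params = make_params(dim=3, seed=1)
    leaves = [[0.1 * i, -0.2, 0.3] for i in range(4)]
    tree, total = greedy_parse(leaves, params)
    print(tree, total)
```

Because the same composition function is applied at every merge, each parent vector has the same size as its children, which is what lets the procedure be applied recursively until a single tree covers the whole input.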

ICML Distinguished Application Paper Award

Video of Talk

Download Paper

Download Vision Code

Download Language Parser

Download Data Set




Leave remarks, critical comments or other thoughts on the paper here. Save what you write before you post: type in the password and post (nothing will appear to happen), then copy the text and re-post. This is to prevent spam.


Dirrogate, 06 October 2013, 07:02

I came across this on the Facebook group "Strong Artificial Intelligence".

Thank you for the source code and all the explanations. I'd like to reference this page on the website of the hard science novel 'Memories with Maya'. You'll understand my enthusiasm because of the creative spin on AI in that story...

“I granted the AI access to the cameras on your Wizer,” he said. “Remember when I told you about frames and how the AI could take snapshots of your environment, then run image and feature recognition?” “Yeah…” was all I could manage. “That's what It just did. The AI takes a snapshot every few seconds when you're wearing the Wizer and creates a frame. The name of the bridge on the sign above is one of the prominent symbols to be recognized.”

“The AI also gets help from GPS data. It then cross references the current frame with its database of memories… stored frames, and if it finds a match, [It] tells you.

Memories, that's all they are, but if they begin to fade, [It'll] remind you,” he said.