OnQ Blog

Qualcomm Research excels in image recognition competition

Dec 14, 2015

Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.

Amidst fierce competition from 70 international teams from academia and industry, including Google, Microsoft, Tencent and the Korea Advanced Institute of Science and Technology, Qualcomm Research has been a consistent top-3 performer in the 2015 ImageNet challenges for object localization, object detection and scene classification.

ImageNet is the premier benchmark for measuring progress in deep learning for computer vision. It is organized annually by computer scientists at Stanford University, the University of Michigan and the University of North Carolina at Chapel Hill, and requires that computer systems created by the participating teams automatically recognize what objects or scenes appear in millions of digital photographs.

Deep Learning

Since a deep learning team led by Geoffrey Hinton from the University of Toronto scored a decisive victory in the ImageNet competition in 2012, state-of-the-art object and scene recognition technology has been based on deep convolutional neural networks. These deep nets learn the image representation and the classifier simultaneously by back-propagating error signals through stacked convolution and pooling layers, an architecture loosely inspired by current hypotheses on how the human brain sees. While the principle of convolutional neural networks has been around for a long time, learning their many parameters has become viable thanks to breakthroughs in parallel computing using graphics processing units and to databases containing a large number of labeled examples, such as the 14 million images in the ImageNet collection.
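To make "stacked convolution and pooling layers" concrete, here is a toy sketch of one convolution step followed by one pooling step in plain Python. It is not the network used in the competition: the function names, the hand-picked edge-detecting kernel and the tiny image are all assumptions made for illustration, and a real deep net would learn its kernel values through back-propagation rather than have them written by hand.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    Like most deep learning code, this does not flip the kernel, so strictly
    speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2x2(feature_map: Matrix) -> Matrix:
    """Keep the largest value in each non-overlapping 2x2 block."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Toy 5x5 image with a vertical dark-to-bright edge between columns 1 and 2.
toy_image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel (an assumption) that responds to left-to-right brightening.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(toy_image, edge_kernel)  # 4x4 map, strongest at the edge
pooled = max_pool2x2(feature_map)                   # 2x2 summary of the map
print(pooled)  # [[2, 0], [2, 0]]
```

The pooled output keeps the information that an edge was found on the left side while discarding its exact position, which is the kind of coarse, shift-tolerant summary that deeper layers build on.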

Like others, Qualcomm Research (the R&D division of Qualcomm Technologies, Inc.) starts with very deep convolutional neural networks, parallel computation and large amounts of training data. The team’s innovation lies in how the deep learning system learns the correct parameters to detect and classify objects in images. Unlike traditional deep learning solutions, which learn the network parameters using the entire image as input, our system is object-centric: it derives the parameters of the neural network using only those regions in the image that are relevant for recognition, while irrelevant background pixels are discarded. As a result, the image representations are better suited to recognizing and localizing objects, and they also provide better initializations for networks that specialize in recognizing scenes. Object-centric training forms the foundation for our entries in the object localization, object detection and scene classification challenges.
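The post does not say how the relevant regions are selected, so the following is only a minimal sketch of the general idea, assuming the regions are given as annotated bounding boxes. The function name `crop_object_regions`, the box convention and the toy image are hypothetical; the sketch simply shows training patches being cut out around objects so that background pixels never reach the network.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values (grayscale for brevity)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max is exclusive


def crop_object_regions(image: Image, boxes: List[Box]) -> List[Image]:
    """Return one patch per object box; everything outside the boxes is dropped."""
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp the box to the image so slightly-off annotations do not fail.
        x0, y0 = max(0, x_min), max(0, y_min)
        x1, y1 = min(width, x_max), min(height, y_max)
        if x0 >= x1 or y0 >= y1:
            continue  # skip empty boxes and boxes entirely outside the image
        patches.append([row[x0:x1] for row in image[y0:y1]])
    return patches


# Toy 4x6 image: zeros are background, nonzero values belong to the object.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0, 0],
    [0, 9, 6, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
patches = crop_object_regions(image, [(1, 1, 3, 3)])
print(patches[0])  # [[7, 8], [9, 6]]
```

In an object-centric setup along these lines, the patches rather than the full image would be fed to the network during training.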

Object Localization

The goal in the object localization challenge is to recognize which object appears where in an image. A total of 1,000 object categories are predefined, including bagels, sombreros and traffic lights. For each image, which is known to contain one object, a system may make five guesses as to which object appears in the image and must predict a tight bounding box for each guess. A prediction is considered correct if the predicted object category matches the object of interest and the accompanying box overlaps sufficiently with a manually created ground-truth box. Last year’s winner of the task, the University of Oxford, predicted the object and its location correctly in 74.7% of the 100,000 test images. Qualcomm Research’s top-3 entry improved on this with 87.4%, while this year’s winner scored 91%.
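The correctness rule described above can be sketched in a few lines of Python. Overlap is measured here as intersection-over-union (IoU), and the 0.5 threshold is an assumption, since the post only says the overlap must be "sufficient". The function names are illustrative, and the sketch handles a single ground-truth box per image.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_correct(
    guesses: List[Tuple[str, Box]],
    true_label: str,
    true_box: Box,
    min_iou: float = 0.5,  # assumed threshold; the post does not state one
) -> bool:
    """True if any of the first five (label, box) guesses has the right label
    and overlaps the ground-truth box by at least min_iou."""
    return any(
        label == true_label and iou(box, true_box) >= min_iou
        for label, box in guesses[:5]
    )


guesses = [
    ("sombrero", (0, 0, 20, 20)),   # wrong category
    ("bagel", (12, 12, 52, 52)),    # right category, box shifted by 2 pixels
]
print(localization_correct(guesses, "bagel", (10, 10, 50, 50)))  # True
```

Only one of the five guesses needs to be right, which is why a system can hedge between visually similar categories without being penalized.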

Example of object localization.

Object Detection

Object detection is similar in spirit to the object localization task: again, the task is to determine which object is located where in the image. However, whereas the localization task strives to find a single object, here the task is to localize every object in the image. The object categories are taken from a list of 200, including jellyfish, microwave and wine bottle. The quality of the labeling is evaluated by balancing the number of accurately detected objects against the number of false positives. Last year’s winner of the task, Google, obtained a score of 43.9%. Qualcomm Research’s top-2 entry improved on this with 53.6%, while the winner, Microsoft Research Asia, scored 62.1%.
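The post does not name the exact metric, but a common way to balance correct detections against false positives is average precision, computed per category over detections ranked by confidence. The sketch below assumes that metric in its simple non-interpolated form, and assumes each detection has already been marked as a true or false positive. The function name and the example numbers are illustrative only.

```python
from typing import List, Tuple


def average_precision(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """Non-interpolated average precision for one object category.

    detections: (confidence, is_true_positive) pairs; matching detections to
        ground-truth boxes is assumed to have been done beforehand.
    num_ground_truth: number of annotated objects of this category.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision at the rank where one more object is recovered.
            precision_sum += true_positives / rank
    # Objects that are never detected contribute zero, which lowers the score.
    return precision_sum / num_ground_truth


# Two annotated objects; the detector finds both but also fires once wrongly.
detections = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(detections, num_ground_truth=2)
print(round(ap, 3))  # 0.833
```

A false positive ranked above a true positive lowers the precision at that point, so a detector is rewarded both for finding objects and for being confident only when it is right.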

Scene Classification

The goal in the Places2 challenge, a competition organized by researchers from MIT, is to recognize what scene is depicted in an image. A total of 401 scene categories are predefined, including freeway, hardware shop and science museum. Often, the differences between two scenes are subtle, such as between a home bedroom and a hotel bedroom, making the task challenging. As in the localization challenge, each system may predict up to five scene categories per image. The winning entry for this challenge scored an accuracy of 83.1%, while Qualcomm Research obtained a top-3 position with 82.4%.
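Allowing up to five predictions per image corresponds to what is usually called top-5 accuracy. The following minimal sketch assumes a prediction counts as correct when the true scene is among the k highest-scoring categories. The function names and scores are made up for illustration, and the example uses k=2 only to keep the toy data small.

```python
from typing import Dict, List


def top_k_labels(scores: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k category names with the highest scores, best first."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def top_k_accuracy(
    all_scores: List[Dict[str, float]], true_labels: List[str], k: int = 5
) -> float:
    """Fraction of images whose true label is among the top-k predictions."""
    if not true_labels:
        return 0.0
    hits = sum(
        truth in top_k_labels(scores, k)
        for scores, truth in zip(all_scores, true_labels)
    )
    return hits / len(true_labels)


# Made-up classifier scores for two images over three scene categories.
all_scores = [
    {"home bedroom": 0.5, "hotel bedroom": 0.4, "freeway": 0.1},
    {"home bedroom": 0.1, "hotel bedroom": 0.2, "freeway": 0.7},
]
true_labels = ["hotel bedroom", "home bedroom"]
print(top_k_accuracy(all_scores, true_labels, k=2))  # 0.5
```

In the first image the system confuses the two bedroom types but still gets credit because the true scene is its second guess, which is how this kind of scoring tolerates subtle category differences.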

Example of scene classification.

The work of Qualcomm Research will be presented by Daniel Fontijne and Koen van de Sande at the ImageNet workshop of the International Conference on Computer Vision on December 17 in Santiago, Chile.

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Cees Snoek

Principal Engineer
