OnQ Blog

Helping machines see and understand like humans [video]

At Qualcomm AI Research, we’ve been on a mission to help machines see and understand like humans. Computer vision using deep Convolutional Neural Networks (CNNs) has produced breakthrough results in 2D image classification compared to previous methods, but it has limitations. Specifically, traditional CNNs are equivariant only to translations, meaning that they do not generalize well when the input image is subjected to another symmetry transformation such as rotation. For example, a camera on a drone using a traditional CNN may fail to identify an object when the drone rolls, since the object is now viewed from a new angle. Traditional CNNs are also unable to effectively analyze images on curved surfaces or geometries, such as spherical or fisheye lens images.
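
To make the limitation concrete, here is a minimal NumPy sketch. It is an illustration only, not code from the papers discussed in this post: the helper `correlate`, the random 8x8 "image", the 3x3 filter, and the wrap-around boundary handling are all assumptions chosen to keep the check exact. It compares what a single convolution-style filter does to a shifted input and to a rotated input.

```python
import numpy as np

def correlate(image, kernel):
    """Circular cross-correlation with an odd-sized square kernel centred on each pixel."""
    c = kernel.shape[0] // 2
    out = np.zeros_like(image, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # Rolling by (c - i, c - j) brings pixel (y + i - c, x + j - c) to (y, x).
            out += kernel[i, j] * np.roll(image, shift=(c - i, c - j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))    # toy single-channel "image" (hypothetical data)
kernel = rng.normal(size=(3, 3))   # toy filter with no rotational symmetry

# Shifting the input simply shifts the feature map (translation equivariance).
shifted = np.roll(image, shift=(2, 3), axis=(0, 1))
shift_ok = np.allclose(
    correlate(shifted, kernel),
    np.roll(correlate(image, kernel), shift=(2, 3), axis=(0, 1)),
)

# Rotating the input does not simply rotate the feature map.
rotation_ok = np.allclose(
    correlate(np.rot90(image), kernel),
    np.rot90(correlate(image, kernel)),
)

print("translation equivariant:", shift_ok)
print("rotation equivariant:", rotation_ok)
```

For a generic filter like this one, the shift check passes and the rotation check fails, which is the gap the generalized CNNs below are designed to close.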

Our journey to more generalized CNNs

Over the last few years at Qualcomm Technologies Netherlands, we’ve continued to build on our earlier work in the field and introduced progressively more general versions of the traditional CNN to address the issues discussed above. In 2016, we introduced Group Equivariant CNNs, which generalized CNNs to exploit larger groups of symmetries, including rotations and reflections. This method works by sharing learnable weights not just among shifted filters as in a CNN, but also among rotated and flipped copies of each filter. As a result, Group CNNs can learn from fewer examples and generalize perfectly to images transformed by the symmetries in the group. As follow-up work in 2017, we introduced a general theory for Steerable CNNs, which built upon Group Equivariant CNNs by showing how the computational cost can be decoupled from the size of the symmetry group, thus paving the way for future scaling.
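
The weight-sharing idea can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation from the Group Equivariant CNNs paper: it uses only the four 90-degree rotations (no flips), a single filter, wrap-around boundaries, and the hypothetical helper names `correlate` and `p4_lifting_correlation`. One set of learnable weights is applied in four orientations, and rotating the input then rotates each feature map and cyclically shifts the orientation channels.

```python
import numpy as np

def correlate(image, kernel):
    """Circular cross-correlation with an odd-sized square kernel centred on each pixel."""
    c = kernel.shape[0] // 2
    out = np.zeros_like(image, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # Rolling by (c - i, c - j) brings pixel (y + i - c, x + j - c) to (y, x).
            out += kernel[i, j] * np.roll(image, shift=(c - i, c - j), axis=(0, 1))
    return out

def p4_lifting_correlation(image, kernel):
    """Apply one shared kernel in its four 90-degree rotations.

    Returns an array of shape (4, H, W): one feature map per orientation.
    """
    return np.stack([correlate(image, np.rot90(kernel, r)) for r in range(4)])

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))    # toy single-channel "image" (hypothetical data)
kernel = rng.normal(size=(3, 3))   # the single set of shared weights

out = p4_lifting_correlation(image, kernel)
out_rotated = p4_lifting_correlation(np.rot90(image), kernel)

# Rotating the input rotates every feature map and moves each orientation
# channel one step along the orientation axis.
expected = np.roll(np.rot90(out, axes=(1, 2)), shift=1, axis=0)
is_equivariant = np.allclose(out_rotated, expected)

print("rotation equivariant:", is_equivariant)
```

Because the response to a rotated input is a predictable rearrangement of the response to the original input, nothing new has to be learned for the rotated case, which is one way to read the data-efficiency claim above.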

In 2018, we introduced Spherical CNNs, which started to address geometry and provided the mathematical framework for building CNNs that can analyze spherical images without being fooled by distortions. This work received the best paper award at ICLR that year. Also in 2018, as follow-up work to Steerable CNNs, our AI researchers co-authored the 3D Steerable CNNs paper. The big deal about 3D Steerable CNNs is that they are equivariant to 3D object rotations (e.g., a chair is still identified as a chair regardless of its orientation) and are data efficient (they need several times less data to train the AI model).

Gauge Equivariant CNNs

Yet, there was still more work to be done to generalize CNNs, for example enabling CNNs to analyze images on any surface, such as an image obtained by fusing planar images from multiple cameras on a self-driving car. To make CNNs able to analyze images on curved spaces, we discovered that we need to use the mathematics underlying general relativity and quantum field theory in deep learning. We were able to connect the dots from Max’s Ph.D. thesis in quantum gravity to deep learning to create Gauge Equivariant CNNs. Watch the animated video below for an explanation of Gauge Equivariant CNNs in Max’s own words.

Gauge equivariant CNNs are the most general CNNs we’ve created and unify two lines of AI research: equivariant and geometric deep learning. Gauge equivariant CNNs can analyze image data on any curved space or geometry, from flat to spherical, and can generalize deep learning to handle various symmetries in a principled manner.

Gauge equivariant CNNs are based on and inspired by gauge theory, a foundational framework of modern physics (e.g., quantum field theory and general relativity). Gauge equivariant CNNs are beneficial for several reasons:

  • They are the only principled solution that works for curved image data.
  • They offer robust performance and faster training while requiring fewer training examples. We are able to implement the gauge equivariant convolution in a way that is highly scalable and practical.
  • They address issues for a broad range of visual recognition use cases, from drones and robots to XR and autonomous driving.

One result from our paper “Gauge Equivariant Convolutional Networks and the Icosahedral CNN,” to be presented at the International Conference on Machine Learning (ICML) 2019, is state-of-the-art accuracy on climate pattern segmentation. We are very optimistic that these more generalized CNNs will find many diverse applications.

Our research on making CNNs more general is not over, and we are currently exploring additional topics in this area. At Qualcomm AI Research, our goal is to make fundamental AI research breakthroughs so that we, as well as our customers, can scale the technology across industries and ultimately enrich our daily lives. In an upcoming webinar and presentation, we’ll explain how we invest in leading research and development across the broad spectrum of AI technologies. If you are excited about solving big problems with cutting-edge AI research and improving the lives of billions of people, we’d like to hear from you. We’re recruiting for several machine learning openings.


Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc. Full references to all the generalized CNN papers discussed in this post can be found at http://ta.co.nl/


