OnQ Blog

Teaching devices to see in 3D through data-efficient AI

Qualcomm AI Research’s “3D Steerable CNNs” paper details how devices could learn to see quickly and accurately.

When humans first see a chair and learn its properties, they can still recognize it as a chair when it's flipped over. In fact, we make that inference almost instantly. But what we learned as toddlers is still difficult for a machine to master. AI "sees" by using convolutional neural networks (CNNs), which are loosely inspired by the human brain, but this method isn't particularly robust or efficient.

Traditional CNNs have trouble generalizing to the same object in different 3D orientations. CNNs work by generating feature maps that detect particular features of an object, such as the legs of a chair. These feature detectors (see GIF below) behave inconsistently when an image is rotated, reducing detection accuracy. We could train an AI model on many 2D images of the chair from different perspectives, but it would still fail often and be inconsistent in its assessments.
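To make that inconsistency concrete, here is a minimal NumPy sketch (a toy illustration, not the paper's implementation): a hand-rolled 2D convolution with a generic, randomly chosen "feature detector" kernel produces a feature map that does not simply rotate along with the input image.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D cross-correlation, the core operation in a CNN layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # stand-in for a picture of a chair
kernel = rng.standard_normal((3, 3))   # a generic learned feature detector

# Rotating the image first, then detecting features...
rotated_then_detected = conv2d(np.rot90(image), kernel)
# ...is NOT the same as detecting features, then rotating the feature map.
detected_then_rotated = np.rot90(conv2d(image, kernel))

print(np.allclose(rotated_then_detected, detected_then_rotated))  # False
```

Because the two feature maps disagree, any classifier built on top of them sees a rotated chair as a different pattern than the upright one.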

The capture of 3D images is becoming increasingly commonplace across devices — think augmented reality (AR), mobile phones, cars, medical devices, robots, and more. What if we could create a sophisticated CNN to process 3D images and address the limitations of a traditional CNN?

Enter “3D Steerable CNNs,” a technology developed by Qualcomm AI Research. These CNNs are designed to offer significant benefits:

  • Equivariance to rotations: This means that no matter how you rotate the object in 3D, it will still be analyzed in the same way and identified for what it is — in our example, a chair would be a chair regardless of its orientation. (see GIF below)
  • Data efficiency: This means the model needs far less training data than a traditional CNN. For example, with just a few 3D images of a chair, 3D Steerable CNNs could learn to recognize other chairs in any orientation. That data efficiency is a direct consequence of the equivariance property.
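As a toy illustration of what equivariance buys you (a 2D sketch with a hand-picked symmetric filter, assumed for illustration only; the actual 3D Steerable CNNs build equivariance into 3D filter banks mathematically): when the filter itself looks the same after a 90-degree rotation, rotating the input and then convolving gives exactly the rotated feature map, so the network analyzes the rotated chair the same way it analyzes the upright one.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D cross-correlation, the core operation in a CNN layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(1)
image = rng.standard_normal((8, 8))  # stand-in for an image of a chair

# A filter that is unchanged by a 90-degree rotation.
symmetric_kernel = np.array([[0., 1., 0.],
                             [1., 4., 1.],
                             [0., 1., 0.]])
assert np.array_equal(np.rot90(symmetric_kernel), symmetric_kernel)

# Equivariance: rotate-then-convolve equals convolve-then-rotate.
lhs = conv2d(np.rot90(image), symmetric_kernel)
rhs = np.rot90(conv2d(image, symmetric_kernel))
print(np.allclose(lhs, rhs))  # True
```

Because the feature map transforms predictably with the input, one training example effectively teaches the network about every orientation at once, which is where the data efficiency comes from.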

You’re probably wondering: why do we care about chairs so much? Well, we don’t in particular. 3D Steerable CNNs can be applied to many kinds of 3D data. For example, our “3D Steerable CNNs” paper shows that 3D Steerable CNNs are very good at analyzing the 3D structure of proteins, which has important applications in drug development and bioinformatics. You could also easily imagine the applications of 3D Steerable CNNs in other industries such as AR, where the real objects in a space could be more easily detected and integrated into the AR experience, even when the AR device sees them from one side only.

At Qualcomm AI Research, we’re making fundamental AI research breakthroughs such as 3D Steerable CNNs so that we, along with our partners and customers, can scale the technology across industries and ultimately enrich our daily lives. Our “3D Steerable CNNs” paper will be presented at NeurIPS (formerly NIPS), a premier machine learning conference to be held December 2-8 in Montreal.


Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.


Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries. The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.