OnQ Blog

Exploring what’s possible when machine “sight” evolves

Qualcomm AI researchers receive prestigious ICLR award: “Spherical CNNs” wins 2018 Best Paper Award.

Advances in deep learning that enable machines to “see” as humans do are among the most fascinating parts of artificial intelligence research. Now a breakthrough technique called spherical Convolutional Neural Networks (CNNs) is enabling machines to see and recognize objects in three-dimensional space. We’re just beginning to imagine what might be possible with this newfound ability, from machines that could identify tiny molecules to tools that could analyze the largest structures in outer space, and of course many things in between, such as teaching robots to safely navigate among us.

This area of research is on the cutting edge of AI developments, as reflected by a recent award. Qualcomm Technologies Netherlands researchers Taco Cohen and Max Welling, together with collaborators, received the International Conference on Learning Representations (ICLR) 2018 Best Paper Award for their paper “Spherical CNNs,” which was written in affiliation with the University of Amsterdam. ICLR, now in its sixth edition, publishes state-of-the-art research in artificial intelligence (AI) and machine learning. ICLR 2018 is chaired by Yoshua Bengio (University of Montreal) and Yann LeCun (NYU/Facebook). Out of approximately one thousand submissions from top AI labs around the world, the Best Paper Award recognizes the most innovative and impactful research in the field.

The paper “Spherical CNNs” introduces the mathematical framework for building CNNs that can analyze spherical images without being fooled by distortions. This is achieved by making the spherical CNN “equivariant” to rotations, meaning that when the input is rotated, the internal representations learned by the network rotate along with it. Empirically, Spherical CNNs achieve excellent prediction accuracy on two very different tasks: 3D model recognition from spherical images and atomization energy prediction (an important problem in chemistry).
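The idea of equivariance can be illustrated in a much simpler setting than the sphere. The sketch below is not the method from the paper. It assumes a signal that lives on a circle, where a rotation is just a cyclic shift, and it uses a plain circular correlation as the filter. The function names rotate and circular_correlation, the sample signal, and the kernel are all made up for illustration.

```python
# Toy analog of rotation equivariance: a signal on a circle, not a sphere.
# Rotating the circle by k samples is the same as cyclically shifting the list.

def rotate(signal, k):
    """Rotate a circular signal by k samples (cyclic shift to the right)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)

def circular_correlation(signal, kernel):
    """Slide the kernel around the circle, wrapping at the ends."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]

signal = [0, 1, 3, 1, 0, 0, 2, 0]   # hypothetical input on an 8-sample circle
kernel = [1, -2, 1]                 # hypothetical filter

# Path 1: rotate the input first, then apply the filter.
rotated_then_filtered = circular_correlation(rotate(signal, 3), kernel)
# Path 2: apply the filter first, then rotate the output.
filtered_then_rotated = rotate(circular_correlation(signal, kernel), 3)

# Equivariance means both paths give the same result.
equivariant = rotated_then_filtered == filtered_then_rotated
print(equivariant)
```

Both paths agree, so the filter output rotates together with the input. On a circle this comes for free from ordinary wrap-around correlation; the contribution described above is a framework that gives the analogous property for rotations of the sphere.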

Why Spherical CNNs are important

To understand the importance of Spherical CNNs, let us provide some background: Over the last few years, Deep Learning (DL), and CNNs in particular, has revolutionized AI, with breakthrough results being reported in speech recognition, visual object recognition, natural language processing, and other areas. CNNs are very good at analyzing signals laid out on a flat, regular grid, such as audio, text, images, or video, because they have a built-in ability to recognize patterns regardless of their location in space or time. This allows them to learn to recognize, for instance, a visual object regardless of where it is in the image, without having to see many shifted copies of the same object during the training phase of the DL model. However, in many applications of recent interest, the signals that we wish to learn from live on a sphere. For example, omnidirectional cameras used by cars, drones, and other robots capture a spherical image of their entire surroundings. In scientific applications too, spherical signals abound, with examples ranging from earth science to astrophysics.

One way to analyze such spherical signals would be to project them to the plane, and analyze the result with a CNN. However, as we know from cartography, any such "map projection" results in distortions that make some areas look larger or smaller than they really are. This renders the CNN ineffective, because as objects move around the sphere, they don't just move around on the map but also appear to shrink and stretch.
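To get a feel for the size of this distortion, here is a minimal sketch that assumes one particular map projection, the equirectangular projection, in which longitude and latitude are mapped linearly to x and y. The choice of projection and the function name horizontal_stretch are assumptions made for illustration. On the sphere, a circle of constant latitude shrinks in proportion to the cosine of the latitude, yet this map draws every such circle at the same width, so a small patch is widened by a factor of 1 / cos(latitude).

```python
import math

def horizontal_stretch(latitude_deg):
    """Factor by which an equirectangular map widens a small patch at this latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg))

# The same object looks wider and wider on the map as it moves toward a pole.
for lat in (0, 30, 60, 80):
    print(f"latitude {lat:2d} deg: widened by a factor of {horizontal_stretch(lat):.2f}")
```

Under this assumption an object at 60 degrees latitude appears about twice as wide on the map as the same object at the equator, which is the kind of position-dependent stretching that a planar CNN has no built-in way to handle.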

How Spherical CNNs can be used

Spherical CNNs have many potential applications in IoT, robotics, autonomous cars, augmented reality (AR), and virtual reality (VR). Autonomous drones, which are already sold to consumers and may one day deliver packages to your doorstep in minutes, are one natural application where Spherical CNNs could improve object detection and recognition, as well as visual motion analysis. In AR, a set of cameras might capture a full 360-degree view of a room, which can be fused into a single spherical image and analyzed efficiently by a Spherical CNN to accurately overlay virtual objects.

At Qualcomm we’re very excited about these and other transformational applications made possible by this work and are actively furthering this as well as other data-efficient learning research.