What’s coming next in AI: Here’s our latest published research at major conferences
Even though it’s only April, Qualcomm AI Research has already published valuable work in 2021. The papers accepted at CVPR and ICLR, reputable machine learning conferences, will be presented on virtual stages this year. Below, we highlight our research and provide a glimpse of the future of AI.
Multi-modal fusion
Significant progress has been made in the past few years in identifying actions in videos and, more broadly, in video analytics with machine learning. AI can now quickly learn what is happening in videos even when they carry only minimal labels. We call this “weakly-labeled data.” For Qualcomm Technologies, video perception and model efficiency go hand in hand.
The paper “Cross-attentional audio-visual fusion for weakly-supervised action localization,” which was accepted (poster) at ICLR 2021, takes video understanding to a new level by exploring the audio modality as well. Oftentimes, actions in a video are best described by a joint audio-visual representation. For example, AI can more accurately detect a person running in a video if it can also corroborate the visual evidence with specific sounds, such as breathing and leg shuffling. In the weakly-supervised setting, this method outperforms all existing methods by at least 1.4% in accuracy. Moreover, its accuracy in the weakly-supervised setting (77.1%) is comparable to that of the fully supervised state-of-the-art method.
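To give a feel for what "cross-attention" between modalities means, here is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the architecture from the paper: the function names, the feature values, and the choice of letting visual segments query audio segments are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output row per query.
        fused.append([sum(w * v[d] for w, v in zip(weights, values))
                      for d in range(len(values[0]))])
    return fused


# Hypothetical per-segment features (made-up numbers, two dimensions each).
visual = [[1.0, 0.0], [0.0, 1.0]]
audio = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

# Each visual segment gathers the audio information most related to it.
audio_for_visual = cross_attend(visual, audio, audio)
```

In this toy setup, the first visual segment mostly picks up the first audio segment and the second visual segment mostly picks up the second, which is the kind of corroboration between sight and sound described above.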
Video efficiency
At CVPR 2021, two papers on video understanding have been accepted, namely “FrameExit: Conditional Early Exiting for Efficient Video Recognition” (oral) and “Skip-Convolutions for Efficient Video Processing” (poster). Video streams contain many redundancies, that is, repeated information that does not need to be processed again to achieve the same results. Convolutional neural networks process sequences frame by frame, layer by layer, so recalculating this redundant information is extremely compute-inefficient. Skip convolutions are a way to save computation and make sure that the neural network focuses only on significant changes in the frame. For example, if the AI model is focused on tracking the movement of a car, it would skip the frames in which the car stands still. Skip convolutions reduce computation by a factor of 3 to 4.
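The idea of computing only on what changed can be illustrated with a toy 1D example. Because convolution is linear, the output for the current frame equals the previous output plus the convolution of the difference between the two frames, and wherever that difference is (near) zero the work can be skipped. The sketch below is not the paper's implementation; the function names and the simple threshold rule are assumptions for illustration.

```python
def conv1d(signal, kernel):
    """Plain 1D sliding-window correlation without padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def skip_conv1d(prev_frame, prev_output, frame, kernel, threshold=0.0):
    """Update prev_output using only the positions where the input changed.

    Returns (new_output, number_of_output_positions_recomputed).
    """
    k = len(kernel)
    residual = [a - b for a, b in zip(frame, prev_frame)]
    new_output = list(prev_output)
    computed = 0
    for i in range(len(frame) - k + 1):
        window = residual[i:i + k]
        # Gate (assumed rule): skip positions whose input window barely changed.
        if max(abs(r) for r in window) <= threshold:
            continue
        new_output[i] += sum(window[j] * kernel[j] for j in range(k))
        computed += 1
    return new_output, computed


kernel = [1.0, 0.0, -1.0]
frame0 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
frame1 = list(frame0)
frame1[6] = 9.0  # only one "pixel" changes between the two frames

out0 = conv1d(frame0, kernel)
out1, computed = skip_conv1d(frame0, out0, frame1, kernel)
```

Here only 2 of the 6 output positions are recomputed, yet `out1` matches a full convolution of `frame1`. With a threshold above zero, small changes are ignored as well, trading a little accuracy for more savings.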
The FrameExit framework adjusts the amount of computation based on the difficulty of the input, which can significantly reduce the computational requirements. More specifically, FrameExit is an efficient video recognition model that performs automatic early exiting by adjusting the computational budget on a per-video basis. The researchers found that a few frames are sufficient for classifying most sample videos, while more difficult sample videos need more frames for detailed information. On an object classification task, FrameExit uses 2.5x fewer MACs while maintaining accuracy. On a video classification task, it uses 1.3x to 5x fewer GFLOPs while maintaining accuracy.
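Early exiting can be sketched in a few lines. In the toy version below, per-frame scores are averaged as frames arrive, and processing stops as soon as the prediction is confident enough. This is a simplification under stated assumptions: the fixed confidence threshold stands in for whatever exit criterion the actual model uses, and the function names and score values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_early_exit(frame_logits, confidence=0.9):
    """Return (predicted_class, frames_used).

    frame_logits holds one list of class scores per frame, standing in for
    the output of a per-frame network. Scores are averaged over the frames
    seen so far, and the loop exits once the top probability is high enough.
    """
    running = [0.0] * len(frame_logits[0])
    for n, logits in enumerate(frame_logits, start=1):
        running = [r + v for r, v in zip(running, logits)]
        probs = softmax([r / n for r in running])
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= confidence:
            return best, n  # early exit: remaining frames are never processed
    return best, len(frame_logits)


# Made-up scores: an "easy" video is obvious from the first frame,
# a "hard" video only becomes clear after several frames.
easy_video = [[6.0, 0.0, 0.0]] * 8
hard_video = [[1.0, 0.8, 0.0], [0.5, 1.0, 0.0],
              [0.2, 4.0, 0.0], [0.0, 9.0, 0.0]] + [[0.0, 9.0, 0.0]] * 4

easy_pred, easy_frames = classify_with_early_exit(easy_video)
hard_pred, hard_frames = classify_with_early_exit(hard_video)
```

With these inputs, the easy video exits after 1 of 8 frames and the hard video after 4 of 8, which mirrors the observation that most videos need only a few frames while harder ones need more.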
Video compression
Low-power edge devices, such as smartphones and smart home devices, are resource constrained. Reducing the size of models is crucial for deploying them on such devices. The paper “Overfitting for fun and profit: Instance-adaptive data compression” was accepted at ICLR 2021 (poster) and presents a method called instance-adaptive neural compression.
It is the first method that enables finetuning of a full compression model on a single video, while restricting the additional bits for encoding the (quantized) model updates. The method showed performance improvement on all five tested videos in an I-frame compression setup, with an average distortion improvement of about 1 dB for the same bitrate. Moreover, the approach improves on the state-of-the-art performance of well-known frameworks such as Scale Space Flow. The method can be applied to any compression model or task.
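To put the roughly 1 dB figure in perspective, the short sketch below converts a dB gain into a change in mean squared error. It assumes the distortion is measured as PSNR, which the text above does not state explicitly; the baseline error value is made up for illustration.

```python
import math


def psnr(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE must shrink to raise PSNR by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


baseline_mse = 40.0  # hypothetical value, for illustration only
ratio = mse_ratio_for_gain(1.0)
improved_mse = baseline_mse * ratio
gain = psnr(improved_mse) - psnr(baseline_mse)
```

Under that assumption, a 1 dB gain corresponds to a `ratio` of about 0.79, or roughly 21% lower mean squared error at the same bitrate.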
We are also co-organizing the Neural Compression Workshop at ICLR 2021 on the 8th of May along with Disney and UC Irvine on the topic of generative models. Be sure to check it out if you’re participating.
Computer vision
Another paper accepted at CVPR 2021 is "InverseForm: A Loss Function for Structured Boundary-Aware Segmentation" (oral). The research introduces a framework for creating better feature maps for semantic segmentation tasks. This framework allows for capturing boundary transformations with consistent and significant performance improvement on segmentation backbone models. This is all achieved without increasing their size and computational complexity. All models trained using InverseForm loss consistently improve compared to their baselines and produce scores on par with state-of-the-art results.
Advanced CNNs
Expanding on our research to find more generalized CNNs that are rotationally equivariant, we pushed gauge equivariant CNNs even further. The new paper, "Gauge Equivariant Mesh CNNs: Anisotropic Convolutions on Geometric Graphs," which was accepted at ICLR 2021 (oral), advances CNNs with equivariance and probabilistic numerics, introducing mesh convolutional neural networks that can be applied to 3D imagery for tasks such as detecting shapes, registering different poses of the same shape, and shape segmentation. This is a groundbreaking field of research, so we’re looking forward to seeing this work change our AR, VR, automotive, and drone experiences in the years to come.
The ICLR 2021 paper (poster), "Probabilistic Numeric Convolutional Neural Networks," proposes another advanced type of CNN. This CNN borrows concepts from probabilistic numerics theory and represents features as Gaussian processes, providing a probabilistic description of discretization error. The solution shows a 3x reduction in error on computer vision tasks compared with the previous state-of-the-art results on a benchmark dataset.
We’re excited to see the work of Qualcomm AI Research being recognized in the scientific community. If you’re planning to attend ICLR or CVPR this year, meet us at the virtual booth to speak to our engineers and recruiters, and follow the main conference schedule to find out more about our accepted papers and demos.


