OnQ Blog

What’s coming next in AI: Here’s our latest published research at major conferences

Apr 29, 2021

Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.

Even though it’s only April, Qualcomm AI Research has already published valuable work in 2021. Our papers accepted at CVPR and ICLR, two of the most reputable machine learning conferences, will be presented on virtual stages this year. Below, we highlight this research and provide a glimpse of the future of AI.

Multi-modal fusion

Significant progress has been made in the past few years in identifying actions in videos, and machine learning has driven similar advances in video analytics more broadly. AI can now learn what is happening in a video even when it comes with very minimal labeling, a setting known as weakly-labeled data. For Qualcomm Technologies, video perception and model efficiency go hand in hand.

The paper “Cross-attentional audio-visual fusion for weakly-supervised action localization,” accepted as a poster at ICLR 2021, takes video understanding to a new level by exploring the audio modality as well. Oftentimes, actions in a video are best captured by a joint audio-visual representation. For example, AI can more accurately detect a person running in a video if it can also corroborate the visual evidence with characteristic sounds, such as breathing and leg shuffling. In the weakly-supervised setting, this method outperforms all existing methods in accuracy by at least 1.4%. In addition, our weakly-supervised approach achieves accuracy (77.1%) comparable to the fully supervised state-of-the-art method.
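To make the idea concrete, below is a minimal sketch of cross-modal attention in PyTorch: each modality queries the other, and the attended streams are fused into a joint representation. The module name, dimensions, and fusion step are illustrative stand-ins, not the paper’s exact architecture.

```python
# Minimal sketch of cross-modal attention between audio and video features.
# Shapes, layer sizes, and the fusion step are illustrative, not the paper's design.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Each modality queries the other: video attends to audio and vice versa.
        self.video_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, video, audio):
        # video, audio: (batch, time, dim) feature sequences from per-modality encoders.
        v_att, _ = self.video_from_audio(query=video, key=audio, value=audio)
        a_att, _ = self.audio_from_video(query=audio, key=video, value=video)
        # Concatenate the cross-attended streams and project to a joint representation,
        # which a downstream head can score per snippet for action localization.
        return self.fuse(torch.cat([v_att, a_att], dim=-1))

fusion = CrossModalAttention()
video_feats, audio_feats = torch.randn(2, 100, 256), torch.randn(2, 100, 256)
joint = fusion(video_feats, audio_feats)  # (2, 100, 256)
```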

Video efficiency

At CVPR 2021, two papers on video understanding have been accepted, namely “FrameExit: Conditional Early Exiting for Efficient Video Recognition” (oral) and “Skip-Convolutions for Efficient Video Processing” (poster). Video streams contain many redundancies: repeated information that does not need to be reprocessed to achieve the same results. Convolutional neural networks process sequences frame by frame, layer by layer, so recalculating this redundant information is extremely compute-inefficient. Skip convolutions save computation by making the neural network focus only on significant changes between frames. For example, if the AI model is tracking the movement of a car, it can skip recomputing the static background and process only the regions where the car moves. Skip convolutions achieve a significant 3x to 4x reduction in computation.

Skip convolutions ignore unnecessary processing in subsequent video frames, saving significant GMACs.
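The core trick can be sketched in a few lines. Because convolution is linear, the output for the current frame equals the previous output plus the convolution of the frame residual, so work is only needed where the residual is non-negligible. The fixed threshold gate below is a simplification of the paper’s learned gating, and all shapes are illustrative.

```python
# Conceptual sketch of residual-based skipping for a single conv layer.
import torch
import torch.nn.functional as F

def skip_conv(curr, prev, prev_out, weight, threshold=0.05):
    # curr, prev: (B, C, H, W) inputs to this layer at frames t and t-1.
    # By linearity, conv(curr) = conv(prev) + conv(curr - prev), so only the
    # residual needs processing, and only where it is non-negligible.
    residual = curr - prev
    out = prev_out + F.conv2d(residual, weight, padding=1)
    # Where nothing changed, keep the cached output. A real implementation uses
    # this mask to skip the computation itself rather than masking afterwards.
    mask = residual.abs().amax(dim=1, keepdim=True) > threshold
    return torch.where(mask, out, prev_out)

weight = torch.randn(16, 3, 3, 3)               # hypothetical 3x3 conv, 3 -> 16 channels
prev = torch.randn(1, 3, 64, 64)
curr = prev.clone()
curr[..., 20:30, 20:30] += 1.0                  # only a small region changes
prev_out = F.conv2d(prev, weight, padding=1)
out = skip_conv(curr, prev, prev_out, weight)   # reuses cached output elsewhere
```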

The FrameExit framework adjusts the amount of computation to the difficulty of the input, which can significantly reduce computational requirements. More specifically, FrameExit is an efficient video recognition model that performs automatic early exiting, adjusting the computational budget on a per-video basis. The researchers found that a few frames are sufficient for classifying most sample videos, while more difficult videos need more frames for detailed information. Using FrameExit, the model uses 2.5x fewer MACs on an object classification task while maintaining accuracy, and 1.3x to 5x fewer GFLOPs on a video classification task, also while maintaining accuracy.

FrameExit improves the accuracy of the model while reducing the total computation.
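As a rough illustration, an early-exiting loop might look like the following sketch, where `backbone`, `classifier`, and `gate` are placeholder modules standing in for FrameExit’s learned components and all numbers are arbitrary.

```python
# Illustrative per-video early exiting: process frames until a gate is confident.
import torch
import torch.nn as nn

def classify_with_early_exit(frames, backbone, classifier, gate, max_frames=16):
    pooled, logits = None, None
    for t, frame in enumerate(frames[:max_frames]):
        feat = backbone(frame.unsqueeze(0))                                 # per-frame feature
        pooled = feat if pooled is None else (pooled * t + feat) / (t + 1)  # running mean
        logits = classifier(pooled)
        # Easy videos trigger the gate after a few frames; hard ones use more of the budget.
        if torch.sigmoid(gate(pooled)).item() > 0.5:
            return logits, t + 1
    return logits, max_frames

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # toy feature extractor
classifier, gate = nn.Linear(64, 10), nn.Linear(64, 1)
frames = torch.randn(16, 3, 32, 32)                                 # one video, 16 frames
logits, frames_used = classify_with_early_exit(frames, backbone, classifier, gate)
```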

Video compression

Low-power edge devices, such as smartphones and smart-home devices, are resource constrained, so keeping models small is crucial for deploying them on device. The paper “Overfitting for fun and profit: Instance-adaptive data compression,” accepted as a poster at ICLR 2021, presents a method called instance-adaptive neural compression.

It is the first method that enables finetuning of a full compression model on a single video while restricting the additional bits needed to encode the (quantized) model updates. The method improved performance on all five tested videos in an I-frame compression setup, with an average distortion improvement of about 1 dB at the same bitrate. Moreover, the approach improves on the state-of-the-art performance of well-known frameworks such as Scale-Space Flow. This framework can be applied to any compression model or task.
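A heavily simplified training loop conveys the idea: overfit a pretrained codec to one video while penalizing both the content bitrate and a proxy for the bits needed to send the weight deltas. `TinyCodec` and its rate proxy are illustrative assumptions with an assumed (reconstruction, rate) interface, not the paper’s model or its model-rate estimator.

```python
# Sketch of instance-adaptive finetuning on a single video.
import copy
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 4, stride=4)
        self.dec = nn.ConvTranspose2d(8, 3, 4, stride=4)

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z.abs().mean()          # reconstruction and a crude rate proxy

def finetune_on_video(codec, video_batches, lam=0.01, beta=1e-4, steps=200):
    base = copy.deepcopy(codec)                     # shared weights the receiver already has
    opt = torch.optim.Adam(codec.parameters(), lr=1e-4)
    for step in range(steps):
        x = video_batches[step % len(video_batches)]
        x_hat, rate = codec(x)
        distortion = torch.mean((x - x_hat) ** 2)
        # Proxy for the extra bits needed to transmit the (quantized) weight deltas.
        delta_cost = sum(((p - q.detach()) ** 2).sum()
                         for p, q in zip(codec.parameters(), base.parameters()))
        loss = distortion + lam * rate + beta * delta_cost
        opt.zero_grad(); loss.backward(); opt.step()
    return codec

codec = finetune_on_video(TinyCodec(), [torch.randn(4, 3, 64, 64)])  # one video's frames
```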

We are also co-organizing the Neural Compression Workshop at ICLR 2021 on May 8, together with Disney and UC Irvine, on the topic of generative models. Be sure to check it out if you’re participating.

Computer vision

Another paper accepted at CVPR 2021 is “InverseForm: A Loss Function for Structured Boundary-Aware Segmentation” (oral). The research introduces a framework for creating better feature maps for semantic segmentation tasks by capturing boundary transformations, delivering consistent and significant performance improvements on segmentation backbone models without increasing their size or computational complexity. All models trained with the InverseForm loss improve over their baselines and produce scores on par with state-of-the-art results.

InverseForm applied for semantic segmentation, accurately classifying and precisely segmenting objects. Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/.
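To show where a boundary-aware term plugs into a segmentation objective, here is a hedged sketch. Note that InverseForm itself measures boundary distance through a learned inverse-transformation network; the plain edge-map L1 distance below is only a stand-in for that term.

```python
# Sketch of a boundary-aware segmentation objective (not the InverseForm loss itself).
import torch
import torch.nn.functional as F

def boundary_map(mask):
    # Approximate boundaries with spatial gradients of (soft) per-class masks.
    dx = (mask[..., :, 1:] - mask[..., :, :-1]).abs()
    dy = (mask[..., 1:, :] - mask[..., :-1, :]).abs()
    return F.pad(dx, (0, 1)) + F.pad(dy, (0, 0, 0, 1))

def segmentation_loss(logits, target, boundary_weight=0.1):
    ce = F.cross_entropy(logits, target)                       # standard pixel-wise term
    probs = logits.softmax(dim=1)                              # (B, K, H, W)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    boundary = F.l1_loss(boundary_map(probs), boundary_map(onehot))
    return ce + boundary_weight * boundary

logits = torch.randn(2, 19, 64, 64, requires_grad=True)        # e.g. 19 Cityscapes classes
target = torch.randint(0, 19, (2, 64, 64))
loss = segmentation_loss(logits, target)
```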

Advanced CNNs

Expanding on our research into more generalized CNNs that are rotationally equivariant, we have pushed gauge equivariant CNNs even further. The new paper, “Gauge Equivariant Mesh CNNs: Anisotropic Convolutions on Geometric Graphs,” accepted at ICLR 2021 (oral), introduces mesh convolutional neural networks with anisotropic, direction-aware filters that can be applied to 3D imagery for tasks such as detecting shapes, registering different poses of the same shape, and shape segmentation. This is a groundbreaking field of research, and we look forward to seeing this work shape our AR, VR, automotive, and drone experiences in the years to come.
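As a rough intuition for what “anisotropic” buys on a mesh, the toy layer below picks a different weight matrix depending on each neighbor’s direction in the tangent plane, unlike an isotropic graph convolution that treats all neighbors identically. The gauge handling and parallel transport that make the real method equivariant are omitted, and all names and shapes are illustrative.

```python
# Toy anisotropic aggregation over mesh-vertex neighborhoods (gauge machinery omitted).
import torch
import torch.nn as nn

class AnisotropicMeshConv(nn.Module):
    def __init__(self, in_dim, out_dim, num_angle_bins=6):
        super().__init__()
        # One weight matrix per angular bin: the filter can distinguish directions.
        self.kernels = nn.Parameter(torch.randn(num_angle_bins, out_dim, in_dim) * 0.1)
        self.self_weight = nn.Linear(in_dim, out_dim)
        self.num_bins = num_angle_bins

    def forward(self, x, neighbors, angles):
        # x: (N, in_dim) per-vertex features; neighbors: (N, K) vertex indices;
        # angles: (N, K) direction of each neighbor in the vertex's tangent plane.
        bins = (angles / (2 * torch.pi) * self.num_bins).long() % self.num_bins
        w = self.kernels[bins]                                  # (N, K, out_dim, in_dim)
        msgs = torch.einsum('nkoi,nki->nko', w, x[neighbors])   # direction-aware messages
        return self.self_weight(x) + msgs.mean(dim=1)

conv = AnisotropicMeshConv(16, 32)
x = torch.randn(100, 16)                     # features on 100 mesh vertices
neighbors = torch.randint(0, 100, (100, 6))  # 6 neighbors per vertex (illustrative)
angles = torch.rand(100, 6) * 2 * torch.pi   # neighbor directions
out = conv(x, neighbors, angles)             # (100, 32)
```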

The ICLR 2021 paper (poster) “Probabilistic Numeric Convolutional Neural Networks” proposes another advanced type of CNN. This CNN borrows concepts from probabilistic numerics and represents features as Gaussian processes, providing a probabilistic description of discretization error. The solution shows a 3x reduction in error compared with the previous state-of-the-art results on a benchmark computer vision dataset.
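A toy Gaussian process regression in NumPy illustrates the underlying idea of carrying an explicit uncertainty over values between discrete samples; this is generic GP machinery, not the paper’s convolutional construction.

```python
# Toy GP over an irregularly sampled signal: the posterior variance acts as an
# explicit error bar on the discretization (the idea PNCNNs build on).
import numpy as np

def rbf(a, b, lengthscale=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

x_obs = np.array([0.0, 0.2, 0.5, 0.9])        # irregular sample locations
y_obs = np.sin(2 * np.pi * x_obs)             # observed feature values
x_query = np.linspace(0, 1, 50)               # where the "continuous" feature is needed

K = rbf(x_obs, x_obs) + 1e-6 * np.eye(len(x_obs))
K_q = rbf(x_query, x_obs)
mean = K_q @ np.linalg.solve(K, y_obs)        # GP posterior mean between samples
var = rbf(x_query, x_query).diagonal() - np.einsum(
    'ij,ji->i', K_q, np.linalg.solve(K, K_q.T))
# `var` grows away from the observed points, quantifying discretization error.
```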

We’re excited to see the work of Qualcomm AI Research being recognized in the scientific community. If you’re planning to attend ICLR or CVPR this year, meet us at the virtual booth to speak to our engineers and recruiters, and follow the main conference schedule to find out more about our accepted papers and demos.

 

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries. The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Dr. Max Welling

Vice President, Technology

Armina Stepan

Sr. Marketing Comms Coordinator, Qualcomm Technologies Netherlands B.V.

©2021 Qualcomm Technologies, Inc. and/or its affiliated companies.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes Qualcomm's licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm's engineering, research and development functions, and substantially all of its products and services businesses. Qualcomm products referenced on this page are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell any of the components or devices referenced herein.