
Sharing our latest innovations in AI

A look at Qualcomm AI Research’s latest contributions to major conferences

Written by Dr. Joseph Soriaga

Feb 7, 2022

Qualcomm AI Research continues to make valuable contributions to the ML community by publishing insights that enable more energy-efficient AI and drive the evolution toward the connected intelligent edge. In this blog post, I’ll highlight some of our accepted papers that were presented across key machine learning conferences in the latter part of 2021—such as the International Conference on Machine Learning (ICML) and the International Conference on Computer Vision (ICCV)—and provide a peek into the future of AI.

Power efficiency

Model efficiency is essential for on-device AI and for making AI ubiquitous. The paper, “Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces” (DONNA), which was accepted at ICCV, presents a novel pipeline for rapid neural architecture search (NAS) and search space exploration, targeting multiple hardware platforms and user scenarios. With this approach, DONNA finds architectures that outperform the state of the art by 20-40%. Later at NeurIPS, we used DONNA to accelerate real-time monocular depth estimation models on commercial devices, increasing the frame rate by 40%.

Transformer-based architectures have become the de facto models for a wide range of language and visual tasks. However, their memory footprint and high latency are prohibitive for efficient deployment and inference on resource-limited devices. The paper, “Understanding and Overcoming the Challenges of Efficient Transformer Quantization,” which was accepted at Empirical Methods in Natural Language Processing (EMNLP), explores quantization for transformers. We demonstrate the effectiveness of our methods on the GLUE benchmark using BERT and establish state-of-the-art results for both post-training quantization and quantization-aware training. In addition, we show that transformer weights and embeddings can be quantized to ultra-low bit widths, leading to significant memory savings with minimal accuracy loss.
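To make the memory trade-off concrete, here is a minimal sketch of uniform symmetric quantization in plain Python. It is a generic textbook scheme, not the method from the paper, and the function names and sample weights are made up for illustration. It maps floating-point weights to signed integers that share one scale, then measures how the reconstruction error grows as the bit width shrinks.

```python
def quantize_symmetric(weights, num_bits):
    """Uniform symmetric per-tensor quantization: floats -> signed ints plus one scale."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


def max_abs_error(weights, num_bits):
    """Largest reconstruction error after a quantize/dequantize round trip."""
    q, scale = quantize_symmetric(weights, num_bits)
    return max(abs(w - r) for w, r in zip(weights, dequantize(q, scale)))


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -1.20, 0.33, 0.0]
for bits in (8, 4, 2):
    print(bits, "bits -> max abs error", max_abs_error(weights, bits))
```

Each step down in bit width shrinks storage proportionally but coarsens the grid, which is why low-bit quantization usually needs more care than this naive round-to-nearest approach.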

Computer vision

As transformers extend their applicability into vision use cases, power and compute efficiency become particularly important considerations for video applications. At the NeurIPS Workshop on Machine Learning for Autonomous Driving, we presented our work on “Spatio-Temporal Gated Transformers for Efficient Video Processing,” which addresses the large amount of redundant computation that can arise in video segments where there are only small changes between frames. An analysis of the computation cost of the video object detection framework DETR identifies the linear layers as the major computation bottleneck. Thus, we propose dynamic gating layers to conduct conditional computation and validate this on a video object detection use case where FLOPs are reduced by 48.3% without a significant drop in accuracy.
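The idea of conditional computation can be illustrated with a small Python sketch. Note the assumptions: the gate here is a fixed change threshold rather than the learned gating layers described in the paper, and the class name `GatedLinear`, the weights, and the frames are all hypothetical. The layer recomputes a token's output only when that token has changed noticeably since it was last computed, and otherwise reuses the cached result.

```python
def linear(x, weight):
    """Dense layer without bias: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class GatedLinear:
    """Linear layer that skips tokens whose input barely changed between frames.

    Assumption: a hand-set threshold stands in for a learned gate.
    """

    def __init__(self, weight, threshold):
        self.weight = weight
        self.threshold = threshold
        self.cached_inputs = {}    # token index -> input used for the cached output
        self.cached_outputs = {}   # token index -> last computed output
        self.computed = 0
        self.skipped = 0

    def forward(self, tokens):
        outputs = []
        for i, x in enumerate(tokens):
            prev = self.cached_inputs.get(i)
            changed = prev is None or max(
                abs(a - b) for a, b in zip(x, prev)
            ) > self.threshold
            if changed:
                # Gate open: run the expensive layer and refresh the cache.
                self.cached_inputs[i] = list(x)
                self.cached_outputs[i] = linear(x, self.weight)
                self.computed += 1
            else:
                # Gate closed: reuse the previous output.
                self.skipped += 1
            outputs.append(self.cached_outputs[i])
        return outputs


layer = GatedLinear(weight=[[1.0, 0.0], [0.5, 0.5]], threshold=0.05)
frame1 = [[1.0, 2.0], [3.0, 4.0]]
frame2 = [[1.0, 2.01], [3.5, 4.0]]  # only the second token changes noticeably
out1 = layer.forward(frame1)
out2 = layer.forward(frame2)
print("computed:", layer.computed, "skipped:", layer.skipped)
```

In this toy run one of the four token evaluations is skipped. The savings in a real video model depend on how much of each frame is static and on how well the gates are trained.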

Along with video, the focus on 3D has been growing rapidly with the evolution of technologies in the metaverse, autonomous driving, and mobile cameras. At BMVC, our paper “X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation” establishes a new state of the art in performance for monocular depth estimation through better training. More specifically, during training, we utilize a pre-trained semantic segmentation teacher network and transfer its semantic knowledge to the depth network. To enable such knowledge distillation across two different visual tasks, we introduce a small, trainable network that translates the predicted depth map to a semantic segmentation map, which can then be supervised by the teacher network. This was further shown on commercial devices at NeurIPS as an accepted demo.

Data compression and generative modeling

Improved video compression is needed now more than ever with the continued growth in video and its critical role in remote work culture. The paper, “Extending Neural P-frame Codecs for B-frame Coding,” which was presented at ICCV, builds upon our leading neural video codec work and demonstrations. While most neural video codecs address P-frame coding (predicting each frame from past ones), this paper addresses B-frame compression (predicting frames using both past and future reference frames). Our method can lead to a 28.5% saving in bitrate compared to the P-frame codec at the same video quality.
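A toy example shows why a future reference frame can help. The Python sketch below is not a neural codec and uses no motion compensation. It treats each frame as a short list of pixel intensities (made-up values), predicts the current frame either from the past frame alone (P-style) or from the average of the past and future frames (B-style), and compares the energy of the residual that would still need to be coded.

```python
def residual_energy(frame, prediction):
    """Sum of squared differences between a frame and its prediction."""
    return sum((a - b) ** 2 for a, b in zip(frame, prediction))


def predict_p(past):
    """P-style prediction: reuse the past reference frame as-is."""
    return list(past)


def predict_b(past, future):
    """B-style prediction: average the past and future reference frames."""
    return [(a + b) / 2 for a, b in zip(past, future)]


# Hypothetical 4-pixel frames with gradually changing brightness.
past = [10.0, 20.0, 30.0, 40.0]
current = [12.0, 22.0, 31.0, 43.0]
future = [14.0, 25.0, 32.0, 46.0]

p_res = residual_energy(current, predict_p(past))
b_res = residual_energy(current, predict_b(past, future))
print("P-style residual energy:", p_res)
print("B-style residual energy:", b_res)
```

When content changes smoothly over time, interpolating between two references leaves a smaller residual than extrapolating from one, and a smaller residual generally costs fewer bits to code.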

The paper, “Progressive Neural Image Compression with Nested Quantization and Latent Ordering,” which was accepted at the International Conference on Image Processing (ICIP), pushes the boundary of variable bitrate compression by allowing quality-scalable coding in a single bitstream. In contrast to existing learned variable bitrate solutions, which produce separate bitstreams for each quality, our method enables easier rate control and requires less storage. Our method is the first learning-based progressive image coding scheme and outperforms SPIHT, a well-known wavelet-based progressive image codec.

Personalization and federated learning

The need for intelligent, personalized experiences is ever-growing, and on-device learning is the technology that can make it possible while preserving data privacy. The paper, “Federated Learning of User Verification Models Without Sharing Embeddings,” which was accepted at ICML, considers the problem of training user verification (UV) models in a federated setting, where each user has access to its own data, and user embeddings cannot be shared with the server or other users. Our experimental results, applied to user verification tasks across voice, face, and handwriting, show that our method is on par with existing approaches, while not sharing the embeddings with other users or the server. In addition, you can find this leading research being implemented in our federated learning framework demo of voice user verification.

Self-supervision has become a powerful technique to train neural networks without needing accurately labeled data. Oftentimes, imitated classification objectives, or pretext tasks, are established by assigning labels to transformed or augmented input images. The paper, “Distribution Estimation to Automate Transformation Policies for Self-Supervision,” which was accepted at the NeurIPS workshop on Self-Supervised Learning, proposes a framework based on a generative adversarial network to automatically find transformations that are not present in the input dataset and are thus effective for self-supervised learning. We evaluated our framework using several visual recognition datasets to show the efficacy of our automated transformation policy.

Wireless and RF sensing

Applying deep learning to wireless can enable better communications. Our paper, “Neural Augmentation of Kalman Filter with Hypernetwork for Channel Tracking,” which was accepted at Globecom, proposes the Hypernetwork Kalman Filter (HKF) for estimating wireless channels in communications systems, and it can also be used to improve tracking of dynamical systems. The HKF combines the generalization power of Kalman filters with the expressive power of neural networks. The HKF was shown to outperform both parametrized Kalman filters and long short-term memory (LSTM) models trained across a wide range of Doppler values. Moreover, the HKF generalizes to unseen Doppler values, signal-to-noise ratios, and pilot patterns much better than LSTM models, which suffer from severe performance degradation.
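For readers unfamiliar with the Kalman filter that the HKF builds on, here is a minimal scalar version in plain Python. It is the classical filter only, with no hypernetwork, and it makes simplifying assumptions: a real-valued channel gain following a first-order autoregressive model, with the parameters `a`, `q`, and `r` fixed by hand. As the title of the paper suggests, the HKF augments this kind of filter with a neural network instead of relying on hand-set values alone.

```python
import random


def kalman_track(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a * x_{t-1} + w_t, y_t = x_t + v_t.

    q is the process-noise variance of w_t, r the observation-noise variance of v_t.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict the next state and its uncertainty from the model.
        x_pred = a * x
        p_pred = a * a * p + q
        # Correct the prediction using the new noisy observation.
        k = p_pred / (p_pred + r)          # Kalman gain
        x = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        estimates.append(x)
    return estimates


def simulate_channel(n, a, q, r, seed=0):
    """Generate a synthetic slowly varying gain and its noisy observations."""
    rng = random.Random(seed)
    x, truth, obs = 1.0, [], []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        truth.append(x)
        obs.append(x + rng.gauss(0.0, r ** 0.5))
    return truth, obs


def mse(seq_a, seq_b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(seq_a, seq_b)) / len(seq_a)


a, q, r = 0.99, 0.01, 0.5   # hypothetical model parameters
truth, obs = simulate_channel(2000, a, q, r)
est = kalman_track(obs, a, q, r)
print("MSE of raw observations:", mse(obs, truth))
print("MSE of Kalman estimates:", mse(est, truth))
```

The filter works well here because its parameters match the simulated channel exactly. When they do not match, for example under a different Doppler, a fixed-parameter filter degrades, and that mismatch is the situation the HKF is reported to handle.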

We also show how deep learning enables new capabilities like radio frequency (RF) sensing. Our paper, “WiCluster: Passive Indoor 2D/3D Positioning using Wi-Fi without Precise Labels,” which was also accepted at Globecom, introduces a new machine learning approach for passive indoor positioning using RF channel state information (CSI). Unlike other approaches, WiCluster can predict both a zone-level position and a precise 2D or 3D position, without using any precise position labels during training. Reducing reliance on accurately labeled data is critical for the adoption of such technology. The paper includes results and a robustness study for 2D positioning in two single-floor office buildings and 3D positioning in a two-story home. In addition, you can find this leading research being implemented in our accepted NeurIPS demo.

We’re excited to see the work of Qualcomm AI Research being recognized in the scientific community.

For additional leading AI papers across these and other categories, visit Qualcomm AI Research.

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.


About the Author

Dr. Joseph Soriaga, Senior Director of Technology, Qualcomm Technologies