Dec 3, 2021
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
The pace of machine learning advancement continues at an astounding rate, and it’s that time of year again for AI researchers and engineers to share new discoveries, collaborate, and push the industry forward at the Neural Information Processing Systems (NeurIPS) conference. Although NeurIPS is virtual again this year due to COVID-19, the AI community has continued to be resilient and produce novel work that will ultimately transform industries and enhance lives.
Since NeurIPS 2020, the connected intelligent edge has continued to grow as AI-powered devices and services have become increasingly integrated into our daily lives. They offer significant benefits like enhanced photography, personalized voice UI, and more efficient wireless communications, creating an ongoing demand for AI research and products. Whether you’re virtually attending this year’s NeurIPS conference or just curious about what Qualcomm AI Research has in store, read on to learn about our latest demos, papers, workshops, talks, and other AI highlights.
Our NeurIPS demos
To show that our AI research is more than just theory and is applicable in the real world, we bring it to life through demonstrations. We had two of the twenty accepted demos at NeurIPS this year. See our demo presentations on Dec. 9th, 8:30 – 9:35 a.m. PT.
Real-time and accurate self-supervised monocular depth estimation on mobile devices demo
Depth plays a key role in understanding the 3D world and is of great importance to a wide variety of applications, such as self-driving cars, robotics, AR/VR, and image processing. However, traditional depth sensors can be challenging to deploy at scale for a variety of reasons, such as cost, extra hardware requirements, and power consumption. Self-supervision has emerged as a new paradigm for the task of estimating scene depth from a single image, or monocular depth estimation. Utilizing our novel X-Distill approach, our neural network learns to understand the semantics of the scene in addition to geometry, leading to significantly more accurate depth prediction. Utilizing our Distilling Optimal Neural Network Architectures (DONNA) research, we derive highly efficient backbones via neural architecture search (NAS). In addition, the trained network is quantized using Qualcomm Innovation Center’s open-sourced AI Model Efficiency Toolkit, or AIMET, enabling real-time depth estimation on a mobile device at higher frame rates. Our solution addresses a longstanding problem in computer vision and will enhance a wide range of exciting applications impacting our lives.
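For readers new to quantization, here is a minimal sketch of the basic idea: mapping floating-point values to 8-bit integers with a scale and a zero point. It is a generic illustration in plain Python, not AIMET's API or the method used in the demo, and the function names and sample weights are assumptions made up for this example.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with an affine (scale + zero-point) scheme."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero maps to an exact integer.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [(x - zero_point) * scale for x in q]


# Hypothetical weights, chosen only for illustration.
weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(scale, 5), zp, max_error)
```

Each value now fits in one byte instead of four, at the cost of a rounding error of at most half the scale. Production toolkits go further, for example by simulating this rounding during training so the network learns to tolerate it.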
Unsupervised indoor Wi-Fi positioning demo
Passive positioning using radio frequency, otherwise known as RF sensing, is useful for a variety of tasks such as intrusion detection, asset tracking, and smart energy usage. Unlike cameras, it works in all lighting conditions and safeguards privacy since it does not capture visual data. Existing machine learning solutions are often demonstrated on small floor plans within controlled lab environments and require labels for supervised learning. Our new machine learning-based methods work on large floor plans and only require weakly labeled training data along with a floor plan. Using three to four access points, our methods demonstrate about 1 meter accuracy in multi-room, multi-floor environments with a 150 to 400 square meter floor surface area. This demo shows what can be achieved with our two technologies, OT IsoMap and WiCluster, paving the way for passive positioning solutions that can be easily implemented in smart homes, company buildings, and cities.
Our NeurIPS papers, workshops, socials, talks, and poster presentations
At academic conferences like NeurIPS, novel papers are a primary way to contribute innovative and impactful AI research to the rest of the community. I’d like to highlight one accepted paper that is advancing machine learning fundamentals.
“Modality-Agnostic Topology Aware Localization” presents a data-driven approach for the indoor localization of an observer on a 2D topological map of the environment. Existing state-of-the-art techniques may yield accurate estimates only when they are tailor-made for a specific data modality, like a camera-based system, which prevents their applicability to broader domains. We developed a modality-agnostic framework and formulated the localization problem in the context of parametric manifold learning while leveraging optimal transportation. Our framework allows jointly learning a low-dimensional embedding as well as correspondence with a topological map. Our experimental results demonstrate decimeter-level accuracy for localization using different sensory inputs.
We hope that you can join us at the following NeurIPS activities:
- Modality-Agnostic Topology Aware Localization (poster session on Dec. 8th, 12:30 – 2:00 a.m. PT)
- “Shine in your technical presentation” social event (Dec. 8th, 3 - 5 a.m. PT). It will discuss how machine learning engineers can make the most of their research presentations, with tips on structure, storytelling, and audience engagement.
- Women in Machine Learning (WiML) mentorship roundtable with Qualcomm AI Research (Dec. 10th, 12 - 1:45 p.m. PT).
Additional updates on our latest AI Research
I’m also very excited to share that we are releasing additional datasets to the AI community and showcasing two other demos focused on source compression and federated learning.
Availability of popular computer vision datasets from TwentyBN
We are pleased to make available popular datasets originally from TwentyBN to the AI research community: Jester and Something-Something. The Jester dataset is a large collection of labelled videos that show humans performing pre-defined hand gestures in front of a laptop camera or webcam. It enables training robust machine learning models to recognize human hand gestures.
The Something-Something dataset is a large collection of labelled video clips that show humans performing pre-defined basic actions with everyday objects. It enables machine learning models to develop fine-grained understanding of basic actions that occur in the physical world.
The two datasets were created with the help of more than 1,300 crowd actors, who recorded and uploaded videos for specific scenarios. Jester and Something-Something are freely available for research purposes.
Real-time neural video decoding on a mobile device demo
As a follow-up to our demonstration of the world’s first software-based neural intra-frame video decoder running HD format in real time on a commercial smartphone, we are now demonstrating the world’s first neural inter-frame HD video decoder that runs at 30 frames per second on a mobile device. Inter-frame decoding is more complex but provides more bit-rate savings since it uses a reference frame, which is usually the previous decoded frame, to facilitate the coding process. We overcame the challenges of running an inter-frame neural decoder on a mobile device by introducing a novel efficient architecture, using AIMET for quantization-aware training, and running entropy coding in parallel. Neural codecs offer a lot of potential to address the growing demand to compress data, so stay tuned for more advancements coming from Qualcomm AI Research.
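Why does a reference frame save bits? The toy sketch below shows the core idea with plain lists of pixel values: when consecutive frames are similar, the difference to the previous frame is mostly zeros, which is far cheaper to code than the full frame. It is a deliberately simplified illustration with made-up frames and function names. Practical codecs also compensate for motion, and a neural codec replaces these hand-written steps with learned networks.

```python
def encode_inter(frame, reference):
    """Encoder side: keep only the per-pixel difference to the reference frame."""
    return [[c - r for c, r in zip(row, ref_row)]
            for row, ref_row in zip(frame, reference)]


def decode_inter(residual, reference):
    """Decoder side: rebuild the frame from the reference plus the residual."""
    return [[r + d for r, d in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, residual)]


# Two hypothetical 3x4 grayscale frames that differ in only two pixels.
previous = [[10, 10, 10, 10],
            [10, 200, 200, 10],
            [10, 10, 10, 10]]
current = [[10, 10, 10, 10],
           [10, 200, 205, 10],
           [10, 10, 12, 10]]

residual = encode_inter(current, previous)
decoded = decode_inter(residual, previous)
nonzero = sum(v != 0 for row in residual for v in row)
print(nonzero, decoded == current)  # 2 True
```

The decoder needs the previous decoded frame before it can reconstruct the current one, which is one reason inter-frame decoding is more complex than decoding each frame on its own.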
Federated learning framework on mobile devices demo
The need for personalized experiences powered by AI is ever-growing, but preserving privacy while learning from user data has been a challenge. Federated learning enables training powerful AI models while preserving privacy through on-device learning based on user data that never leaves the device – only the trained model parameters are aggregated in the cloud.
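The aggregation step can be sketched with federated averaging, a common textbook approach in which the server combines client parameters weighted by how much data each client holds. This is a minimal plain-Python sketch under assumed toy data, not the framework described in this post; the tiny linear model, the learning rate, and all names are hypothetical.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One client's on-device training: fit y = w*x + b by gradient descent."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server side: average parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    )


# Hypothetical private datasets; every point lies on y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (-1.0, -1.0), (0.5, 2.0)],
    [(1.5, 4.0), (-0.5, 0.0)],
]

global_weights = (0.0, 0.0)
for _ in range(200):
    # Raw data stays inside local_update; only parameters are sent back.
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)
```

The server never sees a single (x, y) pair, yet the shared model still converges toward w = 2 and b = 1 because every round folds in what each client learned locally.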
We have developed a federated learning framework that runs on mobile devices. It is a scalable and customizable software library that, unlike existing federated learning libraries available for research, seamlessly extends TensorFlow- and PyTorch-based training workflows to Android. Our demo showcases the use of our federated learning framework with 5,000 worker nodes to train a voice user verification model on commercial mobile devices that can discriminate the target user from an impostor. Stay tuned for more advances in federated learning to train models at scale while preserving privacy.
We hope to (virtually) meet you at NeurIPS or future AI conferences and share our impact on AI. At Qualcomm Technologies, we make breakthroughs in fundamental research and scale them across devices and industries. Our AI solutions are powering the connected intelligent edge. Qualcomm AI Research works together with the rest of the company to integrate the latest AI developments and technology into our products — shortening the time between research in the lab and delivering advances in AI that enrich lives.