Jun 9, 2020
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
A goal of Qualcomm AI Research is to collaborate with leading researchers across the globe, from the QUVA Lab and the MICS Center to Duke University and MIT’s Microsystems Technology Laboratories (MTL). The impactful research that results from these engagements is often shared with the rest of the AI community through new papers or commercialized through Qualcomm Technologies’ products to reach scale and enhance our lives.
We sat down with Song Han, an MIT assistant professor at MTL and prominent researcher in the machine learning community, to talk about his lab’s research and recent awards in machine learning competitions.
What got you interested in AI and what key problems are you trying to solve?
AI is bringing significant benefits to society, with outstanding potential to address essential human concerns in the future. Deep neural networks have driven significant improvements in many AI applications. However, those benefits come at a high cost in terms of computational resources, especially the power budget, and engineering resources. Our goal is to improve the efficiency of AI through algorithm and hardware co-design, enabling AI under constrained resources. In particular, we target efficient AI on edge devices, such as mobile phones and even IoT devices powered by microcontrollers. We investigate model compression techniques to reduce the computational resources, as well as AutoML techniques to reduce the engineering resources. Our recent technique, the Once-for-All Network (OFA), which appeared at ICLR ’20, consistently outperforms state-of-the-art (SOTA) neural architecture search (NAS) methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3) while reducing GPU hours and CO2 emissions by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (<600M FLOPs).
What attracted you to Qualcomm AI Research?
A few reasons. The leading AI research at Qualcomm Technologies provides us with real-world problems to solve. Given that Qualcomm Technologies powers over a billion AI-capable edge devices, it is a great channel to expand the impact of our research to benefit human society. Additionally, Qualcomm Technologies has leading AI products in terms of power efficiency and performance on edge devices, and their user-friendly tool chain allows students to apply their research on real-world products.
What challenge were you trying to solve with this research?
Qualcomm Technologies has a broad selection of platform solutions across product tiers, with custom hardware that targets a variety of end products, from smartphones to smart speakers and cameras. Given such a diverse range of hardware devices, we wanted to tackle the challenge of efficiently deploying deep learning models on diverse hardware platforms. The difficulty is that different hardware platforms have different properties, such as parallelism, cache size, and memory bandwidth, and thus require specialized neural networks to best fit the hardware. It is both engineering-expensive and computation-expensive to specialize a model for each hardware platform.
What were the results of this research?
We introduced a new AutoML framework for efficient deep learning deployment across a large number of deployment scenarios. Our key idea is to decouple model training from architecture search to minimize the cost. To this end, we propose to train a once-for-all network that supports diverse architectural settings, such as depth, width, kernel size, and resolution. Inference is performed by selecting different parts of the OFA network without retraining. You can learn more about it in our OFA paper.
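As a rough illustration of the weight-sharing idea, here is a sketch in NumPy (our own simplification, not the paper's code or API): a sub-network's smaller convolution kernel is carved out of the full network's shared weights, which is why a selected sub-network needs no retraining.

```python
import numpy as np

# Full ("once-for-all") conv kernel: 64 output channels, 64 input channels,
# maximum 7x7 spatial size. All sub-networks share these weights.
full_kernel = np.random.randn(64, 64, 7, 7)

def select_subnet_kernel(full, out_ch, in_ch, ksize):
    """Slice a smaller kernel out of the shared full kernel.

    Elastic width: take the first `out_ch` / `in_ch` channels.
    Elastic kernel size: take the centered `ksize` x `ksize` crop.
    (The real OFA additionally applies a learned transformation between
    kernel sizes; this sketch omits it.)
    """
    k_full = full.shape[-1]
    start = (k_full - ksize) // 2
    return full[:out_ch, :in_ch, start:start + ksize, start:start + ksize]

# A mobile-friendly sub-network might use 32 channels and 3x3 kernels:
sub = select_subnet_kernel(full_kernel, out_ch=32, in_ch=32, ksize=3)
print(sub.shape)  # (32, 32, 3, 3)
```

Because every sub-network is a slice of the same tensor, switching deployment targets is just a matter of picking a different configuration.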
How does OFA compare with previous AutoML approaches?
Previous AutoML approaches focused on searching for a neural network architecture given a specific deployment scenario. For each deployment scenario, previous AutoML approaches required the architecture search process to be repeated and the resulting architecture to be retrained from scratch. As a result, the previous AutoML design cost grew linearly with the number of deployment scenarios. In contrast, OFA focuses on training a single network that supports many deployment scenarios without additional cost, so its design cost stays constant with respect to the number of deployment scenarios. Moreover, the sub-networks selected from the OFA network maintain the same level of accuracy as the same sub-networks trained from scratch. With a few epochs of fine-tuning, they can even achieve better accuracy than training from scratch.
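The cost argument can be made concrete with a back-of-the-envelope comparison (all numbers below are made up for illustration, not figures from the paper):

```python
# Illustrative design-cost comparison: conventional per-scenario NAS
# versus the once-for-all approach. All GPU-hour figures are hypothetical.
N = 40                       # deployment scenarios (devices x latency targets)
nas_cost_per_scenario = 200  # GPU-hours to search + retrain one model
ofa_train_once = 1200        # GPU-hours to train the OFA network once
ofa_search_per_scenario = 1  # GPU-hours for a cheap search over the supernet

conventional = N * nas_cost_per_scenario                   # grows linearly
ofa_total = ofa_train_once + N * ofa_search_per_scenario   # nearly constant

print(conventional, ofa_total)  # 8000 1240
```

As N grows, the conventional cost keeps climbing while the OFA total is dominated by the one-time training cost.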
What’s the difficulty of training such an OFA network? And how do you handle the challenge?
The difficulty is that we need to support a large number (more than 10^9) of sub-networks that interfere with each other when training the OFA network. To address this challenge, we introduce a progressive shrinking algorithm for training OFA networks. It starts by training the full network with maximum depth, width, and kernel size. The network is then fine-tuned to support smaller sub-networks in a progressive manner.
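The schedule can be sketched as follows. The stage ordering (full network, then elastic kernel size, depth, and width) follows the description above; the `train` step and the specific choice lists are placeholders, not the paper's actual hyperparameters.

```python
import random

def train(config):
    """Placeholder for one fine-tuning step on the sampled sub-network."""
    pass

# Progressive shrinking: start from the largest network, then progressively
# allow smaller architectural choices at each stage.
stages = [
    {"ksize": [7],       "depth": [4],       "width": [6]},        # full network
    {"ksize": [7, 5, 3], "depth": [4],       "width": [6]},        # elastic kernel
    {"ksize": [7, 5, 3], "depth": [4, 3, 2], "width": [6]},        # elastic depth
    {"ksize": [7, 5, 3], "depth": [4, 3, 2], "width": [6, 4, 3]},  # elastic width
]

for stage in stages:
    for step in range(100):  # steps per stage (illustrative)
        # Sample a random sub-network from the currently allowed choices, so
        # small sub-networks get trained without degrading the larger ones.
        config = {dim: random.choice(choices) for dim, choices in stage.items()}
        train(config)
```

Warming up with the full network before enabling smaller choices is what keeps the many sub-networks from interfering with one another during training.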
Do you have any plans to open source?
We have released our training code and pre-trained models on GitHub. We provide 50 specialized sub-networks with pre-trained weights for different hardware platforms, including CPU, GPU, mobile CPU, mobile GPU, DSP, and FPGA. Our model is up to 1.4x faster than MobileNetV3 and 2.6x faster than EfficientNet, with higher accuracy. We also provide pre-trained OFA networks and the full training code. Users can easily get a new specialized sub-network to fit their hardware without any training cost.
Your team won several model efficiency-related challenges in CVPR and ICCV. Can you tell us about that?
Using the Once-for-All Network, we won first place in two rounds of the Low-Power Computer Vision Challenge, at ICCV ’19 and NeurIPS ’19 (in both the classification and detection tracks). The challenge is to achieve the best ImageNet accuracy given a latency constraint on mobile devices. In addition, our team and Qualcomm Technologies both won first place in two tracks of the Visual Wake Words Challenge for microcontrollers at CVPR ’19, which was highlighted by TensorFlow in a recent blog post. Our demo runs in real time on a Google Pixel 3, which is powered by the Qualcomm Snapdragon 845 mobile platform. Our updated model now runs on a microcontroller.
Which Qualcomm Technologies products and tools did you use? And did they meet your expectations?
Our team used both AI hardware and software from Qualcomm Technologies. For hardware, we used the LG G8 phone, which is powered by the Snapdragon 855 and includes the Qualcomm Hexagon 690 DSP. We also used the Google Pixel 3, which has a Snapdragon 845, and the Samsung Galaxy Note 8, which has a Snapdragon 835. For software tools, we used the Qualcomm Neural Processing SDK, which is easy to use and developer friendly. Our AutoML techniques and OFA network work well with Qualcomm Technologies hardware. We are happy to land our research in new chips and products to achieve scale.
What’s next for this research?
We believe OFA is a new paradigm that decouples model training from neural architecture search. It paves the way to accelerate many on-device AI applications and specialize them across many devices. We will investigate how to further improve performance on other tasks and specialized hardware platforms using OFA networks. There is also an opportunity for algorithm and hardware co-design, which opens up a much larger design space. For example, one specific area of research could be how OFA can work better on future Qualcomm Technologies platforms.
How will AI make our daily lives better in the near term? How about in the long-term, in 10 years?
The future will be populated with many IoT devices that are AI-capable. AI will surround our lives at much lower cost, lower latency, and higher accuracy. There will be more powerful AI applications running on tiny edge devices, which will require extremely compact models and efficient chips. At the same time, privacy will become increasingly important, and on-device AI will grow popular thanks to its privacy and latency advantages. Model compression and efficient architecture design techniques will enable on-device AI, making it more capable.
At Qualcomm Technologies, we make breakthroughs in fundamental research and scale them across devices and industries. Qualcomm AI Research works hand-in-hand with the rest of the company to integrate the latest AI developments and technology into our products — shortening the time between research in the lab and delivering AI advances that enrich lives.
If you’re excited about solving big problems with cutting-edge AI research—and improving the lives of billions of people—we’d like to collaborate and hear from you.