OnQ Blog

How developers can apply AI for IoT audio and visual use cases

Jan 29, 2018

Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.

Have you started working with the Qualcomm Snapdragon Neural Processing Engine (NPE) SDK yet? I posted last time about deep learning at the edge on IoT devices and described how you can use the SDK to execute artificial intelligence (AI) workloads on GPU, DSP and CPU.

Since then, we’ve expanded the number and variety of deep neural networks (DNNs) supported in the NPE SDK. To help get the wheels of innovation turning for you, in this post I’ll describe ways you can apply AI using those DNNs across various IoT devices and use cases, including context-aware AI.

How you can use AI in the IoT

First, we’ve found that although many developers want to use AI in IoT, there is no comprehensive resource telling them how they can use it. They can find one-off case studies and application stories, but not a consolidated view, so here’s a table with a full overview from our perspective:


The column headings across the top are the types of edge devices that are ripe for AI. The rows along the side show the diversity of use cases, grouped into visual and audio. Where they intersect, you’ll find applications you can build with the neural networks supported in the NPE SDK. In short, AI has the potential to accelerate IoT adoption across companion robots, home hubs, smart assistants and wearable trackers for patient care, introducing an emotional dimension through visual and audio analytics.

Running inference workloads quickly and power-efficiently on devices at the network edge will be an important differentiator among IoT devices. Both training and inference are usually run in the cloud, but when you build on Qualcomm Technologies’ IoT platforms, you no longer have to rely on the cloud for inference. You can take advantage of energy-efficient heterogeneous computing on the device by running the DNNs for inference on the optimal Snapdragon compute core: CPU, GPU and Qualcomm Hexagon Vector eXtensions (HVX) on DSP.
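To make the idea of targeting the optimal compute core concrete, here is a minimal sketch of a runtime-preference fallback: try the most power-efficient core first and fall back to the next. This is a conceptual illustration only; the function and core names are hypothetical and do not reflect the NPE SDK's actual API.

```python
# Illustrative sketch only: a generic runtime-selection helper, NOT the NPE SDK API.
# Core names and the helper itself are hypothetical.

def pick_runtime(preferred, available):
    """Return the first preferred compute core that the device reports as available."""
    for core in preferred:
        if core in available:
            return core
    raise RuntimeError("no requested compute core is available")

# Prefer the DSP (HVX) for power efficiency, then GPU, then CPU as a fallback.
runtime = pick_runtime(["DSP", "GPU", "CPU"], available={"GPU", "CPU"})
print(runtime)  # prints "GPU" when the DSP is unavailable
```

The same fallback-ordering idea applies whichever API you use: request the most efficient core your model supports and degrade gracefully.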

Running DNNs on HVX not only gives you audio and visual analytics at the edge but also frees up the GPU for 4K and 1080p camera features like staggered high-definition resolution (sHDR), temporal noise reduction (TNR) and multiple video streams.

As inference and AI move to devices like smartphones, privacy becomes a priority for consumers and businesses. Running AI at the edge addresses that priority by keeping the data off the network and out of the cloud. Your users will thank you for respecting their privacy, providing reliability and reducing latency.

Deploying visual analytics at the edge

First, let’s explore visual analytics at the edge. Household monitoring systems use cameras to provide family members with a view of what’s going on at home. Robots and drones use visual analytics to map their environment and navigate around obstacles.

To run visual analytics on edge devices like those in the table above, you need DNNs of the right size and with the right balance of inference speed and accuracy. Consider these use cases for visual analytics:

  • Object detection and classification - Besides the YOLOv2 model, more recent models such as SSD and MobileNets are based on lighter neural networks. Depending on your application (consumer, home, enterprise, smart city), you can get between 5 and 20 inferences per second with them.
  • Face detection and recognition - Faster R-CNN offers the greatest accuracy with 50 or 100 regions of interest (ROI); VGG-16, Single Shot Detector (SSD) and MobileNets are also options.
  • Scene classification - MobileNets.
  • Video summarization - Long Short-Term Memory (LSTM) is useful in summarizing content like sports video and can also be used for image captioning.
  • Food detection and classification - ImageNet.
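Detectors like YOLOv2, SSD and MobileNet-SSD all share a common post-processing step that runs on-device: non-maximum suppression (NMS), which collapses overlapping candidate boxes into final detections. Here is a generic, self-contained sketch of that step; it is an illustration of the technique, not code from the NPE SDK.

```python
# Generic non-maximum suppression (NMS) sketch for detector post-processing.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, drop any box overlapping it above threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < threshold]
    return keep
```

Because NMS is cheap and runs after the network, it executes comfortably on the CPU even when the DNN itself runs on the GPU or DSP.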

Deploying audio analytics at the edge

Devices like smart home assistants and voice-controlled speakers rely on audio analytics at the edge for voice user interface (VUI) and conversational interface.

Why run these analytics at the edge rather than in the cloud? Adoption of wearables, home hubs, companion robots and smart appliances continues to grow, so keeping VUI and target sound detection on the device helps reassure consumers who are concerned about privacy. Running them on the device also helps reduce latency by keeping the analytics workload out of the cloud.

Like visual analytics, audio analytics is well suited to heterogeneous computing and to running convolutional neural networks (CNN) and recurrent neural networks (RNN) on the Snapdragon mobile processor. The energy-efficient compute power of the DSP cores on Snapdragon also provides the ability to embed automatic speech recognition (ASR) and natural language processing (NLP) or audio classification at the edge.
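Before any CNN or RNN sees audio, the waveform is typically sliced into short, overlapping frames from which features (such as log-mel spectrograms) are computed. The sketch below shows that framing step in plain Python, with frame sizes that are common defaults in speech processing; it is a generic illustration, not NPE SDK code.

```python
# Minimal audio framing sketch: slice a waveform into overlapping fixed-size
# frames, the usual first step before feature extraction for ASR/audio models.

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len, advancing by hop."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames

# 1 second at 16 kHz, 25 ms frames (400 samples) with a 10 ms hop (160 samples).
frames = frame_signal([0.0] * 16000, frame_len=400, hop=160)
```

Framing like this is lightweight enough to run alongside the DNN on the DSP, keeping the whole audio pipeline on-device.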

Consider these DNN use cases for audio analytics:

  • Audio Classification - RNN
  • Automatic Speech Recognition - CNN/LSTM
  • Voice Activation - Gaussian Mixture Model (GMM)
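To illustrate the last item, here is a hedged sketch of how a Gaussian Mixture Model can score a feature value for voice activation: compare its log-likelihood under a "speech" model against a "non-speech" model. The mixture parameters below are toy values chosen for illustration, not trained ones.

```python
# Toy GMM-based voice-activation scoring sketch (illustrative parameters only).
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of scalar feature x under a 1-D mixture of Gaussians."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        total += w * math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return math.log(total)

# Toy models: "speech" energy centered near 1.0, "silence" near 0.0.
speech = dict(weights=[0.5, 0.5], means=[0.8, 1.2], variances=[0.1, 0.1])
silence = dict(weights=[1.0], means=[0.0], variances=[0.05])

def is_speech(x):
    """Classify a frame's energy feature by comparing the two model likelihoods."""
    return gmm_log_likelihood(x, **speech) > gmm_log_likelihood(x, **silence)
```

A real keyword spotter would score sequences of multi-dimensional features, but the likelihood-ratio idea is the same, and GMM scoring is cheap enough to run always-on at the edge.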

New neural network support in the NPE SDK

Previously, the NPE SDK supported AlexNet, Inception v1, Inception v3 and ResNet50. In the few months since my last post, we have expanded that support to several more networks you can use to build applications for visual and audio analytics:

  • Faster R-CNN - For greater accuracy in detection, based on a Region Proposal Network using py-faster-rcnn.
  • MobileNets - Smaller, optimized networks for object detection, facial attributes and large-scale geo-localization, with the trade-off of lower accuracy.
  • SSD - Provides support for a traditional SSD network.
  • VGG-16 - For a highly accurate object classification benchmark model.
  • LSTM - Provides support for audio analytics including embedded ASR/NLP.
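To show why LSTMs suit sequence tasks like embedded ASR and video summarization, here is a toy single LSTM cell step in plain Python, written for scalar inputs so the gating recurrence is easy to follow. This is a conceptual sketch only, not the SDK's implementation, and the weight layout is made up for the example.

```python
# Toy scalar LSTM cell step: forget/input/output gates plus a candidate value.
# Weight dict w maps each gate to a hypothetical (wx, wh, bias) triple.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM recurrence step for scalar input and state."""
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    c = f * c_prev + i * g          # cell state carries long-range context
    h = o * math.tanh(c)            # hidden state is the step's output
    return h, c
```

The cell state `c` is what lets the network carry context across many frames of audio or video, which is exactly the property summarization and ASR rely on.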

Deploying the Snapdragon Neural Processing Engine

The diagram below illustrates the role of NPE both in the cloud and on the device.

The NPE SDK includes conversion tools for networks trained in the cloud with TensorFlow and Caffe/Caffe2. You convert the trained networks to .dlc files and deploy them to the Qualcomm Technologies SoC runtime environment.

For on-device execution and inference, the SDK also provides debug and performance optimization tools for each of the compute cores: CPU, GPU and Hexagon DSP. With the SDK you can optimize your applications to run workloads on the core best suited to their use cases.

Make the next move

If you haven’t started exploring AI in IoT and creating innovative devices and applications, now is the time to download the NPE SDK, documentation and sample code.


Qualcomm Snapdragon and Qualcomm Hexagon are products of Qualcomm Technologies, Inc.