Developer Blog

Qualcomm Linux sample apps – AI object detection and parallel AI fusion (Part 2 of 2)

Jul 11, 2024

Last time we walked you through two building-block sample apps from the Qualcomm Intelligent Multimedia SDK. The SDK is based on Qualcomm® Linux® software, our distribution that lets you write once and run on many of our IoT systems-on-chip (SoCs). In this post we’ll explore two more of the 22 sample applications in the SDK to demonstrate how you can write apps for our IoT chipsets.

3. Seeing through AI: Live stream object detection

This command-line application, gst-ai-object-detection, takes a live video stream from a camera and hands it off to open-source YOLO (You Only Look Once) AI models for object detection. It performs object detection with pre-processing and AI inference on dedicated hardware blocks, executing YOLOV5, YOLOV8 or YOLO-Nas using the Snapdragon Neural Processing SDK. It then displays a preview with overlaid output, such as labels and bounding boxes, based on the model.

The application pipeline looks like this:

qtiqmmfsrc – Using this GStreamer plugin, the application captures the camera live stream, then uses a tee element to split the stream.
qtimlvconverter – This preprocessing plugin performs tasks like color conversion, down-/upscaling and normalization on the stream data. It converts the video stream to a tensor stream for inference later.
qtimlsnpe – This machine learning inferencing plugin applies YOLO-Nas (default), YOLOV8 or YOLOV5 to detect objects in the stream. It runs the Snapdragon Neural Processing (SNPE) runtime on the CPU, GPU or DSP-based neural processing unit (NPU). The SNPE runtime performs inference on the incoming tensor stream and produces a tensor stream with the inference results.
qtimlvdetection – For post-processing, this plugin applies a confidence threshold and keeps up to the chosen number of results. It loads the YOLO-Nas module for post-processing, produces video frames containing only bounding boxes (for overlay) and hands the frames off for video composition.
qtivcomposer – This plugin overlays frames, with the bounding boxes and labels, onto frames from the live camera stream, then hands off gst buffers with the combined layers.
waylandsink – This plugin submits the received video stream to Weston, which renders it on a local display.
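To make the topology above easier to picture, here is a minimal sketch that assembles a gst-launch-style description of the same chain. The plugin names come from the list above; the `queue` elements, the element names `split` and `mixer`, and the absence of any properties (model path, labels, threshold and so on) are our simplifying assumptions, so treat the result as a diagram in text form rather than the exact pipeline the sample app builds.

```python
# Illustrative only: plugin names are taken from the list above. Properties
# such as model paths, labels and thresholds are intentionally omitted, and
# the element names "split" and "mixer" are hypothetical.

def build_detection_pipeline() -> str:
    """Return a gst-launch-style description of the single-channel pipeline."""
    # The camera source feeds a tee so the same frames reach two branches.
    source = "qtiqmmfsrc ! tee name=split"

    # Branch 1: untouched camera frames go straight to the composer.
    preview = "split. ! queue ! mixer."

    # Branch 2: pre-process, infer, post-process, then join the composer.
    inference = " ! ".join([
        "split. ! queue",
        "qtimlvconverter",   # video frames -> tensors
        "qtimlsnpe",         # tensors -> inference result tensors
        "qtimlvdetection",   # results -> bounding-box overlay frames
        "mixer.",
    ])

    # The composer blends both branches and passes the result to the display sink.
    sink = "qtivcomposer name=mixer ! waylandsink"

    return " ".join([source, preview, inference, sink])


if __name__ == "__main__":
    print(build_detection_pipeline())
```

A real invocation would also need the model, label and display settings that the sample app configures for you.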

Here’s an example of using gst-ai-object-detection to detect a person in a camera stream:

When would you use this application?

gst-ai-object-detection has dozens of classes that you can build into your own applications for detecting objects (people, vehicles, animals, etc.) and locating them in a camera frame. Examples include detecting helmets, fire/smoke and intruders.

4. Parallel AI fusion: Four AI inferences on live camera

This command-line application, gst-ai-parallel-inference, extends the one-channel app above to four channels of parallel processing of AI models on dedicated hardware blocks. Besides object detection, it adds classification, pose detection and segmentation, then displays scaled-down, composed previews of the live camera stream with overlaid output from all four models.

The application pipeline is a variation of the one above, with a separate flow for each AI inference, as shown below:

qtiqmmfsrc – Using this GStreamer plugin, the application captures the camera live stream, then uses tee elements to generate four parallel streams.
qtimlvconverter – This preprocessing plugin performs tasks like color conversion, down-/upscaling and normalization on the stream data. It converts the video stream to a tensor stream for inference later.
qtimlsnpe – This machine learning inferencing plugin applies YOLO-Nas for object detection and DeepLabv3 for image segmentation. The plugin runs the Snapdragon Neural Processing runtime on the CPU, GPU or DSP-based neural processing unit (NPU).
qtimltflite – This plugin applies PoseNet for pose detection and Inception V3 for object classification. The plugin runs the TFLite runtime on the CPU, GPU or DSP-based neural processing unit (NPU).
Post-processing uses a different plugin for each model.
1. qtimlvdetection – For object detection, this plugin applies a confidence threshold and keeps up to the chosen number of results. It loads the YOLO-Nas post-processing module, produces video frames containing only bounding boxes (for overlay) and hands the frames off for video composition.
2. qtimlvclassification – For classification, this plugin applies a confidence threshold and keeps up to the chosen number of results. It loads the Inception V3 post-processing module, produces video frames with classification labels (for overlay) and hands the frames off for video composition.
3. qtimlvpose – For pose estimation, this plugin applies a confidence threshold and keeps up to the chosen number of results. It can load modules for different pose estimation models. In this use case, it loads the PoseNet module, produces video frames with poses drawn (for overlay) and hands the frames off for video composition.
4. qtimlvsegmentation – For segmentation, this plugin converts the inference tensors it receives into a video format that the downstream multimedia plugins can consume.
qtivcomposer – This plugin overlays frames from the AI models onto frames from the live camera stream, then hands off gst buffers with the combined layers.
waylandsink – This plugin submits the received video stream to Weston, which renders it on a local display.
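The detection, classification and pose plugins above all filter their results by a threshold and a result count. The sketch below shows that idea in plain Python. It is not the plugins' implementation: the `Detection` type, the `select_results` helper and the sample scores are hypothetical, and we assume "threshold" means a minimum confidence score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one model result."""
    label: str
    confidence: float  # assumed to be a score between 0.0 and 1.0


def select_results(detections, threshold, max_results):
    """Keep results scoring at least `threshold`, best first, capped at `max_results`."""
    kept = [d for d in detections if d.confidence >= threshold]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return kept[:max_results]


if __name__ == "__main__":
    # Made-up scores, purely for illustration.
    raw = [
        Detection("person", 0.91),
        Detection("dog", 0.42),
        Detection("bicycle", 0.77),
        Detection("car", 0.64),
    ]
    for d in select_results(raw, threshold=0.5, max_results=2):
        print(d.label, d.confidence)
```

Raising the threshold trades missed objects for fewer false positives, while the result cap keeps the overlay readable in busy scenes.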

Here’s an example of the rendered video stream from gst-ai-parallel-inference:

Top left: object detection output. Top right: classification output. Lower left: pose estimation output, drawn as green lines. Lower right: segmentation output, overlaid as white pixels on the person.
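The four-quadrant arrangement described in the caption can be computed with a few lines of arithmetic. This is a sketch under our own assumptions: the `quadrant_rects` helper is hypothetical, the 1920x1080 display size is only an example, and the way the sample app actually positions each preview in the composer is not shown here.

```python
def quadrant_rects(width, height):
    """Split a display into a 2x2 grid and return (x, y, w, h) per model output."""
    half_w, half_h = width // 2, height // 2
    # Order follows the caption: left to right, then top to bottom.
    names = ["object detection", "classification", "pose estimation", "segmentation"]
    rects = {}
    for index, name in enumerate(names):
        row, col = divmod(index, 2)
        rects[name] = (col * half_w, row * half_h, half_w, half_h)
    return rects


if __name__ == "__main__":
    # Example display size; substitute the resolution of your own panel.
    for name, rect in quadrant_rects(1920, 1080).items():
        print(f"{name}: {rect}")
```

Each preview is scaled down to a quarter of the display area, which is why the app shows composed, reduced-size views rather than four full-resolution streams.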

When would you use this application?

As a superset of gst-ai-object-detection, gst-ai-parallel-inference allows you to detect people, vehicles, animals and other objects – even smoke and fire – in a camera frame.

With pose detection, you can determine, for example, whether a person is lying down, sitting or standing, with the potential to detect whether the person has fallen. A gym trainer or yoga instructor can use pose detection to check whether a student is holding a pose correctly. An ergonomics application can watch a user’s posture in a chair or at a desk and remind them to correct it.

Classification suits scenarios such as product categorization, while segmentation has uses in manufacturing, healthcare and logistics.

Next steps

Those are two more of the compelling applications we’ve built to showcase Qualcomm Linux. You can get them and the entire Qualcomm Intelligent Multimedia SDK from open source, with 20 more applications for AI and multimedia. Then you can start incorporating them into your own applications.

We’ve designed Qualcomm Linux so you can write once and run on multiple IoT chipsets with the same source code. The Qualcomm Intelligent Multimedia SDK is the first time we’ve opened all of our multimedia subsystems – including camera, artificial intelligence and audio – to developers via APIs.

As we give more to open source, you can customize, try out and contribute to this work. It’s a big step in our developer-first mindset, in which we make it easier for you to develop the kinds of customizations you want in your IoT applications running on Linux.

Missed part 1 of this blog? Find it here.


What’s next?

Join our Discord community to keep up with our ongoing outreach to developers like you who work with Qualcomm Technologies’ products.

On Discord, you can rub elbows with our experts, connect with fellow developers working with our technology and keep your fingers on the pulse of developer-focused news and product updates.

