Developer Blog

Qualcomm teams up with Nexa AI and Docker to bring AI to IoT and Robotics with NexaSDK for Linux

Written by

Ruby Hagin

Written by

Alex Chen

Written by

Zack Li

Written by

Alan Zhu

Mar 4, 2026

Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.

Come for support, stay for the community

Get support from experts, connect with like-minded developers, and access exclusive virtual events.

Join Developer Discord

Running multimodal AI models directly on edge and IoT devices is quickly becoming the default - because it delivers low-latency responses, keeps sensitive data local, and stays reliable even when connectivity is limited or offline.

But the real enabler is the NPU: purpose-built for AI inference, it delivers dramatically better performance-per-watt than CPU/GPU only setups, meaning faster inference, lower heat, longer battery life, and truly “always-on” AI experiences that fit within edge power and thermal constraints.

By teaming up with Nexa AI and Docker, Qualcomm Technologies has made NPU-first deployment practical with NexaSDK for Linux.

It’s a single, unified inference engine that runs the latest models developers need - LLM, VLM, speech, embeddings/rerank, and vision across NPU, GPU and CPU.

A straightforward Docker workflow enables clean setup and reproducible performance. Instead of stitching together drivers, runtimes, and per-model glue, developers can pull one container and start running modern multimodal models locally optimized for Qualcomm Hexagon NPU on the latest IoT devices with Qualcomm Dragonwing platforms.

For Linux, NexaSDK focuses on two flagship Qualcomm IoT platforms: Qualcomm Dragonwing IQ9 and Qualcomm Dragonwing RB3 Gen 2.

The Dragonwing IQ9 Series is built for high-performance industrial and edge AI workloads. This platform features an octa-core Qualcomm Kryo Gen 6 CPU running up to 2.36 GHz, a Qualcomm Adreno 663 GPU operating at up to 800 MHz, and a Qualcomm Hexagon NPU delivering between 50 and 100 dense TOPS.

IQ9 close up

The Dragonwing RB3 Gen 2 development kits target robotics, vision AI, and smart security use cases, offering an accessible and flexible platform for edge developers. Dragonwing RB3 Gen 2 integrates a multi-core CPU, an Adreno GPU for graphics and auxiliary compute, and a Hexagon NPU capable of up to 12 dense TOPS.

Dragonwing RB3 Gen 2 kit

Docker-based NexaSDK for IoT devices

Linux IoT environments often suffer from OS and driver fragmentation, particularly when deploying NPU-accelerated AI inference workloads. Variations across Linux distributions, kernel versions, and vendor-specific drivers, combined with complex AI runtime dependencies, make deployment, optimization, and reproducibility difficult at scale.

The NexaSDK Docker image delivers a containerized AI runtime optimized for Linux ARM64-compatible systems with Hexagon NPUs, providing direct access to the Hexagon NPU, CPU, and GPU through a unified inference interface.

NexaSDK for Linux (Docker) architecture diagram

NexaSDK provides a consistent runtime across Linux distributions, isolates applications from host OS dependencies, and removes the need for manual NPU stack setup.

Specifically, NexaSDK for Linux offers the following advantages:

Consistent runtime across devices and Linux distributions.
Isolation from host OS dependencies, zero manual NPU stack setup, fast onboarding with a single docker run command.
Multiple model types support: LLM, VLM, Embeddings, Reranking, Computer Vision, and ASR models.
Easy SDK updates via pulling a new Docker image from Docker Hub.

Run models with NexaSDK for Linux

Through Qualcomm Technologies’ collaboration with Docker Inc. and Nexa AI, NexaSDK uses Docker-based virtualization to avoid the complexity of Linux OS setup while enabling consistent performance on Qualcomm Technologies’ platforms.

NexaSDK supports both interactive and server modes. Using IBM Granite-4-350M as an example, developers can run models directly in an interactive CLI or deploy them as a persistent REST service. You can follow the SDK docs for details.

Getting started

1. Interactive CLI mode

Bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"

docker run --rm -it --privileged \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU

2. Server (REST API) mode:

export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"

docker run --rm -it --privileged \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest pull NexaAI/Granite-4.0-h-350M-NPU

docker run --rm -d -p 18181:18181 --privileged \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest serve

This video demo demonstrates both CLI and server modes across LLM, VLM, ASR, and embedding models.

Video understanding on Qualcomm Dragonwing IQ9

Video represents one of the richest yet most underutilized sources of real-time intelligence in enterprise applications, capturing valuable visual context and temporal patterns across domains such as security operations, industrial monitoring, retail analytics, and smart workplaces.

To demonstrate practical video understanding capabilities, NexaSDK provides a complete end-to-end demo running on Dragonwing IQ9, powered by AutoNeural, Nexa AI’s NPU-native 1.5B-parameter vision–language model.

The demo ingests uploaded videos, automatically extracts key frames at fixed intervals, and performs sequential vision-language inference to generate meaningful, human-readable insights from each scene.

Results are streamed in real time through an interactive Gradio-based UI.

By combining efficient on-device inference, low-latency responsiveness, and multimodal understanding, NexaSDK on Dragonwing IQ9 demonstrates how video can be transformed into a first-class intelligent data source rather than a static recording.

A quick start for Linux developers

NexaSDK includes a starter guide that allows most developers to get a test build running quickly.

All NexaSDK Docker image versions are published on Docker Hub.

As of December 2025, NexaSDK has supported the following models on IoT devices. The models are hosted on Huggingface.

Supported models on Dragonwing IQ9

Vision Language Models (VLM)

AutoNeural: NexaAI/AutoNeural

Large Language Models (LLM)

FM2.5-1.2B: NexaAI/LFM2.5-1.2B-npu
FM2-1.2B: NexaAI/LFM2-1.2B-npu
Granite-4.0-h-350M: NexaAI/Granite-4.0-h-350M-NPU

Embedding Models (Embedding)

EmbeddingGemma-300M: NexaAI/embeddinggemma-300m-npu
EmbedNeural: NexaAI/EmbedNeural

Reranking Models (rerank)

· Jina-v2 Reranker: NexaAI/jina-v2-rerank-npu

ASR

· Parakeet-TDT-0.6B-v3: NexaAI/parakeet-tdt-0.6b-v3-npu

Computer Vision Models (CV)

· YOLOv12: NexaAI/yolov12-npu

· RF-DETR Segmentation: NexaAI/rf-detr-seg-preview-npu

· ConvNeXt-Tiny: NexaAI/convnext-tiny-npu-IoT

Supported models on RB3 Gen 2

Computer Vision Models (CV)

· ConvNeXt-Tiny NexaAI/convnext-tiny-npu-IoT-rb3

Get started today

Ready to bring state-of-the-art AI to your IoT and robotics applications?

Explore the NexaSDK documentation, join the developer community, and see what’s possible when you combine Qualcomm Technologies’ industry-leading NPUs with the flexibility of containerized AI deployment.

Linux Qualcomm Linux IoT Edge AI Partner

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Authors

Ruby HaginSenior Marketing Communications Specialist

Alex ChenPrincipal Engineer/Manager, Qualcomm

Zack LiSr. Staff Engineer, Qualcomm

Alan ZhuSenior Product Manager, Qualcomm