Qualcomm teams up with Nexa AI and Docker to bring AI to IoT and Robotics with NexaSDK for Linux
Sign up for Developer monthly newsletter
Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.
Sign upCome for support, stay for the community
Get support from experts, connect with like-minded developers, and access exclusive virtual events.
Join Developer DiscordRunning multimodal AI models directly on edge and IoT devices is quickly becoming the default - because it delivers low-latency responses, keeps sensitive data local, and stays reliable even when connectivity is limited or offline.
But the real enabler is the NPU: purpose-built for AI inference, it delivers dramatically better performance-per-watt than CPU/GPU only setups, meaning faster inference, lower heat, longer battery life, and truly “always-on” AI experiences that fit within edge power and thermal constraints.
By teaming up with Nexa AI and Docker, Qualcomm Technologies has made NPU-first deployment practical with NexaSDK for Linux.
It’s a single, unified inference engine that runs the latest models developers need - LLM, VLM, speech, embeddings/rerank, and vision across NPU, GPU and CPU.
A straightforward Docker workflow enables clean setup and reproducible performance. Instead of stitching together drivers, runtimes, and per-model glue, developers can pull one container and start running modern multimodal models locally optimized for Qualcomm Hexagon NPU on the latest IoT devices with Qualcomm Dragonwing platforms.
For Linux, NexaSDK focuses on two flagship Qualcomm IoT platforms: Qualcomm Dragonwing IQ9 and Qualcomm Dragonwing RB3 Gen 2.
The Dragonwing IQ9 Series is built for high-performance industrial and edge AI workloads. This platform features an octa-core Qualcomm Kryo Gen 6 CPU running up to 2.36 GHz, a Qualcomm Adreno 663 GPU operating at up to 800 MHz, and a Qualcomm Hexagon NPU delivering between 50 and 100 dense TOPS.
The Dragonwing RB3 Gen 2 development kits target robotics, vision AI, and smart security use cases, offering an accessible and flexible platform for edge developers. Dragonwing RB3 Gen 2 integrates a multi-core CPU, an Adreno GPU for graphics and auxiliary compute, and a Hexagon NPU capable of up to 12 dense TOPS.
Docker-based NexaSDK for IoT devices
Linux IoT environments often suffer from OS and driver fragmentation, particularly when deploying NPU-accelerated AI inference workloads. Variations across Linux distributions, kernel versions, and vendor-specific drivers, combined with complex AI runtime dependencies, make deployment, optimization, and reproducibility difficult at scale.
The NexaSDK Docker image delivers a containerized AI runtime optimized for Linux ARM64-compatible systems with Hexagon NPUs, providing direct access to the Hexagon NPU, CPU, and GPU through a unified inference interface.
NexaSDK provides a consistent runtime across Linux distributions, isolates applications from host OS dependencies, and removes the need for manual NPU stack setup.
Specifically, NexaSDK for Linux offers the following advantages:
- Consistent runtime across devices and Linux distributions.
- Isolation from host OS dependencies, zero manual NPU stack setup, fast onboarding with a single docker run command.
- Multiple model types support: LLM, VLM, Embeddings, Reranking, Computer Vision, and ASR models.
- Easy SDK updates via pulling a new Docker image from Docker Hub.
Run models with NexaSDK for Linux
Through Qualcomm Technologies’ collaboration with Docker Inc. and Nexa AI, NexaSDK uses Docker-based virtualization to avoid the complexity of Linux OS setup while enabling consistent performance on Qualcomm Technologies’ platforms.
NexaSDK supports both interactive and server modes. Using IBM Granite-4-350M as an example, developers can run models directly in an interactive CLI or deploy them as a persistent REST service. You can follow the SDK docs for details.
Getting started
1. Interactive CLI mode
Bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -it --privileged \
-v /etc/machine-id:/etc/machine-id:ro \
-e NEXA_TOKEN \
nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU
2. Server (REST API) mode:
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"
docker run --rm -it --privileged \
-v /etc/machine-id:/etc/machine-id:ro \
-e NEXA_TOKEN \
nexa4ai/nexasdk:latest pull NexaAI/Granite-4.0-h-350M-NPU
docker run --rm -d -p 18181:18181 --privileged \
-v /etc/machine-id:/etc/machine-id:ro \
-e NEXA_TOKEN \
nexa4ai/nexasdk:latest serve
This video demo demonstrates both CLI and server modes across LLM, VLM, ASR, and embedding models.
Video understanding on Qualcomm Dragonwing IQ9
Video represents one of the richest yet most underutilized sources of real-time intelligence in enterprise applications, capturing valuable visual context and temporal patterns across domains such as security operations, industrial monitoring, retail analytics, and smart workplaces.
To demonstrate practical video understanding capabilities, NexaSDK provides a complete end-to-end demo running on Dragonwing IQ9, powered by AutoNeural, Nexa AI’s NPU-native 1.5B-parameter vision–language model.
The demo ingests uploaded videos, automatically extracts key frames at fixed intervals, and performs sequential vision-language inference to generate meaningful, human-readable insights from each scene.
Results are streamed in real time through an interactive Gradio-based UI.
By combining efficient on-device inference, low-latency responsiveness, and multimodal understanding, NexaSDK on Dragonwing IQ9 demonstrates how video can be transformed into a first-class intelligent data source rather than a static recording.
A quick start for Linux developers
NexaSDK includes a starter guide that allows most developers to get a test build running quickly.
All NexaSDK Docker image versions are published on Docker Hub.
As of December 2025, NexaSDK has supported the following models on IoT devices. The models are hosted on Huggingface.
Supported models on Dragonwing IQ9
Vision Language Models (VLM)
- AutoNeural:
NexaAI/AutoNeural
Large Language Models (LLM)
- FM2.5-1.2B:
NexaAI/LFM2.5-1.2B-npu - FM2-1.2B:
NexaAI/LFM2-1.2B-npu - Granite-4.0-h-350M: NexaAI/Granite-4.0-h-350M-NPU
Embedding Models (Embedding)
- EmbeddingGemma-300M: NexaAI/embeddinggemma-300m-npu
- EmbedNeural: NexaAI/EmbedNeural
Reranking Models (rerank)
· Jina-v2 Reranker: NexaAI/jina-v2-rerank-npu
ASR
· Parakeet-TDT-0.6B-v3: NexaAI/parakeet-tdt-0.6b-v3-npu
Computer Vision Models (CV)
· YOLOv12: NexaAI/yolov12-npu
· RF-DETR Segmentation: NexaAI/rf-detr-seg-preview-npu
· ConvNeXt-Tiny: NexaAI/convnext-tiny-npu-IoT
Supported models on RB3 Gen 2
Computer Vision Models (CV)
· ConvNeXt-Tiny NexaAI/convnext-tiny-npu-IoT-rb3
Get started today
Ready to bring state-of-the-art AI to your IoT and robotics applications?
Explore the NexaSDK documentation, join the developer community, and see what’s possible when you combine Qualcomm Technologies’ industry-leading NPUs with the flexibility of containerized AI deployment.




