Back to All
Developer Blog

Qualcomm teams up with Nexa AI and Docker to bring AI to IoT and Robotics with NexaSDK for Linux

Sign up for Developer monthly newsletter-image

Sign up for Developer monthly newsletter

Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.

Sign up
Come for support, stay for the community-image

Come for support, stay for the community

Get support from experts, connect with like-minded developers, and access exclusive virtual events.

Join Developer Discord

Running multimodal AI models directly on edge and IoT devices is quickly becoming the default - because it delivers low-latency responses, keeps sensitive data local, and stays reliable even when connectivity is limited or offline.

But the real enabler is the NPU: purpose-built for AI inference, it delivers dramatically better performance-per-watt than CPU/GPU only setups, meaning faster inference, lower heat, longer battery life, and truly “always-on” AI experiences that fit within edge power and thermal constraints.

By teaming up with Nexa AI and Docker, Qualcomm Technologies has made NPU-first deployment practical with NexaSDK for Linux.

It’s a single, unified inference engine that runs the latest models developers need - LLM, VLM, speech, embeddings/rerank, and vision across NPU, GPU and CPU.

A straightforward Docker workflow enables clean setup and reproducible performance. Instead of stitching together drivers, runtimes, and per-model glue, developers can pull one container and start running modern multimodal models locally optimized for Qualcomm Hexagon NPU on the latest IoT devices with Qualcomm Dragonwing platforms.

For Linux, NexaSDK focuses on two flagship Qualcomm IoT platforms: Qualcomm Dragonwing IQ9 and Qualcomm Dragonwing RB3 Gen 2.

The Dragonwing IQ9 Series is built for high-performance industrial and edge AI workloads. This platform features an octa-core Qualcomm Kryo Gen 6 CPU running up to 2.36 GHz, a Qualcomm Adreno 663 GPU operating at up to 800 MHz, and a Qualcomm Hexagon NPU delivering between 50 and 100 dense TOPS

IQ 9 close up
IQ9 close up

The Dragonwing RB3 Gen 2 development kits target robotics, vision AI, and smart security use cases, offering an accessible and flexible platform for edge developers. Dragonwing RB3 Gen 2 integrates a multi-core CPU, an Adreno GPU for graphics and auxiliary compute, and a Hexagon NPU capable of up to 12 dense TOPS.

Dragonwing RB3 Gen 2 kit
Dragonwing RB3 Gen 2 kit

Docker-based NexaSDK for IoT devices

Linux IoT environments often suffer from OS and driver fragmentation, particularly when deploying NPU-accelerated AI inference workloads. Variations across Linux distributions, kernel versions, and vendor-specific drivers, combined with complex AI runtime dependencies, make deployment, optimization, and reproducibility difficult at scale.

The NexaSDK Docker image delivers a containerized AI runtime optimized for Linux ARM64-compatible systems with Hexagon NPUs, providing direct access to the Hexagon NPU, CPU, and GPU through a unified inference interface.

NexaSDK for Linux (Docker) architecture diagram
NexaSDK for Linux (Docker) architecture diagram

NexaSDK provides a consistent runtime across Linux distributions, isolates applications from host OS dependencies, and removes the need for manual NPU stack setup.

Specifically, NexaSDK for Linux offers the following advantages:

  • Consistent runtime across devices and Linux distributions.

  • Isolation from host OS dependencies, zero manual NPU stack setup, fast onboarding with a single docker run command.

  • Multiple model types support: LLM, VLM, Embeddings, Reranking, Computer Vision, and ASR models.

  • Easy SDK updates via pulling a new Docker image from Docker Hub.

Run models with NexaSDK for Linux

Through Qualcomm Technologies’ collaboration with Docker Inc. and Nexa AI, NexaSDK uses Docker-based virtualization to avoid the complexity of Linux OS setup while enabling consistent performance on Qualcomm Technologies’ platforms.

NexaSDK supports both interactive and server modes. Using IBM Granite-4-350M as an example, developers can run models directly in an interactive CLI or deploy them as a persistent REST service. You can follow the SDK docs for details.

 

Getting started

1.                  Interactive CLI mode

Bash
export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"

docker run --rm -it --privileged \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU

2. Server (REST API) mode:

export NEXA_TOKEN="YOUR_LONG_TOKEN_HERE"

docker run --rm -it --privileged \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest pull NexaAI/Granite-4.0-h-350M-NPU

docker run --rm -d -p 18181:18181 --privileged \
  -v /etc/machine-id:/etc/machine-id:ro \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest serve

This video demo demonstrates both CLI and server modes across LLM, VLM, ASR, and embedding models.

 

Video understanding on Qualcomm Dragonwing IQ9

Video represents one of the richest yet most underutilized sources of real-time intelligence in enterprise applications, capturing valuable visual context and temporal patterns across domains such as security operations, industrial monitoring, retail analytics, and smart workplaces.

To demonstrate practical video understanding capabilities, NexaSDK provides a complete end-to-end demo running on Dragonwing IQ9, powered by AutoNeural, Nexa AI’s NPU-native 1.5B-parameter vision–language model.

The demo ingests uploaded videos, automatically extracts key frames at fixed intervals, and performs sequential vision-language inference to generate meaningful, human-readable insights from each scene.

Results are streamed in real time through an interactive Gradio-based UI.

By combining efficient on-device inference, low-latency responsiveness, and multimodal understanding, NexaSDK on Dragonwing IQ9 demonstrates how video can be transformed into a first-class intelligent data source rather than a static recording.

 

A quick start for Linux developers

NexaSDK includes a starter guide that allows most developers to get a test build running quickly.

All NexaSDK Docker image versions are published on Docker Hub.

As of December 2025, NexaSDK has supported the following models on IoT devices. The models are hosted on Huggingface.

 

Supported models on Dragonwing IQ9

Vision Language Models (VLM)

  • AutoNeural: NexaAI/AutoNeural
     

Large Language Models (LLM)

  • FM2.5-1.2B: NexaAI/LFM2.5-1.2B-npu

  • FM2-1.2B: NexaAI/LFM2-1.2B-npu

  • Granite-4.0-h-350M: NexaAI/Granite-4.0-h-350M-NPU 
     

Embedding Models (Embedding)

  • EmbeddingGemma-300M: NexaAI/embeddinggemma-300m-npu

  • EmbedNeural: NexaAI/EmbedNeural

 

Reranking Models (rerank)

·      Jina-v2 Reranker: NexaAI/jina-v2-rerank-npu

ASR

·      Parakeet-TDT-0.6B-v3: NexaAI/parakeet-tdt-0.6b-v3-npu

 

Computer Vision Models (CV)

·      YOLOv12: NexaAI/yolov12-npu

·      RF-DETR Segmentation: NexaAI/rf-detr-seg-preview-npu

·      ConvNeXt-Tiny: NexaAI/convnext-tiny-npu-IoT

Supported models on RB3 Gen 2

Computer Vision Models (CV)

·      ConvNeXt-Tiny NexaAI/convnext-tiny-npu-IoT-rb3

 

Get started today

Ready to bring state-of-the-art AI to your IoT and robotics applications?

Explore the NexaSDK documentationjoin the developer community, and see what’s possible when you combine Qualcomm Technologies’ industry-leading NPUs with the flexibility of containerized AI deployment. 

 

 

 

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Authors
Ruby Hagin
Ruby HaginSenior Marketing Communications Specialist
Alex Chen
Alex ChenCEO & Founder of Nexa AI
Zack Li
Zack LiCTO & Co-founder of Nexa AI
Alan Zhu
Alan ZhuProduct Lead at Nexa AI
Qualcomm relentlessly innovates to deliver intelligent computing everywhere, helping the world tackle some of its most important challenges. Our leading-edge AI, high performance, low-power computing, and unrivaled connectivity deliver proven solutions that transform major industries. At Qualcomm, we are engineering human progress.

Stay connected

Get the latest Qualcomm and industry information delivered to your inbox.

Subscribe
Manage your subscription

© Qualcomm Technologies, Inc. and/or its affiliated companies.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are licensed by Qualcomm Incorporated.

Note: Certain services and materials may require you to accept additional terms and conditions before accessing or using those items.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell or license any of the services or materials referenced herein.