Run Nexa AI agents locally on Snapdragon X PCs with Hexagon NPU

Written by Madhura Chatterjee, Alan Zhu, and Zack Li

Mar 19, 2026

As generative AI adoption accelerates across industries, the next frontier is clear: running meaningful agentic AI workflows fully on-device, powered by local SLM and LLM inference.

Cloud-only AI no longer satisfies the requirements of latency-critical, private, and personalized computing. This is where Nexa AI, a next-generation multimodal AI framework, and Snapdragon X Series processors come together to enable multimodal AI agents that run on-device with no cloud dependency.

For Nexa AI developers and end users, pairing the framework with PCs powered by Snapdragon X Series and the Qualcomm Hexagon NPU unlocks a new level of capability: local LLMs, multimodal reasoning, and agentic workflows running at the edge, with strong on-device AI performance alongside power efficiency, long battery life, and real-time responsiveness.

A leap forward in on-device AI: why NPU acceleration matters

Traditional AI inference pipelines rely heavily on CPU or GPU execution. Although powerful, these units are not optimized for sustained inference at low power. The Hexagon NPU is designed explicitly for this purpose:

The Hexagon NPU in the first-generation Snapdragon X Series processors is capable of up to 45 TOPS, exceeding Microsoft’s baseline Copilot+ PC requirements for AI-centric tasks.
The next-gen Hexagon NPU in the Snapdragon X2 Series processors reaches up to 80 TOPS, a 78% peak performance uplift over the previous generation.
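
The quoted uplift follows directly from the two peak figures above. A minimal sketch of the arithmetic (the function name is illustrative, not part of any SDK):

```python
def peak_uplift_percent(previous_tops: float, next_tops: float) -> float:
    """Return the percentage increase from previous_tops to next_tops."""
    return (next_tops - previous_tops) / previous_tops * 100.0

# Peak figures quoted above: 45 TOPS (Snapdragon X Series), 80 TOPS (Snapdragon X2 Series).
uplift = peak_uplift_percent(45, 80)
print(f"{uplift:.1f}%")  # 77.8%, which rounds to the quoted 78%
```

Both numbers are peak ratings, so the result describes peak throughput rather than measured application performance.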

Nexa AI meets Snapdragon: real performance gains

In late 2025, Nexa AI released an SDK optimized specifically for Hexagon NPUs. The update enabled multiple Nexa and third-party state-of-the-art LLMs and multimodal models to execute directly on the NPU, rather than consuming CPU or GPU cycles.

Nexa’s own OmniNeural-4B multimodal model runs fully on the NPU with strong efficiency and responsiveness.
Dedicated NPU-compiled versions of Ministral-3-3B, Granite-4, Microsoft Phi-4 mini, and Qwen3-4B are supported.
These early NPU-tuned models operate within a ~4B parameter budget—ideal for edge inference on PCs.
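
To see why a ~4B parameter budget suits PCs, it helps to estimate how much memory the weights alone would occupy. The sketch below is a back-of-the-envelope calculation only: the bit widths are illustrative assumptions (the precision used by the NPU-compiled models is not stated here), and it ignores the KV cache, activations, and runtime overhead.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)

# Assumed bit widths for illustration; not taken from the Nexa SDK.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_footprint_gib(4e9, bits):.2f} GiB")
```

Under these assumptions, a 4B-parameter model needs roughly 7.45 GiB at 16 bits per weight and about 1.86 GiB at 4 bits, which leaves room for the rest of the system on a typical laptop.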

Developer reality: what running Nexa AI on PCs with Snapdragon looks like

With NPU-accelerated Nexa AI, developers can now ship apps that run offline.

Local LLM Inference

Models like Granite-4 execute at the edge, allowing:

Offline chat & summarization
Private document processing
Local semantic search through user files
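
As an illustration of the last item, local search over user files reduces to embedding each document, embedding the query, and ranking by similarity, all without leaving the device. The sketch below shows only that structure: `embed` is a placeholder that counts words, standing in for a locally running embedding model, and none of these names come from the Nexa SDK.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Placeholder embedding: a bag-of-words count vector.

    A real app would call a locally running embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, documents: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank documents by similarity to the query, highest first."""
    q = embed(query)
    scored = [(name, cosine(q, embed(body))) for name, body in documents.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical in-memory "files"; a real app would read them from disk.
files = {
    "notes.txt": "Quarterly budget review and travel expenses",
    "todo.txt": "Buy milk, call the plumber",
    "report.txt": "Budget forecast for the next quarter",
}
results = search("budget forecast", files, top_k=2)
for name, score in results:
    print(name, round(score, 3))
```

Swapping the placeholder for a real embedding model changes only `embed`; the ranking logic stays the same.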

Multimodal Experiences

Nexa AI supports the latest text, audio, and image reasoning models on the NPU, which suits apps such as:

Real-time transcription
Smart meeting assistants
On-device vision classification
Local media understanding

Agentic Workflows

With a Hexagon NPU, Nexa agents can perform real-time, multi-step actions without cloud round-trips, improving:

Latency
Reliability
Privacy
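
A multi-step agent of this kind is, at its core, a loop that asks a model for the next action, runs it locally, and feeds the result back in. The sketch below shows that loop under stated assumptions: `stub_planner` follows a fixed plan in place of a real local model call, and the tools are hypothetical local functions. It does not use or represent the Nexa SDK API.

```python
from typing import Callable, Optional

# Hypothetical local tools: plain functions with no network access.
def uppercase(text: str) -> str:
    return text.upper()

def word_count(text: str) -> str:
    return str(len(text.split()))

TOOLS: dict[str, Callable[[str], str]] = {
    "uppercase": uppercase,
    "word_count": word_count,
}

def stub_planner(history: list[tuple[str, str]]) -> Optional[str]:
    """Stand-in for a local model call that picks the next tool.

    It follows a fixed plan so the sketch runs without any model installed.
    """
    plan = ["uppercase", "word_count"]
    return plan[len(history)] if len(history) < len(plan) else None

def run_agent(text: str, planner=stub_planner, max_steps: int = 5) -> list[tuple[str, str]]:
    """Run plan, act, observe steps until the planner stops or max_steps is hit."""
    history: list[tuple[str, str]] = []
    current = text
    for _ in range(max_steps):
        tool_name = planner(history)
        if tool_name is None:
            break
        current = TOOLS[tool_name](current)  # each step consumes the previous result
        history.append((tool_name, current))
    return history

steps = run_agent("hello local agent")
for name, result in steps:
    print(name, "->", result)
```

Because every step, including the planner call, would run on the device, no iteration of the loop waits on a network round-trip. The `max_steps` cap keeps a misbehaving planner from looping indefinitely.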

Battery-Friendly AI

PCs powered by Snapdragon X Series reduce power consumption by distributing workloads across the CPU, GPU, and NPU, which suits power-sensitive use cases.

What this means for the future of on-device AI

The combination of Nexa AI + PCs with Snapdragon is a preview of the next decade of computing:

Edge-Native AI Experiences

Models run where users work—not in distant data centers.

A Growing Ecosystem of NPU-Accelerated Apps

Productivity, creative, and agentic apps can increasingly target the Hexagon NPU as Windows AI APIs mature.

Responsive, Private, Local Intelligence

User data never leaves the device—critical for regulated industries and privacy-sensitive workflows.

Conclusion: Nexa AI + PCs with Snapdragon X Series = the Future of local, on-device AI agents

With the rise of Copilot+ PCs with Snapdragon X Series processors and Nexa AI’s NPU-optimized SDK, developers and users can finally experience powerful AI—without dependence on the cloud.

Real-time multimodal reasoning, background AI agents, advanced document understanding, and local LLMs are no longer theoretical: they are running today on Windows PCs with Snapdragon processors and their integrated Hexagon NPU.

This is the beginning of a new era: on-device AI that’s fast, private, efficient, and always available. Nexa AI is helping lead the way on Windows PCs with Snapdragon X Series.

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

About the Authors

Madhura Chatterjee, Senior Director, Product Management (Compute Software), Qualcomm Technologies, Inc.