How Qualcomm and Nexa AI are driving NPU-First Innovation that unlocks natural, local, and scalable intelligence everywhere
Qualcomm Technologies, Inc. stands at the forefront of on-device AI, relentlessly pushing the boundaries of what’s possible. Every day, we witness groundbreaking advances in on-device intelligence from visionary AI companies across the globe.
Among these trailblazers is Nexa.AI, with OmniNeural-4B—a revolutionary multimodal AI model— that has been engineered from the ground up to run natively on the Hexagon NPU. This powerful synergy is ushering in a new era of on-device intelligence, empowering devices with unprecedented capabilities right at the edge.
OmniNeural-4B stands apart as the world’s first NPU-aware multi-modal model engineered to seamlessly understand and process text, voice, and vision together. Unlike conventional AI models that depend only on cloud connectivity, OmniNeural harnesses the on-device AI performance from Hexagon NPUs, ushering in a new reality where AI is both local and deeply integrated.
Whether you’re using a laptop, smartphone, automotive, XR or IoT device, this model delivers state-of-the-art AI performance right where you need it—with all the on-device AI benefits like immediacy, cost-efficiency, reliability and privacy.
Unlocking Multimodal AI Performance on Device
OmniNeural-4B is designed for high-quality natural language, vision, and audio understanding, with advanced text generation capability, while being optimized for performance on device. OEMs and Developers can deliver:
- Smarter mobile apps: Private, on-device agents that see, hear, and understand personal context—powering chat, translation, and creative tools that respond instantly without the cloud.
- Intelligent IoT devices: From smart homes to factory floors, devices reason across signals in real time to detect patterns, guide actions, and adapt naturally at the edge.
- Next-gen automotive experiences: Voice copilots that recognize drivers and surroundings, personalize cabin experiences, and assist with tasks—all directly on the vehicle’s compute platform.
- Productivity and creativity on PCs: Local AI that summarizes, creates, and automates workflows with richer context and stronger privacy—delivering results with lower latency, powered by Snapdragon X Series compute platforms
Introducing NexaML: Revolutionizing On-Device AI Inferencing
NexaML is a breakthrough solution designed for the new era of on-device AI, offering a seamless experience that combines the simplicity of cloud APIs with the power of local AI model inferencing. By running models directly on NPUs, NexaML delivers state-of-the-art performance and full compatibility with the latest advancements in AI.
Built to harness the full capabilities of the Qualcomm AI Stack, NexaML leverages Qualcomm AI Engine direct (QNN) for direct access to the Hexagon NPU, unlocking unprecedented performance from your device. With NexaML, developers can deploy sophisticated AI applications locally, benefiting from cutting-edge speed, efficiency, and support for the most advanced models available.
Here are some highlights of NexaML:
- Ease of use: One line of code after downloading the Nexa SDK, try it on your laptop powered by Snapdragon today, you can see Hexagon NPU acceleration right away
- Multimodal support: support multi-image and multi-audio inputs
- Structured decoding: Supports JSON output that enables function calling for agentic AI workflow
- Quantization: ~10% lower perplexity with NexaQuant
- Extended context length: 2× longer without speed loss
- Cross-platform: Supports PCs powered by Qualcomm technology, mobile devices, automotive platforms and IoT
- OpenAI-compatible API: for drop-in integration
Sign up for Developer monthly newsletter
Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.
Besides OmniNeural-4B, NexaML now supports these cutting-edge AI models to run on the Hexagon NPU:
- Qwen3 series → supported latest Qwen3-4B models and its fine-tuned versions
- YOLOv12 + PaddleOCR v4 → Latest SOTA vision models
OmniNeural’s reach goes far beyond individual users. Its architecture is scalable across a wide range of industries, from personal digital assistants that anticipate your needs, to enterprise and industrial workflows that demand real-time, intelligent automation. The potential is as vast as the imagination of those who build with it.
To showcase what OmniNeural can do, real-world demonstrations are already available. On smartphones powered by Snapdragon, the assistant both hears your voice and sees what you see, responding naturally and instantly, all while running directly on the Hexagon NPU for sustained battery life and lightning-fast interactions.
Meanwhile, the full PC powered by Snapdragon demo highlights OmniNeural paired with Snapdragon X Elite processor, where you can see multi-image reasoning and agentic AI capabilities—all executed locally on the device’s Hexagon NPU.
- AI PCs powered by Snapdragon → OmniNeural copilot reasoning locally (“spot the difference” images, poster + text function calls, and audio comparison)
- Smartphones powered by Snapdragon → Assistants that see and hear the world around you, fully private. On mobile, speak directly into the mic while showing an image, and OmniNeural+nexaML handles both inputs natively.
- Snapdragon Digital Chassis Solution → in-car copilots powered by Qwen3, responding in milliseconds.
- Qualcomm IoT & Qualcomm Robotics → Qwen3-enabled decision-making in edge robots powered by Qualcomm Dragonwing RB3 Gen2 HDK.
In coming weeks, Nexa AI will broaden its support to include additional models such as Parakeet, Gemma, and more.
Building on the strong collaboration between Qualcomm Technologies and Nexa AI, this expansion will unlock a new wave of on-device AI use cases—empowering developers and OEMs to create AI solutions that are faster, more efficient, and inherently private.
This innovation will seamlessly extend across PCs, smartphones, automotive platforms, and IoT devices, paving the way for smarter, more secure technology experiences everywhere.
Ready to dive in? OmniNeural‑4B is already live for developers. You can pull the model straight onto any Hexagon NPU enabled device and run it end‑to‑end.
Try the model directly on Hugging Face
Come for support, stay for the community
Get support from experts, connect with like-minded developers, and access exclusive virtual events.


