Back to All
Developer Blog

OmniNeural-4B & NexaML: innovating Multimodal AI on Qualcomm Hexagon NPU

How Qualcomm and Nexa AI are driving NPU-First Innovation that unlocks natural, local, and scalable intelligence everywhere

Qualcomm Technologies, Inc. stands at the forefront of on-device AI, relentlessly pushing the boundaries of what’s possible. Every day, we witness groundbreaking advances in on-device intelligence from visionary AI companies across the globe.

Among these trailblazers is Nexa.AI, with OmniNeural-4B—a revolutionary multimodal AI model— that has been engineered from the ground up to run natively on the Hexagon NPU. This powerful synergy is ushering in a new era of on-device intelligence, empowering devices with unprecedented capabilities right at the edge.

OmniNeural-4B stands apart as the world’s first NPU-aware multi-modal model engineered to seamlessly understand and process text, voice, and vision together. Unlike conventional AI models that depend only on cloud connectivity, OmniNeural harnesses the on-device AI performance from Hexagon NPUs, ushering in a new reality where AI is both local and deeply integrated.

Whether you’re using a laptop, smartphone, automotive, XR or IoT device, this model delivers state-of-the-art AI performance right where you need it—with all the on-device AI benefits like immediacy, cost-efficiency, reliability and privacy.

Unlocking Multimodal AI Performance on Device

OmniNeural-4B is designed for high-quality natural language, vision, and audio understanding, with advanced text generation capability, while being optimized for performance on device. OEMs and Developers can deliver:

  • Smarter mobile apps: Private, on-device agents that see, hear, and understand personal context—powering chat, translation, and creative tools that respond instantly without the cloud.

  • Intelligent IoT devices: From smart homes to factory floors, devices reason across signals in real time to detect patterns, guide actions, and adapt naturally at the edge.

  • Next-gen automotive experiences: Voice copilots that recognize drivers and surroundings, personalize cabin experiences, and assist with tasks—all directly on the vehicle’s compute platform.

  • Productivity and creativity on PCs: Local AI that summarizes, creates, and automates workflows with richer context and stronger privacy—delivering results with lower latency, powered by Snapdragon X Series compute platforms

Introducing NexaML: Revolutionizing On-Device AI Inferencing

NexaML is a breakthrough solution designed for the new era of on-device AI, offering a seamless experience that combines the simplicity of cloud APIs with the power of local AI model inferencing. By running models directly on NPUs, NexaML delivers state-of-the-art performance and full compatibility with the latest advancements in AI.

Built to harness the full capabilities of the Qualcomm AI Stack, NexaML leverages Qualcomm AI Engine direct (QNN) for direct access to the Hexagon NPU, unlocking unprecedented performance from your device. With NexaML, developers can deploy sophisticated AI applications locally, benefiting from cutting-edge speed, efficiency, and support for the most advanced models available.

Here are some highlights of NexaML:

  • Ease of use: One line of code after downloading the Nexa SDK, try it on your laptop powered by Snapdragon today, you can see Hexagon NPU acceleration right away

  • Multimodal support: support multi-image and multi-audio inputs

  • Structured decoding: Supports JSON output that enables function calling for agentic AI workflow

  • Quantization: ~10% lower perplexity with NexaQuant

  • Extended context length: 2× longer without speed loss

  • Cross-platform: Supports PCs powered by Qualcomm technology, mobile devices, automotive platforms and IoT

  • OpenAI-compatible API: for drop-in integration

Sign up for Developer monthly newsletter

Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.

nexaML interface
Figure 1. nexaML interface

Besides OmniNeural-4B, NexaML now supports these cutting-edge AI models to run on the Hexagon NPU:

  • Qwen3 series → supported latest Qwen3-4B models and its fine-tuned versions

  • YOLOv12 + PaddleOCR v4 → Latest SOTA vision models

OmniNeural’s reach goes far beyond individual users. Its architecture is scalable across a wide range of industries, from personal digital assistants that anticipate your needs, to enterprise and industrial workflows that demand real-time, intelligent automation. The potential is as vast as the imagination of those who build with it.

To showcase what OmniNeural can do, real-world demonstrations are already available. On smartphones powered by Snapdragon, the assistant both hears your voice and sees what you see, responding naturally and instantly, all while running directly on the Hexagon NPU for sustained battery life and lightning-fast interactions.

Meanwhile, the full PC powered by Snapdragon demo highlights OmniNeural paired with Snapdragon X Elite processor, where you can see multi-image reasoning and agentic AI capabilities—all executed locally on the device’s Hexagon NPU. 

In coming weeks, Nexa AI will broaden its support to include additional models such as Parakeet, Gemma, and more.

Building on the strong collaboration between Qualcomm Technologies and Nexa AI, this expansion will unlock a new wave of on-device AI use cases—empowering developers and OEMs to create AI solutions that are faster, more efficient, and inherently private.

This innovation will seamlessly extend across PCs, smartphones, automotive platforms, and IoT devices, paving the way for smarter, more secure technology experiences everywhere.

Ready to dive in? OmniNeural‑4B is already live for developers. You can pull the model straight onto any Hexagon NPU enabled device and run it end‑to‑end.

Try the model directly on Hugging Face

Come for support, stay for the community

Get support from experts, connect with like-minded developers, and access exclusive virtual events.

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Authors
Vinesh Sukumar
Vinesh SukumarVP, Product Management of AI/GenAI, Qualcomm Technologies, Inc.
Jerry Chang
Jerry ChangSenior Manager, Marketing, Qualcomm Technologies
Qualcomm relentlessly innovates to deliver intelligent computing everywhere, helping the world tackle some of its most important challenges. Our leading-edge AI, high performance, low-power computing, and unrivaled connectivity deliver proven solutions that transform major industries. At Qualcomm, we are engineering human progress.

Stay connected

Get the latest Qualcomm and industry information delivered to your inbox.

Subscribe
Manage your subscription

© Qualcomm Technologies, Inc. and/or its affiliated companies.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are licensed by Qualcomm Incorporated.

Note: Certain services and materials may require you to accept additional terms and conditions before accessing or using those items.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell or license any of the services or materials referenced herein.