Developer Blog

NexaSDK for Android: a simple way to bring on-device AI to smartphones with Snapdragon

Written by Alan Zhu, Zack Li, Alex Chen, Manoj Khilnani and Jerry Chang

Dec 2, 2025

Artificial intelligence (AI) features are becoming standard in Android apps, but getting them to run well on a phone has usually required a lot of custom work. Nexa AI and Qualcomm Technologies, Inc. have now made this easier with NexaSDK for Android.

The software development kit (SDK) lets apps tap into the Qualcomm Hexagon NPU (the preferred processing engine for AI), as well as the Qualcomm Adreno GPU and Qualcomm Oryon CPU, on Snapdragon mobile platforms. It gives developers a single, easy-to-use runtime that handles the heavy lifting so they can focus on building product features.

In Nexa's early numbers on a Samsung S25 Ultra powered by Snapdragon 8 Elite, models like Granite-4.0-h-350M reach 92 tokens per second on the NPU, compared to 40 tokens per second on the CPU.

Energy measurements show up to 9 times better efficiency when workloads run on the NPU instead of the CPU.

These gains make a real difference in everyday use, especially for apps that need to stay responsive while keeping battery impact low.
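
To put the throughput figures in user-facing terms, here is a small back-of-the-envelope sketch. It assumes a steady decode rate and ignores prompt processing and time to first token, so treat the results as rough guidance rather than measurements. The 200-token reply length is a made-up value for illustration.

```java
// Rough arithmetic on the throughput figures quoted above.
final class ThroughputEstimate {

    /** Seconds needed to generate the given number of tokens at a steady rate. */
    static double secondsFor(int tokens, double tokensPerSecond) {
        if (tokensPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return tokens / tokensPerSecond;
    }

    /** How many times faster one rate is than another. */
    static double speedup(double fasterRate, double slowerRate) {
        if (slowerRate <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return fasterRate / slowerRate;
    }

    public static void main(String[] args) {
        double npuRate = 92.0;  // tokens per second on the NPU, from the text
        double cpuRate = 40.0;  // tokens per second on the CPU, from the text
        int replyTokens = 200;  // hypothetical reply length, for illustration only

        System.out.printf("Speedup: %.2fx%n", speedup(npuRate, cpuRate));
        System.out.printf("NPU: %.1f s, CPU: %.1f s%n",
                secondsFor(replyTokens, npuRate),
                secondsFor(replyTokens, cpuRate));
    }
}
```

Under these assumptions the NPU is 2.3 times faster, and a 200-token reply takes roughly 2.2 seconds on the NPU versus 5 seconds on the CPU.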

How NexaSDK works with Snapdragon hardware

Snapdragon platforms combine three processing engines. For both classic and generative AI, the Hexagon NPU is the engine of choice. It is equipped with industry-leading, dedicated scalar, vector and tensor accelerators and a large shared memory.

In terms of precision, the latest Snapdragon 8 Elite Gen 5 supports INT2, INT4, INT8, INT16, FP8 and FP16. Together, these features provide the flexibility to process diverse AI workloads, along with energy efficiency and memory and bandwidth savings.
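
To see why lower precisions save memory and bandwidth, consider how much space a model's weights take at different bit widths. The sketch below is simple arithmetic, not a description of how any particular runtime stores models. It counts weights only and ignores quantization metadata (scales, zero points), activations and the KV cache, all of which add to the real footprint. The 4B parameter count is a hypothetical round number.

```java
// Weight storage as a function of parameter count and bits per weight.
final class WeightFootprint {

    /** Bytes needed to store the given number of weights at a fixed bit width. */
    static long weightBytes(long parameters, int bitsPerWeight) {
        if (parameters < 0 || bitsPerWeight <= 0) {
            throw new IllegalArgumentException("invalid parameter count or bit width");
        }
        // Round up so that a partially filled last byte is still counted.
        return (parameters * bitsPerWeight + 7) / 8;
    }

    /** Converts bytes to gibibytes (2^30 bytes). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long parameters = 4_000_000_000L; // hypothetical 4B-parameter model
        int[] bitWidths = {16, 8, 4, 2};  // e.g. FP16, INT8, INT4, INT2

        for (int bits : bitWidths) {
            long bytes = weightBytes(parameters, bits);
            System.out.printf("%2d-bit weights: %.2f GiB%n", bits, toGiB(bytes));
        }
    }
}
```

Each halving of the bit width halves the bytes that must be stored and streamed from memory, which is where the bandwidth savings come from.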

For those who want finer control across the NPU, CPU and GPU, the Nexa SDK allows developers to choose their preferred engine for inference, without the challenges of device-specific tuning. Nexa SDK supports the OpenCL and Vulkan APIs for easy access.

The Qualcomm Oryon CPU is general purpose and can be utilized for latency-critical AI tasks.

Getting the most out of this mix typically requires device-specific tuning, complex configuration, and switching between different toolkits.

Nexa SDK removes this complexity by providing a unified interface: developers can simply select their preferred backend and, with just three lines of code, leverage the NPU, GPU, or CPU to run state-of-the-art models—including embedding, rerank, ASR, OCR, LLM, and VLM—directly on mobile devices.
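
The idea of selecting a preferred backend can be illustrated with a small, self-contained sketch. To be clear, this is not the NexaSDK API: the `Backend` enum and the `choose` helper are hypothetical names invented here to show one way an app might express "prefer the NPU, then fall back to the GPU, then the CPU". For the real calls, refer to the SDK documentation.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative only: none of these types come from NexaSDK.
final class BackendSelection {

    /** The three engines named in the text. */
    enum Backend { NPU, GPU, CPU }

    /** Returns the first backend in preference order that the device offers. */
    static Optional<Backend> choose(List<Backend> preference, Set<Backend> available) {
        for (Backend backend : preference) {
            if (available.contains(backend)) {
                return Optional.of(backend);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Backend> preference = List.of(Backend.NPU, Backend.GPU, Backend.CPU);

        // Hypothetical device on which the NPU path is not available.
        Set<Backend> available = EnumSet.of(Backend.GPU, Backend.CPU);

        Optional<Backend> chosen = choose(preference, available);
        System.out.println(chosen.map(Enum::name).orElse("no backend available"));
    }
}
```

Keeping the preference order in app code makes the fallback behavior explicit and easy to test, whichever runtime ends up executing the model.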

Figure 1. NexaSDK for Android

Running SOTA models with minimal code

The SDK is built to keep setup simple. Loading and running a model only takes a few lines. The first beta supports a solid set of models that cover many common app needs:

Multimodal
- OmniNeural-4B from Nexa AI
LLM
- GPT-OSS-20B from OpenAI
- Granite-4.0-h-350M, Granite-4.0-Micro from IBM
- LFM2-1.2B from Liquid AI
- Gemma-3n-E2B, Gemma-3n-E4B from Google
- Qwen3-4B from Alibaba Qwen
- Phi3.5-mini, Phi4-mini from Microsoft
- Llama3.2-3B from Meta
Embeddings
- EmbedNeural from Nexa AI
- EmbeddingGemma from Google
ASR
- Parakeet-v3 from NVIDIA
Computer vision
- ConvNeXt from Meta
OCR
- PaddleOCR from Baidu
Reranking
- jina-reranker from Jina AI

Among them, GPT-OSS-20B has become a standout choice for enterprise customers. Sam Altman recently said that it offers “real-world performance comparable to o4-mini—and you can run it locally on your phone.”

Many believed that running a 20B-parameter model on mobile devices was still years away. Nexa SDK now enables GPT-OSS-20B to run entirely on-device on Snapdragon-powered phones equipped with the Qualcomm Hexagon NPU and at least 16 GB of RAM, providing private, low-latency inference with no cloud dependency.
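
Because the largest model has a RAM requirement, an app may want to check device memory before offering it. The sketch below shows one possible gate in plain Java. The `MemoryGate` class and the 10% tolerance are assumptions made for illustration. The tolerance is there because the operating system usually reports somewhat less than the advertised capacity. On Android, the reported total would come from `ActivityManager.MemoryInfo.totalMem`.

```java
// Illustrative RAM gate; class name and tolerance are assumptions.
final class MemoryGate {

    static final long GIB = 1024L * 1024L * 1024L;

    /**
     * Returns true when the reported RAM is at least the required size
     * minus the given tolerance (a fraction between 0 inclusive and 1 exclusive).
     */
    static boolean meetsRamRequirement(long totalBytes, int requiredGiB, double tolerance) {
        if (tolerance < 0.0 || tolerance >= 1.0) {
            throw new IllegalArgumentException("tolerance must be in [0, 1)");
        }
        double threshold = requiredGiB * (double) GIB * (1.0 - tolerance);
        return totalBytes >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical values; on Android these would be read from
        // ActivityManager.MemoryInfo.totalMem at runtime.
        long phoneSoldAs16GB = 15L * GIB + 200L * 1024L * 1024L;
        long phoneSoldAs12GB = 11L * GIB + 300L * 1024L * 1024L;

        System.out.println(meetsRamRequirement(phoneSoldAs16GB, 16, 0.10));
        System.out.println(meetsRamRequirement(phoneSoldAs12GB, 16, 0.10));
    }
}
```

With these hypothetical readings, the first device passes the gate and the second does not, so the app could fall back to a smaller model on the second.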

Nexa also focuses on Day-0 availability for new models, so developers don’t have to wait for backend or operator updates and can start developing right away.

Benefits of running models on-device

Moving AI inference to the device brings several benefits that matter in shipping products:

- No dependency on network conditions
- Better privacy because data stays local
- No cloud token or API charges
- Reliable performance offline
- Improved performance and battery life when using the NPU

These advantages are especially useful for assistants, translation tools, optical character recognition (OCR) and document apps, camera and imaging pipelines, and lightweight LLM features meant to run interactively on the phone.

A quick start for Android developer teams

The SDK includes a sample app and starter guide, so most developers can get a test build running quickly. It’s a practical option for teams experimenting with multimodal input, voice workflows, OCR, summarization, reranking or any feature that benefits from low-latency inference.

You can follow the QuickStart section of the Nexa SDK documentation and try the sample app. There’s also a tutorial video demonstrating how to run multimodal inference with the Nexa SDK within 40 seconds in Android Studio.

Nexa AI will continue refining the framework, and Qualcomm Technologies will keep adding capabilities to the Qualcomm AI Stack to further optimize for power-efficient on-device AI.

Together, they give Android developers a more direct path to on-device intelligence without having to build their own runtime.

Video: NexaAI-Android-SDK-video-demo (Nov 26, 2025, 0:57)

Start building on-device AI

If you want to bring AI features directly onto Android phones powered by Snapdragon, NexaSDK for Android (Beta) is an easy way to begin.

Download NexaSDK today

Nexa SDK has now earned over 6,000 GitHub stars, and our developer community is actively expanding the ecosystem with derivative projects, such as the Android-focused nexa_ai_flutter package.

Browse Nexa AI and Qualcomm Technologies GitHub communities

More useful resources:

NexaSDK documentation

NexaSDK Android sample app

NexaSDK onboarding tutorial video

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Authors

Alan Zhu, Senior Product Manager, Qualcomm

Zack Li, Sr. Staff Engineer, Qualcomm

Alex Chen, Principal Engineer/Manager, Qualcomm

Manoj Khilnani, Director of Global Partner Marketing

Jerry Chang, Senior Manager, Marketing, Qualcomm Technologies