NexaSDK for Android: a simple way to bring on-device AI to smartphones with Snapdragon
Sign up for Developer monthly newsletter
Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.
Sign upCome for support, stay for the community
Get support from experts, connect with like-minded developers, and access exclusive virtual events.
Join Developer DiscordArtificial intelligence (AI) features are becoming standard in Android apps but getting them to run well on a phone has usually required a lot of custom work. Nexa AI and Qualcomm Technologies, Inc. have now made this easier with NexaSDK for Android.
The software development kit (SDK) lets apps tap into the Qualcomm Hexagon NPU (the preferred processing engine for AI), as well as the Qualcomm Adreno GPU and Qualcomm Oryon CPU, on Snapdragon mobile platforms, giving developers an easy-to-use, single runtime that handles the heavy lifting so they can focus on building actual product features.
In Nexa's early numbers, on Samsung S25 Ultra powered by Snapdragon 8 Elite, models like Granite 4.0-h-350M reach 92 token per second on the NPU, compared to 40 tokens per second on the CPU.
Energy measurments show up to 9 times better efficiency when workloads run on the NPU instead of the CPU.
These gains make a real difference in everyday use, especially for apps that need
to stay responsive while keeping battery impact low.
How NexaSDK works with Snapdragon hardware
Snapdragon platforms combine three processing engines. When it comes to classic AI and generative AI, the Hexagon NPU is the engine of choice to run AI workloads. It is equipped with industry-leading and dedicated scalar, vector and tensor accelerators and a large shared memory.
Precision-wise, the latest Snapdragon 8 Elite Gen 5 can support INT2, INT4, INT8, INT16, FP8 and FP16. The combination of these features results in flexibility to process diverse AI workloads, energy efficiency, and memory and bandwidth savings.
For those who seek more fine tuning across the NPU, CPU and GPU, the Nexa SDK allows developers to choose their preferred engine for inferencing, without the challenges of device-specific tuning. Nexa SDK supports OpenCL and Vulcan APIs for easy access.
The Qualcomm Oyron CPU is general purpose and can be utilized for latency critical AI tasks.
Getting the most out of this mix typically requires device-specific tuning, complex configuration, and switching between different toolkits.
Nexa SDK removes this complexity by providing a unified interface: developers can simply select their preferred backend and, with just three lines of code, leverage the NPU, GPU, or CPU to run state-of-the-art models—including embedding, rerank, ASR, OCR, LLM, and VLM—directly on mobile devices.
Running SOTA models with minimal code
The SDK is built to keep setup simple. Loading and running a model only takes a few lines. The first beta supports a solid set of models that cover many common app needs:
- Multimodal
- OmniNeural-4B from Nexa AI
- LLM
- GPT-OSS-20B from OpenAI
- Granite-4.0-h-350M, Granite-4.0-Micro from IBM
- LFM2-1.2B from Liquid AI
- Gemma-3n-E2B, Gemman-3n-E4B from Google
- Qwen3-4B from Alibaba Qwen
- Phi3.5-mini, Phi4-mini from Microsoft
- Llama3.2-3B from Meta
- Embeddings
- EmbedNeural from Nexa AI
- EmbeddingGemma from Google
- ASR
- Parakeet-v3 from NVIDIA
- ConvNeXt from Meta
- OCR
- PaddleOCR from Baidu
- PaddleOCR from Baidu
- Reranking
- jina-reranker from Jina AI
Among them, GPT-OSS-20B has become a standout choice for enterprise customers. Sam Altman recently said that it offers “real-world performance comparable to o4-mini—and you can run it locally on your phone.”
However, many believed running a 20B-parameter model on mobile devices was still years away. Now, Nexa SDK enables GPT-OSS-20B to run entirely on-device with phones powered by Snapdragon processor equipped with Qualcomm Hexagon NPU phones (≥16GB RAM) through the Nexa Android SDK, providing private, low-latency inference with no cloud dependency.
Nexa also focuses on Day-0 availability for new models so developers don’t have to wait for backend or operator update and can start developing right away.
Benefits of running models on-device
Moving AI inference to the device brings several benefits that matter in shipping products:
- No dependency on network conditions
- Better privacy because data stays local
- No cloud token or API charges
- Reliable performance offline
- Improved performance and battery life when using the NPU
These advantages are especially useful for assistants, translation tools, optical character recognition (OCR) and document apps, camera and imaging pipelines, and lightweight LLM features meant to run interactively on the phone.
A quick start for Android developer teams
The SDK includes a sample app and starter guide, so most developers can get a test build running quickly. It’s a practical option for teams experimenting with multimodal input, voice workflows, OCR, summarization, reranking or any feature that benefits from low-latency inference.
You can follow the QuickStart section of the Nexa SDK documentation and try the sample app. There’s also a tutorial video demonstrating how to run multimodal inference with the Nexa SDK within 40 seconds in Android Studio.
Nexa AI will continue refining the framework, and Qualcomm Technologies will keep adding capabilities to the Qualcomm AI Stack to further optimize for power efficient on-device.
Together, they give Android developers a more direct path to on-device intelligence without having to build their own runtime.
NexaAI-Android-SDK-video-demo
Nov 26, 2025 | 0:57

Start building on-device AI
If you want to bring AI features directly onto Android phones powered by Snapdragon, NexaSDK for Android (Beta) is an easy way to begin.
Nexa SDK has now earned over 6,000 GitHub stars, and our developer community is proactively expanding the ecosystem with derivative projects, such as the Android-focused nexa_ai_flutter package.
Browse Nexa AI and Qualcomm Technologies GitHub communities
More useful resources:
NexaSDK onboarding tutorial video





