Bringing Granite 4.0 to the Edge: on-device AI for real-world Performance
Sign up for Developer monthly newsletter
Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.
Sign upCome for support, stay for the community
Get support from experts, connect with like-minded developers, and access exclusive virtual events.
Join Developer DiscordAs AI continues to evolve, the shift toward on-device intelligence is accelerating. IBM’s latest release, Granite 4.0, is a powerful example of how open-source models are being designed for real-world performance, with speed, flexibility, and efficiency at the forefront.
Granite 4.0 introduces a hybrid architecture that makes it ideal for edge deployments, whether you're building intelligent assistants or agentic workflows. And with the growing capabilities of Snapdragon and Qualcomm Dragonwing platforms, developers and end users now have the tools to run these models locally - without relying on cloud infrastructure.
Granite 4.0: Built for speed, optimized for flexibility
Granite 4.0 combines Mamba-2 and transformer-based components, with some models featuring a Mixture of Experts (MoE) routing strategy. This architecture delivers:
- 70%+ lower memory requirements and 2x faster inferencing speeds with maintained performance compared to similarly sized traditional models for long-context, large-batch tasks
- Unconstrained context length with support for long-context tasks (tested up to 128K tokens)
- Flexible deployment on a wide range of devices
With multiple model sizes and architectures, developers can choose the right fit for their hardware and use case:
- Granite 4.0 Micro (3B): Dense Transformer
- Granite 4.0 Micro Hybrid (3B): Dense Mamba-2 + Transformer
- Granite 4.0 Tiny (7B, 1B active): MoE Mamba-2 + Transformer
- Granite 4.0 Small (32B, 9B active): MoE Mamba-2 + Transformer
Snapdragon + Granite: a powerful edge AI combination
Snapdragon platforms are enabling a new generation of on-device AI experiences. With advanced heterogeneous computing, developers can run models like Granite 4.0 directly on devices powered by Snapdragon, bringing:
- Real-time responsiveness for user-facing and local data AI features
- Data privacy and security with local processing
- Cost savings by eliminating cloud inference fees
- Reliability even in offline or bandwidth-constrained environments
This edge-first approach empowers developers to build AI that’s fast, private, and scalable.
Nexa AI brings Granite 4.0 running directly on Qualcomm Hexagon NPU for compute and mobile platforms, Auto and IoT
We are partnering closely with Nexa AI to launch IBM Granite 4.0 as a fully optimized, on-device solution for the Qualcomm® Hexagon™ NPU—the dedicated AI inferencing core of Snapdragon platforms.
NexaML is engineered specifically for NPU-first inferencing, providing exceptional performance, advanced structured decoding, and robust extended context support. With immediate, day-zero compatibility for Snapdragon X2 Elite and Snapdragon 8 Elite Gen 5 platforms, Nexa empowers developers to deploy Granite 4.0 with unmatched efficiency, unlocking the fastest route to high-performance, private, and scalable edge AI.
Nexa AI currently supports the following Granite 4.0 model versions
- Granite-4.0-Micro (3B)
- Snapdragon X2 Elite NPU
- Snapdragon 8 Gen 5 NPU
- Snapdragon X Elite NPU
Developers can explore Nexa SDK for PCs with Snapdragon PCs.
Follow the instructions here to here to download Nexa SDK and Run Granite 4 on Qualcomm the NPU/GPU/CPU in the Snapdragon platforms with one line of code.
Learn more about Nexa ML, Hyperlink and other solutions
CPU and GPU inferencing partners: LM Studio, Ollama and Nexa AI
Qualcomm Technologies and IBM are continuing to work with partners like Ollama, LM Studio and Nexa AI to make Granite 4.0 accessible for developers across our other AI engine cores such as our GPU and CPU.
The partner platforms offer:
- Self-hosted model deployment
- Fine-tuning and adapter support
- Integration with local workflows and enterprise systems
Together, we’re enabling a future where AI is ubiquitous, personal, and under your control.
Try it yourself
Explore Granite 4.0 on Hugging Face
Interested in on-device AI?
Check out Qualcomm AI Hub, Qualcomm Technologies’ developer platform for edge AI. Whether you're looking to bring your own model or explore a collection of pre-optimized models for devices powered by Snapdragon and Dragonwing family, Qualcomm AI Hub makes it easier to build and deploy AI locally.


