Developer Blog

Bringing Granite 4.0 to the edge: On-device AI for real-world performance

Written by

Neel Kishan

Rodrigo Caruso Neves do Amaral

Oct 2, 2025

As AI continues to evolve, the shift toward on-device intelligence is accelerating. IBM’s latest release, Granite 4.0, is a powerful example of how open-source models are being designed for real-world performance, with speed, flexibility, and efficiency at the forefront.

Granite 4.0 introduces a hybrid architecture that makes it well suited to edge deployments, whether you're building intelligent assistants or agentic workflows. And with the growing capabilities of Snapdragon and Qualcomm Dragonwing platforms, developers and end users now have the tools to run these models locally, without relying on cloud infrastructure.

Granite 4.0: Built for speed, optimized for flexibility

Granite 4.0 combines Mamba-2 and transformer-based components, with some models featuring a Mixture of Experts (MoE) routing strategy. This architecture delivers:

More than 70% lower memory requirements and 2x faster inference on long-context, large-batch tasks compared with similarly sized traditional models, while maintaining performance
Unconstrained context length with support for long-context tasks (tested up to 128K tokens)
Flexible deployment on a wide range of devices
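
One way to see why a hybrid design helps with long contexts is a back-of-the-envelope comparison of cache memory. The sketch below is illustrative only: the layer counts, head sizes, and state size are hypothetical placeholders, not Granite 4.0's actual configuration, and it ignores the memory used by the weights themselves. It shows only the general pattern that a transformer's key-value cache grows linearly with context length, while a state-space (Mamba-style) layer keeps a fixed-size state.

```python
def kv_cache_bytes(attn_layers, kv_heads, head_dim, context_len,
                   batch=1, bytes_per_value=2):
    """Size of the attention key-value cache: one key tensor and one
    value tensor per attention layer, each growing with context length."""
    return 2 * attn_layers * kv_heads * head_dim * context_len * batch * bytes_per_value


def ssm_state_bytes(ssm_layers, state_values_per_layer,
                    batch=1, bytes_per_value=2):
    """Size of the recurrent state kept by state-space layers.
    Note that it does not depend on context length."""
    return ssm_layers * state_values_per_layer * batch * bytes_per_value


# Hypothetical configurations, chosen only to illustrate the scaling.
DENSE = dict(attn_layers=40, kv_heads=8, head_dim=128)
HYBRID = dict(attn_layers=4, kv_heads=8, head_dim=128)
HYBRID_SSM_LAYERS = 36
STATE_VALUES_PER_LAYER = 1_000_000  # hypothetical state size


def cache_memory(context_len):
    """Return (dense_bytes, hybrid_bytes) of cache memory for one sequence."""
    dense = kv_cache_bytes(context_len=context_len, **DENSE)
    hybrid = (kv_cache_bytes(context_len=context_len, **HYBRID)
              + ssm_state_bytes(HYBRID_SSM_LAYERS, STATE_VALUES_PER_LAYER))
    return dense, hybrid


if __name__ == "__main__":
    for ctx in (4_096, 32_768, 131_072):
        dense, hybrid = cache_memory(ctx)
        print(f"{ctx:>7} tokens: dense {dense / 2**30:.2f} GiB, "
              f"hybrid {hybrid / 2**30:.2f} GiB")
```

Because only the attention layers contribute a context-dependent term, the gap between the two configurations widens as the context or batch size grows, which is consistent with the benefit being stated for long-context, large-batch tasks.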

With multiple model sizes and architectures, developers can choose the right fit for their hardware and use case:

Granite 4.0 Micro (3B): Dense Transformer
Granite 4.0 Micro Hybrid (3B): Dense Mamba-2 + Transformer
Granite 4.0 Tiny (7B, 1B active): MoE Mamba-2 + Transformer
Granite 4.0 Small (32B, 9B active): MoE Mamba-2 + Transformer
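
To make the "active" figures concrete, the short sketch below computes what share of each model's parameters is used per token, using the rounded sizes listed above. It also estimates the space needed to hold the weights at a given precision; the 4-bit figure is a hypothetical quantization level chosen for illustration, not a published Granite 4.0 format. In a typical MoE setup all experts still need to be stored, so the active count mainly reduces compute per token rather than storage.

```python
# (total parameters, active parameters per token), from the list above.
MODELS = {
    "Granite 4.0 Micro": (3_000_000_000, 3_000_000_000),         # dense
    "Granite 4.0 Micro Hybrid": (3_000_000_000, 3_000_000_000),  # dense
    "Granite 4.0 Tiny": (7_000_000_000, 1_000_000_000),          # MoE
    "Granite 4.0 Small": (32_000_000_000, 9_000_000_000),        # MoE
}


def active_fraction(total_params, active_params):
    """Share of the parameters that take part in computing each token."""
    return active_params / total_params


def weight_gib(total_params, bits_per_weight):
    """Approximate space needed to store all weights, in GiB."""
    return total_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    BITS = 4  # hypothetical quantization level, for illustration only
    for name, (total, active) in MODELS.items():
        print(f"{name}: {active_fraction(total, active):.0%} active, "
              f"about {weight_gib(total, BITS):.1f} GiB of weights at {BITS}-bit")
```

For example, Granite 4.0 Tiny activates roughly one seventh of its parameters per token (1B of 7B), and Granite 4.0 Small roughly 28% (9B of 32B).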

Snapdragon + Granite: a powerful edge AI combination

Snapdragon platforms are enabling a new generation of on-device AI experiences. With advanced heterogeneous computing, developers can run models like Granite 4.0 directly on devices powered by Snapdragon, bringing:

Real-time responsiveness for user-facing and local data AI features
Data privacy and security with local processing
Cost savings by eliminating cloud inference fees
Reliability even in offline or bandwidth-constrained environments

This edge-first approach empowers developers to build AI that’s fast, private, and scalable.

Nexa AI runs Granite 4.0 directly on the Qualcomm Hexagon NPU for compute, mobile, auto, and IoT platforms

We are partnering closely with Nexa AI to launch IBM Granite 4.0 as a fully optimized, on-device solution for the Qualcomm® Hexagon™ NPU—the dedicated AI inferencing core of Snapdragon platforms.

NexaML is engineered specifically for NPU-first inferencing, providing exceptional performance, advanced structured decoding, and robust extended context support. With immediate, day-zero compatibility for Snapdragon X2 Elite and Snapdragon 8 Elite Gen 5 platforms, Nexa empowers developers to deploy Granite 4.0 with unmatched efficiency, unlocking the fastest route to high-performance, private, and scalable edge AI.

Nexa AI currently supports the following Granite 4.0 model versions

Developers can explore the Nexa SDK for PCs powered by Snapdragon.

Follow the instructions here to download the Nexa SDK and run Granite 4.0 on the NPU, GPU, or CPU of Snapdragon platforms with one line of code.

Learn more about NexaML, Hyperlink, and other solutions

CPU and GPU inferencing partners: LM Studio, Ollama and Nexa AI

Qualcomm Technologies and IBM are continuing to work with partners like Ollama, LM Studio, and Nexa AI to make Granite 4.0 accessible to developers on our other AI engine cores, such as the GPU and CPU.

The partner platforms offer:

Self-hosted model deployment
Fine-tuning and adapter support
Integration with local workflows and enterprise systems

Together, we’re enabling a future where AI is ubiquitous, personal, and under your control.

Try it yourself

Test Granite 4.0 on Ollama

Run Granite 4.0 in LM Studio

Explore Granite 4.0 on Hugging Face
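
As a starting point for the Ollama route, here is a minimal sketch that sends a prompt to a locally running Ollama server over its REST API, using only the Python standard library. It assumes Ollama is installed and serving on its default port, and that a Granite 4.0 model has already been pulled. The model tag `granite4` is an assumption; check the Ollama model library for the exact tag of the variant you want.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "granite4"  # assumption: confirm the exact tag in the Ollama library


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model=MODEL_TAG, timeout=120):
    """Send the prompt and return the generated text.
    Requires a running Ollama server with the model available."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, calling generate("Summarize the benefits of on-device inference in one sentence.") returns the model's reply as a string. Nothing leaves the machine, since the request goes to localhost.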

Interested in on-device AI?

Check out Qualcomm AI Hub, Qualcomm Technologies’ developer platform for edge AI. Whether you're looking to bring your own model or explore a collection of pre-optimized models for devices powered by the Snapdragon and Dragonwing families, Qualcomm AI Hub makes it easier to build and deploy AI locally.

About the Authors

Neel Kishan

Neel leads go-to-market strategy and partnerships for Qualcomm AI Hub