Bringing Granite 4.0 to the edge: on-device AI for real-world performance

As AI continues to evolve, the shift toward on-device intelligence is accelerating. IBM’s latest release, Granite 4.0, is a powerful example of how open-source models are being designed for real-world performance, with speed, flexibility, and efficiency at the forefront.

Granite 4.0 introduces a hybrid architecture that makes it well suited to edge deployments, whether you're building intelligent assistants or agentic workflows. And with the growing capabilities of Snapdragon and Qualcomm Dragonwing platforms, developers and end users now have the tools to run these models locally, without relying on cloud infrastructure.

Granite 4.0: Built for speed, optimized for flexibility

Granite 4.0 combines Mamba-2 and transformer-based components, with some models featuring a Mixture of Experts (MoE) routing strategy. This architecture delivers:

  • Over 70% lower memory requirements and 2x faster inference, with maintained model performance, compared to similarly sized traditional models on long-context, large-batch tasks

  • Unconstrained context length with support for long-context tasks (tested up to 128K tokens)

  • Flexible deployment on a wide range of devices
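One way to build intuition for the memory savings on long-context, large-batch tasks is to estimate the key-value (KV) cache that attention layers keep. That cache grows linearly with context length and batch size, whereas a Mamba-2 layer carries a fixed-size state. The sketch below is a rough illustration only: the layer counts, head sizes, 16-bit cache entries, and the 4-in-40 attention ratio are assumptions chosen for the arithmetic, not Granite 4.0's published configuration, and the result is not meant to reproduce the 70% figure above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical model shape; none of these values come from Granite 4.0."""
    attention_layers: int      # layers that keep a KV cache
    kv_heads: int              # key/value heads per attention layer
    head_dim: int              # dimension of each head
    bytes_per_value: int = 2   # assumes 16-bit cache entries


def kv_cache_bytes(cfg: AttentionConfig, context_tokens: int, batch_size: int) -> int:
    """KV cache size: keys and values (factor 2) for every token of every sequence."""
    per_token = 2 * cfg.attention_layers * cfg.kv_heads * cfg.head_dim * cfg.bytes_per_value
    return per_token * context_tokens * batch_size


# Illustrative comparison: 40 attention layers versus a hybrid stack in which
# only 4 of 40 layers use attention. The other 36 would be Mamba-2 layers,
# whose state does not grow with context, so they add no KV cache here.
dense = AttentionConfig(attention_layers=40, kv_heads=8, head_dim=128)
hybrid = AttentionConfig(attention_layers=4, kv_heads=8, head_dim=128)

context, batch = 128 * 1024, 8  # 128K tokens, batch of 8
dense_gib = kv_cache_bytes(dense, context, batch) / 2**30
hybrid_gib = kv_cache_bytes(hybrid, context, batch) / 2**30
print(f"dense KV cache:  {dense_gib:.1f} GiB")   # 160.0 GiB
print(f"hybrid KV cache: {hybrid_gib:.1f} GiB")  # 16.0 GiB
```

Under these assumed shapes the cache shrinks tenfold. The fixed-size Mamba-2 state and the model weights are not modeled, so real end-to-end savings will differ.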

With multiple model sizes and architectures, developers can choose the right fit for their hardware and use case:

  • Granite 4.0 Micro (3B): Dense Transformer
  • Granite 4.0 Micro Hybrid (3B): Dense Mamba-2 + Transformer
  • Granite 4.0 Tiny (7B, 1B active): MoE Mamba-2 + Transformer
  • Granite 4.0 Small (32B, 9B active): MoE Mamba-2 + Transformer
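As a rough aid for matching a model to a device, the lineup above can be encoded and filtered by a memory budget. This is a minimal sketch under stated assumptions: it counts weight storage only (no KV cache, activations, or runtime overhead), it assumes 4-bit quantization (0.5 bytes per parameter) by default, and the helper names are hypothetical. For MoE models, all experts typically need to be resident in memory, so the footprint follows total parameters while per-token compute follows active parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GraniteModel:
    name: str
    total_params_b: float   # total parameters, in billions (from the list above)
    active_params_b: float  # parameters used per token, in billions
    architecture: str


MODELS = [
    GraniteModel("Granite 4.0 Micro", 3, 3, "Dense Transformer"),
    GraniteModel("Granite 4.0 Micro Hybrid", 3, 3, "Dense Mamba-2 + Transformer"),
    GraniteModel("Granite 4.0 Tiny", 7, 1, "MoE Mamba-2 + Transformer"),
    GraniteModel("Granite 4.0 Small", 32, 9, "MoE Mamba-2 + Transformer"),
]


def weight_memory_gb(model: GraniteModel, bytes_per_param: float = 0.5) -> float:
    """Approximate weight footprint in GB; 0.5 bytes/param assumes 4-bit weights."""
    return model.total_params_b * bytes_per_param


def largest_model_that_fits(budget_gb: float,
                            bytes_per_param: float = 0.5) -> Optional[GraniteModel]:
    """Return the largest model whose weights fit the budget, or None if none fit."""
    fitting = [m for m in MODELS if weight_memory_gb(m, bytes_per_param) <= budget_gb]
    return max(fitting, key=lambda m: m.total_params_b, default=None)


choice = largest_model_that_fits(budget_gb=4)
print(choice.name if choice else "no model fits")  # Granite 4.0 Tiny
```

A real selection would also weigh context length, latency targets, and which compute core (NPU, GPU, or CPU) the runtime supports for each model.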

Snapdragon + Granite: a powerful edge AI combination

Snapdragon platforms are enabling a new generation of on-device AI experiences. With advanced heterogeneous computing, developers can run models like Granite 4.0 directly on devices powered by Snapdragon, bringing:

  • Real-time responsiveness for user-facing AI features and features that work on local data

  • Data privacy and security with local processing

  • Cost savings by eliminating cloud inference fees

  • Reliability even in offline or bandwidth-constrained environments

This edge-first approach empowers developers to build AI that’s fast, private, and scalable.

Nexa AI brings Granite 4.0 to the Qualcomm Hexagon NPU for compute, mobile, automotive, and IoT platforms

We are partnering closely with Nexa AI to launch IBM Granite 4.0 as a fully optimized, on-device solution for the Qualcomm® Hexagon™ NPU, the dedicated AI inferencing core of Snapdragon platforms.

NexaML is engineered specifically for NPU-first inferencing, providing exceptional performance, advanced structured decoding, and robust extended context support. With immediate, day-zero compatibility for Snapdragon X2 Elite and Snapdragon 8 Elite Gen 5 platforms, Nexa empowers developers to deploy Granite 4.0 with unmatched efficiency, unlocking the fastest route to high-performance, private, and scalable edge AI.

Nexa AI currently supports the following Granite 4.0 model versions

Developers can explore the Nexa SDK on PCs powered by Snapdragon.

Follow the instructions here to download the Nexa SDK and run Granite 4.0 on the NPU, GPU, or CPU in Snapdragon platforms with one line of code.

Learn more about NexaML, Hyperlink, and other solutions

CPU and GPU inferencing partners: LM Studio, Ollama and Nexa AI

Qualcomm Technologies and IBM are continuing to work with partners like Ollama, LM Studio, and Nexa AI to make Granite 4.0 accessible to developers on our other AI engine cores, such as the GPU and CPU.

The partner platforms offer:

  • Self-hosted model deployment
  • Fine-tuning and adapter support
  • Integration with local workflows and enterprise systems

Together, we’re enabling a future where AI is ubiquitous, personal, and under your control.

Try it yourself

Test Granite 4.0 on Ollama

Run Granite 4.0 in LM Studio

Explore Granite 4.0 on Hugging Face
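Once a model has been pulled into a local Ollama install, it can be queried from code through Ollama's local HTTP API. The sketch below uses only the Python standard library and assumes Ollama is listening on its default port. The model tag `granite4` is an assumption, so check the Ollama library page for the exact tag of the Granite 4.0 variant you pulled; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "granite4"  # assumed tag; confirm the exact name on the Ollama library page


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running and the model pulled, `print(generate("Summarize the benefits of on-device AI in one sentence."))` returns the completion without any data leaving the device.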

Interested in on-device AI?

Check out Qualcomm AI Hub, Qualcomm Technologies’ developer platform for edge AI. Whether you're looking to bring your own model or explore a collection of pre-optimized models for devices powered by the Snapdragon and Dragonwing families, Qualcomm AI Hub makes it easier to build and deploy AI locally.

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Authors
Neel Kishan
Neel leads go-to-market strategy and partnerships for Qualcomm AI Hub
Rodrigo Caruso Neves do Amaral
Business Development Manager, Qualcomm Technologies, Inc.