Compile Triton & PyTorch for Hexagon NPU with Open Source Hexagon‑MLIR

Written by Muthu Baskaran and Mohammed Javed Absar

Feb 6, 2026

Hexagon‑MLIR is an open‑source AI compiler stack from Qualcomm Technologies, Inc. that lets you compile and run Triton kernels and PyTorch models on Hexagon NPUs, the dedicated AI accelerators built for high‑performance, efficient AI and genAI workloads.

This initiative complements our commercial toolchains by exploring an open‑source MLIR‑based compilation stack, giving developers a path to advance AI compilation capabilities through a more flexible and transparent approach.

Why it matters

As Generative AI evolves, researchers are inventing new operators and fusion patterns faster than programmers can hand-write libraries for them.

Hexagon-MLIR bridges this gap. It is an open-source compilation stack that lets developers bring custom Triton kernels and PyTorch graphs directly to the Hexagon NPU, overcoming the limitations of fixed-function libraries.

Figure 1: Hexagon-MLIR Compilation Stack

Getting Started

Follow our user guide, which covers setting up and installing the Hexagon-MLIR compiler and running end-to-end tests on Hexagon NPUs (tested architectures: v73, v75, v79).

Then work through our tutorials, which show how to compile Triton kernels such as GELU and Softmax, or PyTorch examples such as GPT2, and how to follow each stage of the lowering process.
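
To give a feel for what kernels like GELU and Softmax compute before any lowering happens, here is a minimal plain-Python sketch. It is not the tutorial code and does not use the Triton API: `gelu_program`, `launch_gelu`, and `softmax` are hypothetical names, the exact erf-based GELU formula is an assumption about which variant is used, and the loop over `pid` only imitates the block-per-program structure that Triton kernels typically follow.

```python
import math
from typing import List, Sequence


def gelu_program(x: Sequence[float], out: List[float], pid: int, block_size: int) -> None:
    """One emulated program instance: handles a single block of elements."""
    start = pid * block_size
    for lane in range(block_size):
        idx = start + lane
        if idx < len(x):  # plays the role of a bounds mask on the last block
            v = x[idx]
            # Exact GELU (assumed variant): 0.5 * x * (1 + erf(x / sqrt(2)))
            out[idx] = 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def launch_gelu(x: Sequence[float], block_size: int = 4) -> List[float]:
    """Emulated launch: run one program per block, sequentially."""
    out = [0.0] * len(x)
    grid = (len(x) + block_size - 1) // block_size  # ceiling division
    for pid in range(grid):
        gelu_program(x, out, pid, block_size)
    return out


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one non-empty row."""
    row_max = max(row)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - row_max) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


gelu_values = launch_gelu([-1.0, 0.0, 1.0, 2.0, 3.0], block_size=2)
probabilities = softmax([1.0, 2.0, 3.0])
```

A reference like this can also serve as a numerical baseline when comparing outputs produced on the device, within a tolerance suited to the data type in use.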

What works today

At a glance, the Early Developer Preview supports:

Supported source inputs: Triton kernels and PyTorch models (via Torch‑MLIR)
Supported hardware: Single Hexagon NPU (tested on v73, v75, v79)
Major features working today: End‑to‑end lowering from Triton and PyTorch to an executable binary that runs on the Hexagon NPU. Supported features include operator fusion, tiling for TCM, Hexagon Vector eXtensions (HVX) multi‑threading, double‑buffering to overlap computation with memory transfers, vectorization, HVX math library integration, and HexKL‑accelerated matrix multiplication.
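
Tiling and double-buffering from the feature list above can be illustrated roughly as follows. This is a conceptual sketch in plain Python, not code generated by the compiler: `tiled_map_double_buffered` and `tile_elems` are hypothetical names, the two lists in `buffers` merely stand in for staging buffers in fast local memory such as TCM, and the "prefetch" runs sequentially here, whereas on hardware it would be an asynchronous transfer that overlaps with the compute step. Passing a single composed `op` also hints at operator fusion, since no intermediate array is written between the two operations.

```python
from typing import Callable, List, Sequence


def tiled_map_double_buffered(
    data: Sequence[float],
    tile_elems: int,
    op: Callable[[float], float],
) -> List[float]:
    """Apply `op` elementwise, one tile at a time, using two ping-pong buffers."""
    if tile_elems <= 0:
        raise ValueError("tile_elems must be positive")

    n = len(data)
    out = [0.0] * n
    num_tiles = (n + tile_elems - 1) // tile_elems  # ceiling division
    if num_tiles == 0:
        return out

    def load(tile_idx: int) -> List[float]:
        # Stand-in for a memory transfer into a local staging buffer.
        start = tile_idx * tile_elems
        return list(data[start:start + tile_elems])

    buffers: List[List[float]] = [[], []]
    buffers[0] = load(0)  # prologue: fetch the first tile before the loop

    for t in range(num_tiles):
        cur = t % 2
        nxt = 1 - cur
        if t + 1 < num_tiles:
            # Fetch the next tile into the idle buffer. On real hardware this
            # transfer would run concurrently with the compute loop below.
            buffers[nxt] = load(t + 1)
        start = t * tile_elems
        for i, v in enumerate(buffers[cur]):
            out[start + i] = op(v)
    return out


def fused_scale_shift_relu(v: float) -> float:
    """Two operations (scale-and-shift, then ReLU) applied in a single pass."""
    return max(0.0, 2.0 * v - 1.0)


result = tiled_map_double_buffered(
    [0.0, 0.5, 1.0, 1.5, 2.0], tile_elems=2, op=fused_scale_shift_relu
)
```

The tile size is the main tuning knob in a scheme like this: two tiles must fit in the local memory budget at once, which is the cost of being able to overlap transfers with computation.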

Performance Expectations

Hexagon‑MLIR currently prioritizes agility: it lets developers bring custom Triton kernels and PyTorch graphs directly to the Hexagon NPU, overcoming the limitations of fixed-function libraries. We will share updated performance results as we expand kernel coverage, improve optimization passes, and strengthen integration with Hexagon NPU hardware features.

Summary

We are excited to announce the open‑source release of our new AI compilation stack, Hexagon‑MLIR. It lets you compile Triton kernels or PyTorch models and run them on Hexagon NPUs.

Hexagon‑MLIR is a work in progress, and we are continuously adding performance enhancements and new features. We want developers and researchers to have early access and start using it now, while we continue to improve the compiler and regularly publish updates to the open‑source repository.

Your Turn

Our continued push to embrace open source, and the importance we place on being developer-first, shaped this milestone of open-sourcing Hexagon-MLIR. Clone the repo, follow the user guide to install it, and dig into our tutorials to explore further.

Curious about additional ways to deploy on-device AI? Visit the Qualcomm AI stack hub to learn more.

Join our Developer Discord and let us know what you think!

About the Authors

Muthu Baskaran, Director, Engineering, Qualcomm Technologies, Inc.