Hexagon‑MLIR is an open‑source AI compiler stack from Qualcomm Technologies, Inc. that lets you easily compile and run Triton kernels and PyTorch models on Hexagon NPUs—dedicated AI accelerators built for high‑performance, efficient AI and genAI workloads.
This initiative complements our commercial toolchains by exploring an open‑source MLIR‑based compilation stack, giving developers a path to advance AI compilation capabilities through a more flexible and transparent approach.
Why it matters
As Generative AI evolves, researchers are inventing new operators and fusion patterns faster than programmers can hand-write libraries for them.
Hexagon-MLIR is a way to bridge this gap and provide agility. It is an open-source compilation stack that allows developers to bring custom Triton kernels and PyTorch graphs directly to the Hexagon NPU, overcoming the limitations of fixed-function libraries.
Getting Started
Follow our user guide for instructions on setting up and installing the Hexagon-MLIR compiler and running end-to-end tests on Hexagon NPUs (tested architectures: v73, v75, v79).
Then work through our tutorials, which show how to compile Triton kernels such as GELU and Softmax, or PyTorch models such as GPT-2, and how to follow each step of the lowering process.
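While working through the GELU and Softmax tutorials, a plain host-side reference can be handy for checking the numbers that come back from the NPU. The sketch below is our own illustration and not part of Hexagon-MLIR. It shows what the two operators compute: a numerically stable softmax and both the exact and the tanh-approximated GELU. The function names and the `allclose` tolerances are hypothetical choices.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the row max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """GELU using the widely used tanh approximation."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def allclose(a: Sequence[float], b: Sequence[float],
             rtol: float = 1e-5, atol: float = 1e-6) -> bool:
    """Hypothetical tolerance check for comparing device output to the reference."""
    return len(a) == len(b) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0]
    print(softmax(row))
    print([gelu_tanh(v) for v in row])
```

Subtracting the maximum keeps `math.exp` from overflowing on large inputs without changing the result, which is also why hand-written and compiled softmax kernels usually include a max-reduction step.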
What works today
At a glance, the Early Developer Preview supports:
- Supported source inputs: Triton kernels and PyTorch models (via Torch‑MLIR)
- Supported hardware: a single Hexagon NPU (tested on v73, v75, v79)
- Major features working today: end‑to‑end lowering from Triton and PyTorch to an executable binary that runs on the Hexagon NPU, with operator fusion, tiling for TCM (tightly coupled memory), Hexagon Vector eXtensions (HVX) multi‑threading, double‑buffering to overlap computation with memory transfers, vectorization, HVX math library integration, and HexKL‑accelerated matrix multiplication.
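Tiling for TCM and double-buffering work together. Data is cut into tiles small enough to fit in on-chip memory, and while one tile is being computed, the next one is already being transferred into a second buffer. The sketch below simulates this ping-pong pattern in plain Python. It is a conceptual illustration only, not code generated by Hexagon-MLIR. The tile size, the two-buffer list standing in for TCM, and all function names are hypothetical. Python runs each prefetch and compute one after the other; on real hardware the transfer would be an asynchronous DMA that overlaps with the compute.

```python
from typing import Callable, List, Tuple


def make_tiles(n: int, tile: int) -> List[Tuple[int, int]]:
    """Split the index range [0, n) into (start, end) tiles of at most `tile` elements."""
    return [(s, min(s + tile, n)) for s in range(0, n, tile)]


def double_buffered_schedule(n: int, tile: int) -> List[str]:
    """List the order of loads and computes when two buffers alternate.

    While tile i is computed from buffer i % 2, tile i + 1 is loaded into
    the other buffer, so transfer and compute can overlap ("||").
    """
    tiles = make_tiles(n, tile)
    if not tiles:
        return []
    steps = ["load tile 0 -> buf 0"]  # prologue: fetch the first tile
    for i in range(len(tiles)):
        if i + 1 < len(tiles):
            steps.append(
                f"load tile {i + 1} -> buf {(i + 1) % 2} || compute tile {i} in buf {i % 2}"
            )
        else:
            steps.append(f"compute tile {i} in buf {i % 2}")  # epilogue: nothing left to fetch
    return steps


def tiled_map(data: List[float], tile: int, op: Callable[[float], float]) -> List[float]:
    """Apply `op` elementwise, staging each tile in one of two ping-pong buffers."""
    tiles = make_tiles(len(data), tile)
    buffers: List[List[float]] = [[], []]  # hypothetical stand-in for two TCM buffers
    out: List[float] = []
    if tiles:
        s, e = tiles[0]
        buffers[0] = data[s:e]
    for i in range(len(tiles)):
        if i + 1 < len(tiles):
            s, e = tiles[i + 1]
            # Prefetch into the buffer that held tile i - 1, which is already consumed.
            buffers[(i + 1) % 2] = data[s:e]
        out.extend(op(x) for x in buffers[i % 2])  # compute on the current tile
    return out


if __name__ == "__main__":
    for step in double_buffered_schedule(10, 4):
        print(step)
    print(tiled_map([float(x) for x in range(10)], 4, lambda x: 2.0 * x))
```

Two buffers are the minimum that allows overlap: the prefetch always writes into the buffer whose tile has already been consumed, so it never clobbers data still in use.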
Performance Expectations
Hexagon‑MLIR's current focus is agility: letting developers bring custom Triton kernels and PyTorch graphs directly to the Hexagon NPU without being limited to fixed-function libraries. As we expand kernel coverage, improve optimization passes, and strengthen integration with Hexagon NPU hardware features, we will share updated performance results.
Summary
We are excited to announce the open‑source release of Hexagon‑MLIR, our new AI compilation stack. It lets you compile Triton kernels or PyTorch models and run them on Hexagon NPUs.
Hexagon‑MLIR is a work in progress, and we are continuously adding performance enhancements and new features. We are releasing it early so that developers and researchers can start using it now, while we continue to improve the compiler and publish regular updates to the open‑source repository.
Your Turn
Our commitment to open source and to putting developers first shaped this milestone of open-sourcing Hexagon-MLIR. Clone the repo, follow the user guide to install it, and dig into our tutorials to explore further.
Curious about additional ways to deploy on-device AI? Visit Qualcomm AI stack hub to learn more.
Join our Developer Discord and let us know what you think!


