Qualcomm launches the first ONNX Runtime Plugin Execution Provider
- The Qualcomm Plugin Execution Provider (EP) for ONNX Runtime enables easier updates and more efficient deployment of AI workloads across Qualcomm platforms
- This results in more stable applications and a smoother workflow for developers who use ONNX Runtime across compute, IoT, and Android platforms
- Qualcomm Technologies is the first silicon provider to release a public plugin EP for ONNX Runtime, and the only one to have open-sourced the EP code, advancing the ONNX ecosystem as a whole
Qualcomm Technologies is proud to release the first publicly available Plugin Execution Provider (EP) for ONNX Runtime, powered by the Qualcomm AI Stack. This release marks a key milestone for the ONNX Runtime ecosystem and for developers who build AI-powered apps on Android, Windows, and Linux.
With the Qualcomm Plugin EP for ONNX Runtime (ORT), developers benefit from easier updates and more efficient deployment of AI workloads across Qualcomm platforms, spanning mobile, PCs, IoT, robotics, and more.
The Challenge: Scaling Execution Providers in a Heterogeneous World
First, let’s define EPs. EPs are the plugins that enable hardware acceleration by allowing ONNX Runtime to delegate the execution of specific model nodes or subgraphs to specialized hardware IPs, such as CPUs, GPUs, and NPUs.
The EP abstraction lets ONNX Runtime be hardware-agnostic: you write one inference call, and the EP handles the low-level dispatch to the right hardware IP, so you never have to interact with the hardware vendor's native toolchains.
In other words, ONNX Runtime provides a generic inference engine, and the execution provider is the plugin that decides where and how each model operator or subgraph runs (CPU, GPU, NPU, and so on).
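To make the delegation idea concrete, here is a minimal Python sketch of priority-based node assignment. It is a toy model, not ONNX Runtime's actual partitioning code: `ToyProvider`, `assign_nodes`, and the operator sets are hypothetical names invented for illustration. Each node goes to the first provider in the priority list that claims its operator type, with the CPU acting as the catch-all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProvider:
    """Hypothetical stand-in for an execution provider."""
    name: str
    supported_ops: frozenset


def assign_nodes(node_ops, providers):
    """Assign each node to the first provider, in priority order, that supports its op."""
    assignment = []
    for op in node_ops:
        for provider in providers:
            if op in provider.supported_ops:
                assignment.append((op, provider.name))
                break
        else:
            # No provider in the list claimed this operator.
            raise ValueError(f"no provider supports op {op!r}")
    return assignment


# Illustrative operator coverage only; real coverage depends on the EP.
npu = ToyProvider("npu", frozenset({"Conv", "Relu", "MatMul"}))
cpu = ToyProvider("cpu", frozenset({"Conv", "Relu", "MatMul", "Softmax", "TopK"}))

model_ops = ["Conv", "Relu", "MatMul", "Softmax"]
plan = assign_nodes(model_ops, [npu, cpu])  # NPU first, CPU as fallback
for op, target in plan:
    print(f"{op:8s} -> {target}")
```

The article describes delegation at the level of nodes or subgraphs; this sketch works per node only to stay short.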
Historically, ONNX Runtime EPs have been built and distributed as part of the core ONNX Runtime repository. While this ensures tight integration, it also introduces challenges, such as:
- Tight coupling between EPs and ONNX Runtime release cycles, leading to delays for developers implementing system optimizations.
- Slower innovation due to upstream dependency constraints, resulting in less stable or underperforming apps.
As the hardware ecosystem expands to include specialized processors such as NPUs, GPUs, custom ASICs, and domain-specific accelerators, this tightly coupled model no longer scales, and a more flexible solution is needed.
The solution: Plugin EPs for hardware vendors
Plugin EPs introduce a novel architecture that decouples execution provider development from the ONNX Runtime core.
Instead of being compiled directly into ONNX Runtime, a Plugin EP is delivered as a dynamically loaded shared library that implements a stable EP interface. The Qualcomm Plugin EP for ONNX Runtime delivers:
- True modularity: Plugin EPs can be developed, versioned, and released independently, while ONNX Runtime remains a stable platform. New software features and optimizations arrive as updates to the Qualcomm Plugin EP alone, with no changes to the ORT core.
- Faster innovation: Developers can access our performance optimizations, bug fixes, and expanded operator coverage as soon as they become available each month, without waiting for quarterly ONNX Runtime releases.
With a clean separation between ONNX Runtime and hardware-specific components, Plugin EPs enable independent release cycles, stable interfaces, and faster adoption of new hardware and SDK capabilities.
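The "dynamically loaded" part can be illustrated with a small Python analogy. This sketch assumes nothing about the real EP interface: it writes a hypothetical plugin module (`toy_npu_plugin.py`, with an invented `create_provider` entry point) to a temporary directory and loads it at runtime with `importlib`, standing in for a host that picks up a vendor's shared library without being rebuilt.

```python
import importlib.util
import tempfile
from pathlib import Path

# Source of a hypothetical plugin, shipped separately from the host.
PLUGIN_SOURCE = '''
PLUGIN_VERSION = "2.1.0"

def create_provider():
    return {"name": "toy_npu", "supported_ops": ["Conv", "Relu"]}
'''


def load_plugin(path):
    """Load a plugin module from a file path at runtime."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    plugin_path = Path(tmp) / "toy_npu_plugin.py"
    plugin_path.write_text(PLUGIN_SOURCE)
    # The host code above never changes; only the file on disk does.
    plugin = load_plugin(plugin_path)
    provider = plugin.create_provider()

print(plugin.PLUGIN_VERSION, provider["name"])
```

Swapping in a newer plugin file changes what `load_plugin` returns while the host code stays the same, which is the property the Plugin EP model relies on.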
Introducing the Qualcomm Plugin EP for ONNX Runtime: The First Publicly Available Plugin EP
Qualcomm Technologies is the first silicon vendor to publicly release a plugin EP for ONNX Runtime. This release is significant not only for developers targeting Qualcomm platforms, but for the broader ONNX Runtime ecosystem.
Why does this matter?
- Qualcomm Technologies demonstrates that the Plugin EP model works in real-world deployments
- We deliver production-grade performance on mobile, PC, and edge platforms
- The EP validates long‑term ABI and API stability for external EPs
The Qualcomm Plugin EP enables ONNX models to run efficiently on Qualcomm AI platforms — while remaining fully decoupled from ONNX Runtime core code.
Try the Qualcomm Plugin EP for ONNX Runtime here
Benefits for ONNX developers
For developers building AI-enabled applications with ONNX Runtime, the Qualcomm Plugin EP significantly simplifies development, deployment, and long-term maintenance.
Key benefits:
- No recompilation required: Update or replace the Qualcomm Plugin EP without rebuilding ONNX Runtime or your application binary.
- Independent release cycles: Adopt Qualcomm Technologies' performance improvements, bug fixes, and operator support updates faster, without waiting for ORT releases.
- Future-ready architecture: Stable plugin interfaces reduce friction when upgrading ONNX Runtime versions or targeting new Qualcomm platforms. Developers can start building as soon as we release new hardware, without waiting for ORT releases.
This allows developers to spend less time managing infrastructure and more time building differentiated AI experiences.
For end users, this translates into tangible benefits every time they interact with an AI app, such as better performance, lower latency, improved responsiveness, and greater power efficiency, especially on mobile and edge devices.
The Infrastructure Changes that made the Plugin EP possible
To support Plugin EPs, ONNX Runtime introduced new infrastructure for external EP libraries, including:
- Stable, long‑term C‑based EP interfaces
- Dynamic discovery and loading of EP shared libraries
- Explicit provider registration at runtime
- Clear versioning contracts between ONNX Runtime and Plugin EPs
These changes mean that:
- ONNX Runtime upgrades do not break existing Plugin EPs
- Plugin EPs can evolve independently
- Both parties can innovate without integration overhead
For developers, this results in fewer surprises and a much cleaner integration experience.
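One way to picture explicit registration combined with a versioning contract is the sketch below. It is an illustration under stated assumptions, not the ONNX Runtime API: `PluginInfo`, `Registry`, and the integer "host API version" are hypothetical, and the rule shown (a plugin loads whenever the host offers at least the interface version the plugin was built against) is only one plausible form such a contract could take.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginInfo:
    """Hypothetical metadata a plugin reports when it registers."""
    name: str
    plugin_version: str  # the plugin's own release version
    min_host_api: int    # oldest host interface version the plugin needs


class Registry:
    """Toy host-side registry that enforces a versioning contract."""

    def __init__(self, host_api_version):
        self.host_api_version = host_api_version
        self._plugins = {}

    def register(self, info):
        # Contract: the host must offer at least the interface version
        # the plugin was built against.
        if info.min_host_api > self.host_api_version:
            raise RuntimeError(
                f"{info.name} needs host API >= {info.min_host_api}, "
                f"host offers {self.host_api_version}"
            )
        self._plugins[info.name] = info

    def version_of(self, name):
        return self._plugins[name].plugin_version


registry = Registry(host_api_version=3)
registry.register(PluginInfo("toy_npu", "1.0.0", min_host_api=2))
# A newer plugin release replaces the old one; the host is unchanged.
registry.register(PluginInfo("toy_npu", "1.1.0", min_host_api=2))

# A newer host still accepts the plugin built against the older interface.
upgraded = Registry(host_api_version=4)
upgraded.register(PluginInfo("toy_npu", "1.1.0", min_host_api=2))

# A plugin that needs a newer interface than the host offers is rejected.
try:
    registry.register(PluginInfo("toy_dsp", "0.9.0", min_host_api=5))
    rejected = False
except RuntimeError:
    rejected = True

print(registry.version_of("toy_npu"), upgraded.version_of("toy_npu"), rejected)
```

Keeping the plugin's release version separate from the interface version is what lets either side ship updates without coordinating with the other.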
More details are available here
Moving the ONNX Ecosystem Forward
By becoming the first silicon provider to publicly release a Plugin Execution Provider, Qualcomm Technologies is helping lay the foundation for a more modular, scalable, and innovative ONNX Runtime ecosystem. The release demonstrates that hardware acceleration can be delivered faster, with more flexibility and lower integration overhead, which makes it easier for developers to build AI apps.
This is just the beginning—and we look forward to continued collaboration with the ONNX Runtime community and ecosystem partners.



