
Unlocking the power of Qualcomm QNN Execution Provider GPU backend for ONNX Runtime

Written by

John Paul

May 19, 2025

In the ever-evolving landscape of Generative AI and Deep Learning, diverse workloads necessitate the use of NPU, GPU, and CPU to achieve the best experience on edge platforms. At Qualcomm Technologies, we continuously enhance our AI software stack to enable optimal performance on devices with Snapdragon X Elite.

Overview

Until now, ONNX Runtime on Snapdragon devices has supported two options: the NPU (Qualcomm HTP) through the Qualcomm® AI Engine Direct (QNN) Execution Provider (EP), or the GPU through the DirectML stack. Today we are pleased to announce a preview of the ONNX Runtime QNN EP with the Qualcomm Adreno GPU backend.

Choosing the Backend

The backend for the Qualcomm ONNX Runtime (ORT) QNN EP is selected when the inference session is created, through the QNN EP provider options (the backend_path option in the examples below). Apart from this setting, client code uses the HTP backend and the GPU backend in exactly the same way.


Figure 1 shows the workflow for the Qualcomm ORT QNN EP with the GPU backend.

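import onnxruntime as ort

model_path = "model.onnx"  # path to your ONNX model
options = ort.SessionOptions()

# Create an ONNX Runtime session that uses the QNN EP with the GPU backend.
session = ort.InferenceSession(
    model_path,
    sess_options=options,
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "QnnGpu.dll"}]
)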

Python code

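#include <onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>

// Inside a function; assumes an existing Ort::Env `env` and an ORTCHAR_T string `model_path`.
std::unordered_map<std::string, std::string> qnn_options;
qnn_options["backend_path"] = "QnnGpu.dll";  // select the Adreno GPU backend
Ort::SessionOptions session_options;
session_options.AppendExecutionProvider("QNN", qnn_options);
Ort::Session session(env, model_path.c_str(), session_options);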

C++ code
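Because the backend is chosen only through backend_path, switching between the GPU and HTP backends can be reduced to a single parameter. The Python sketch below uses hypothetical helpers (qnn_provider_options and create_qnn_session are not part of ONNX Runtime). It assumes the Windows backend library names QnnGpu.dll and QnnHtp.dll and imports onnxruntime only when a session is actually created.

```python
from typing import Dict, List

# Hypothetical mapping from a backend name to the QNN backend library on Windows.
QNN_BACKEND_LIBS: Dict[str, str] = {
    "gpu": "QnnGpu.dll",  # Adreno GPU backend (preview)
    "htp": "QnnHtp.dll",  # NPU (Qualcomm HTP) backend
}


def qnn_provider_options(backend: str) -> List[Dict[str, str]]:
    """Return the provider_options list that selects exactly one QNN backend."""
    try:
        return [{"backend_path": QNN_BACKEND_LIBS[backend]}]
    except KeyError:
        raise ValueError(f"unknown QNN backend: {backend!r}") from None


def create_qnn_session(model_path: str, backend: str = "gpu"):
    """Create an InferenceSession on the requested QNN backend."""
    import onnxruntime as ort  # imported lazily; only needed when a session is built

    options = ort.SessionOptions()
    return ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=["QNNExecutionProvider"],
        provider_options=qnn_provider_options(backend),
    )
```

For example, create_qnn_session("model.onnx", backend="gpu") targets the Adreno GPU, and backend="htp" targets the NPU. The rest of the client code does not change.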

Running Inference Fully on the Adreno GPU

To check whether your model graph can run entirely on the GPU backend of the Qualcomm ORT QNN EP, disable CPU EP fallback and run the model with the QNN EP. If the model runs successfully, it ran fully on the GPU backend. If any node would need the CPU EP, ONNX Runtime raises an error instead. The following examples show how to disable the fallback:

import onnxruntime as ort

options = ort.SessionOptions()
# (Optional) Raise an exception when the session is created if the model
# can't run entirely on the QNN backend.
options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

Python code

// Set before creating the Ort::Session so that any CPU EP fallback is reported as an error.
session_options.AddConfigEntry("session.disable_cpu_ep_fallback", "1");

C++ code
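The check described above can be wrapped in a small function. This is a hedged sketch: runs_fully_on_qnn_gpu and _default_gpu_session are hypothetical helpers. The function treats any exception raised while creating or running the session as "not fully on the GPU", because with CPU fallback disabled ONNX Runtime rejects a graph that would need the CPU EP. Unrelated errors, such as a missing model file, also return False, so inspect the exception when the distinction matters. The make_session parameter exists only so the logic can be exercised without Snapdragon hardware.

```python
from typing import Any, Callable, Dict, Optional


def _default_gpu_session(model_path: str):
    """Create a QNN GPU session with CPU EP fallback disabled."""
    import onnxruntime as ort  # imported lazily; only needed on real hardware

    options = ort.SessionOptions()
    # Session creation fails if any node would have to fall back to the CPU EP.
    options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
    return ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=["QNNExecutionProvider"],
        provider_options=[{"backend_path": "QnnGpu.dll"}],
    )


def runs_fully_on_qnn_gpu(
    model_path: str,
    inputs: Optional[Dict[str, Any]] = None,
    make_session: Callable[[str], Any] = _default_gpu_session,
) -> bool:
    """Return True if the model loads (and, if inputs are given, runs) on the GPU backend alone."""
    try:
        session = make_session(model_path)
        if inputs is not None:
            session.run(None, inputs)  # None requests every model output
    except Exception:  # ORT raises its own exception types; treat any failure as "no"
        return False
    return True
```

On a Snapdragon device, runs_fully_on_qnn_gpu("model.onnx") uses the real QNN EP. Passing representative inputs (a dict from input name to array) also exercises an actual inference run.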

OpenCL Driver Dependency

The GPU backend depends on the OpenCL driver installed on the system. With Qualcomm QNN SDK 2.33 or later, the GPU backend runs correctly on any driver at or newer than the preview driver version.

For older drivers, a workaround is to enable the verbose logging level in WCR, as shown in the following examples:

qnn_options["log_severity_level"] = "0";

C++ code

import onnxruntime as ort

# Set the default logger severity to Verbose (0) so QnnGpu works with older drivers.
ort.set_default_logger_severity(0)

Python code

Supported Models

The GPU backend is currently in preview. The following models have been validated on the Qualcomm Adreno GPU:

resnet_50_fp16
resnet_50_fp32
google_vit_base_fp32
squeezenet1.0-7
mobilenetv2-7
emotion-ferplus-8
arcfaceresnet100-8
intel_bert_fp32
google_bert_fp32

Other models may run partially on the GPU, with the remaining nodes falling back to the CPU (provided CPU fallback has not been disabled).
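When CPU fallback is left enabled, ONNX Runtime's built-in profiler is one way to see how much of a model actually ran on the QNN EP. The sketch below assumes the ORT profile JSON format, in which node events have "cat": "Node" and carry the provider name in args["provider"]. That format is an ORT implementation detail and may change between releases. count_nodes_by_provider and profile_provider_usage are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def count_nodes_by_provider(events: Iterable[dict]) -> Dict[str, int]:
    """Count distinct profiled nodes per execution provider.

    Assumes ORT's profile format: node events have "cat" == "Node" and
    carry the provider in args["provider"]; events without it are skipped.
    """
    seen: Dict[str, str] = {}
    for event in events:
        provider = event.get("args", {}).get("provider")
        if event.get("cat") == "Node" and provider:
            seen[event["name"]] = provider  # dedupe repeated runs of the same node
    return dict(Counter(seen.values()))


def profile_provider_usage(model_path: str, inputs: dict) -> Dict[str, int]:
    """Run the model once with profiling enabled and summarize node placement."""
    import onnxruntime as ort

    options = ort.SessionOptions()
    options.enable_profiling = True
    session = ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=["QNNExecutionProvider"],
        provider_options=[{"backend_path": "QnnGpu.dll"}],
    )
    session.run(None, inputs)  # inputs: dict from input name to array
    profile_path = session.end_profiling()  # path of the JSON profile file
    with open(profile_path) as f:
        return count_nodes_by_provider(json.load(f))
```

The QNN EP typically fuses each supported partition into a single node, so one QNNExecutionProvider entry can stand for many original operators. Treat the counts as a placement summary rather than an operator count.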

Exclusive Backend Choice

Note that while the list of EPs is a priority order for assigning subgraphs, the backend of the Qualcomm ORT QNN EP is an exclusive, either/or choice.

If a session selects the HTP backend for the Qualcomm ORT QNN EP, it cannot also use the GPU backend for that EP. Each session uses exactly one QNN backend.

Conclusion

We welcome your feedback on the ONNX Runtime QNN GPU backend as we continue to enable more models, including LLMs.

Join a community of like-minded developers on the Qualcomm Developer Discord for real-time discussions and exclusive virtual events.

Additional resources:

ONNX Runtime website

Qualcomm AI Engine Direct Execution Provider documentation on ONNX Runtime website

Upgraded Graphics Drivers for Snapdragon X Elite blog



Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Author

John Paul, Senior Staff Engineer