Unlocking the power of Qualcomm QNN Execution Provider GPU backend for ONNX Runtime

In the ever-evolving landscape of Generative AI and Deep Learning, diverse workloads necessitate the use of NPU, GPU, and CPU to achieve the best experience on edge platforms. At Qualcomm Technologies, we continuously enhance our AI software stack to enable optimal performance on devices with Snapdragon X Elite.

Overview

Currently, ONNX Runtime on Snapdragon supports either the Qualcomm® AI Engine Direct (QNN) Execution Provider (EP) with the NPU (Qualcomm HTP) backend or the DirectML GPU stack. Today we are pleased to announce the preview of the ONNX Runtime Qualcomm® AI Engine Direct (QNN) EP with the Qualcomm Adreno GPU backend.

Choosing the Backend

The backend for the Qualcomm ONNX Runtime (ORT) QNN EP is chosen when the inference session is created, through the Qualcomm ORT QNN EP options. Apart from this option, client code uses the HTP backend and the GPU backend in exactly the same way.

Figure 1: Workflow for the Qualcomm ORT QNN GPU EP
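import onnxruntime as ort

# Session options; further configuration can be added before creating the session.
options = ort.SessionOptions()

# Create an ONNX Runtime session that uses the QNN EP with the GPU backend.
# model_path is the path to your .onnx model file.
session = ort.InferenceSession(
    model_path,
    sess_options=options,
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "QnnGpu.dll"}]
)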
Python code
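// Requires <string>, <unordered_map>, and onnxruntime_cxx_api.h.
std::unordered_map<std::string, std::string> qnn_options;
qnn_options["backend_path"] = "QnnGpu.dll";  // select the GPU backend
Ort::SessionOptions session_options;
session_options.AppendExecutionProvider("QNN", qnn_options);
// env is an existing Ort::Env; model_path holds the path to the .onnx model.
Ort::Session session(env, model_path.c_str(), session_options);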
C++ code
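
Since the backend is the only thing that differs between HTP and GPU usage, the selection can be isolated in one small helper. The following is a minimal sketch rather than an official API: `QNN_BACKEND_LIBRARIES`, `qnn_provider_options`, and `create_qnn_session` are hypothetical names, and the `QnnHtp.dll` library name for the HTP backend is an assumption that is not stated in this post.

```python
# Hypothetical helper: these names are illustrative and not part of ONNX Runtime.
QNN_BACKEND_LIBRARIES = {
    "gpu": "QnnGpu.dll",  # Adreno GPU backend, as used in the examples above
    "htp": "QnnHtp.dll",  # NPU (HTP) backend; library name assumed here
}


def qnn_provider_options(backend):
    """Return the QNN EP provider options for a backend name ("gpu" or "htp")."""
    try:
        library = QNN_BACKEND_LIBRARIES[backend.lower()]
    except KeyError:
        raise ValueError(f"unknown QNN backend: {backend!r}") from None
    return {"backend_path": library}


def create_qnn_session(model_path, backend="gpu"):
    """Create an inference session that runs on the chosen QNN backend."""
    # Imported lazily so that qnn_provider_options has no dependencies.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,
        sess_options=ort.SessionOptions(),
        providers=["QNNExecutionProvider"],
        provider_options=[qnn_provider_options(backend)],
    )
```

With this in place, switching backends would be a one-argument change, for example `create_qnn_session(model_path, backend="htp")`.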

Running Inference fully on Adreno GPU

To check if your model graph can fully run on the Qualcomm ORT QNN EP’s GPU backend, you can disable CPU fallback and run the model with the Qualcomm ORT QNN EP. If your model runs successfully, it indicates that it ran fully on the GPU backend. Below are examples of how this can be done:

# (Optional) Enable configuration that raises an exception if
# the model can't be run entirely on the QNN backend.
options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
Python code
session_options.AddConfigEntry("session.disable_cpu_ep_fallback", "1");
C++ code
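
The check described above can be wrapped in a function that reports success or failure instead of letting the exception propagate. This is a rough sketch under two assumptions: that the error surfaces when the session is created, and that any exception at that point means the model could not be placed entirely on the QNN backend. The function name `runs_fully_on_qnn` and its `ort_module` parameter (which exists only so the sketch can be exercised without a device) are hypothetical.

```python
def runs_fully_on_qnn(model_path, backend_path="QnnGpu.dll", ort_module=None):
    """Return True if a session can be created with CPU fallback disabled."""
    if ort_module is None:
        # Imported lazily; pass ort_module to substitute a stand-in for testing.
        import onnxruntime as ort_module

    options = ort_module.SessionOptions()
    # Ask ONNX Runtime to raise instead of silently falling back to the CPU EP.
    options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

    try:
        ort_module.InferenceSession(
            model_path,
            sess_options=options,
            providers=["QNNExecutionProvider"],
            provider_options=[{"backend_path": backend_path}],
        )
    except Exception as exc:  # broad on purpose: any failure counts as "no"
        print(f"Model did not load fully on {backend_path}: {exc}")
        return False
    return True
```

A caller might use `runs_fully_on_qnn(model_path)` as a quick gate before benchmarking, and inspect the printed message to see why a model was rejected.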

OpenCL Driver Dependency

The GPU backend depends on the OpenCL driver installed on the system. Any driver at or after the preview version runs correctly with Qualcomm QNN SDK versions 2.33 and above.

For older driver versions, a workaround is to enable the verbose logging level in WCR. Below are examples of how this can be done:

qnn_options["log_severity_level"] = "0";
C++ code
# Set the logging level to verbose so that QnnGpu works with older drivers.
ort.set_default_logger_severity(0)
Python code

Supported Models

The GPU backend is currently in preview. The following models have been validated on the Qualcomm GPU:

  • resnet_50_fp16
  • resnet_50_fp32
  • google_vit_base_fp32
  • squeezenet1.0-7
  • mobilenetv2-7
  • emotion-ferplus-8
  • arcfaceresnet100-8
  • intel_bert_fp32
  • google_bert_fp32

Other models may run partially on the GPU, with the rest falling back to the CPU.

Exclusive Backend Choice

Please note that while the list of EPs is a priority order used to assign subgraphs for execution, the backend of the Qualcomm ORT QNN EP is an exclusive, "either-or" choice for that EP.

If the HTP backend is chosen for the Qualcomm ORT QNN EP, then the GPU backend cannot be used for the Qualcomm ORT QNN EP at the same time for the same session.
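
To make the distinction concrete, the sketch below builds the two lists that `ort.InferenceSession` accepts: `providers` is an ordered priority list of EPs, while the QNN EP entry carries exactly one `backend_path`. The helper name `qnn_session_arguments` is hypothetical, and listing `CPUExecutionProvider` explicitly as the lower-priority EP is an illustrative choice rather than something this post prescribes.

```python
def qnn_session_arguments(backend_path, cpu_fallback=True):
    """Build the providers and provider_options lists for a single QNN backend."""
    # The QNN EP comes first in the priority order and carries one backend only.
    providers = ["QNNExecutionProvider"]
    provider_options = [{"backend_path": backend_path}]
    if cpu_fallback:
        # Lower-priority EP for subgraphs the QNN backend does not take.
        providers.append("CPUExecutionProvider")
        provider_options.append({})  # one options dict per provider
    return providers, provider_options


providers, provider_options = qnn_session_arguments("QnnGpu.dll")
print(providers)         # ['QNNExecutionProvider', 'CPUExecutionProvider']
print(provider_options)  # [{'backend_path': 'QnnGpu.dll'}, {}]
```

The two lists can then be passed as the `providers` and `provider_options` arguments when creating the session.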

Conclusion

We welcome feedback on the ONNX Runtime QNN GPU backend as we continue to enable more models, including LLMs.

Join a community of like-minded developers on Qualcomm Developer Discord to benefit from real-time discussions and exclusive virtual events.

Additional resources:

ONNX Runtime website 

Qualcomm AI Engine Direct Execution Provider documentation on ONNX Runtime website

Upgraded Graphics Drivers for Snapdragon X Elite blog

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Author
John Paul, Senior Staff Engineer
