OnQ Blog

The Qualcomm Cloud AI 100 emerges as the fastest AI inference solution in the world

Sep 22, 2021

Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.

Editor's Note: The included figures were updated 09/24/2021


The latest MLCommons™ MLPerf™ v1.1 inference testing results are in, and a new frontrunner has emerged in one of the most comprehensive machine learning performance benchmarks.

With industry leading advancements in performance density and performance-per-watt capabilities, the Qualcomm Cloud AI 100 platforms are leading in all scorecards of the latest benchmark submissions.

MLCommons is an open engineering consortium that creates machine learning benchmarks for training and inference platforms. The inference tests cover AI in data center environments in both offline and server (query-driven) scenarios, and in Edge servers and devices in offline and single-stream scenarios.

Qualcomm Technologies has significantly expanded its MLPerf benchmark submission, doubling the number of platform submissions from Edge to Cloud. Network coverage has grown to include natural language processing (BERT), and SSD MobileNet v1 has been added to the vision networks. In total, 82 benchmark results were submitted, including 36 power results.

As AI and ML accelerate industrywide mass deployments, it is becoming evident that solutions must offer a better value proposition in addition to the highest performance. Inference-per-Second-per-Watt (I/S/W) is emerging as the most important benchmark for deployments, providing the best value-to-service for providers and end users. Qualcomm Technologies has reinforced its leadership in power efficiency with its MLPerf v1.1 submission. On servers configured with 8x Qualcomm Cloud AI 100 accelerators, Qualcomm Technologies demonstrated the highest efficiency for ResNet-50, at 197 I/S/W.
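The headline metric is straightforward to compute: system throughput normalized by measured system power. The sketch below illustrates it; the function name and the 98,500 inf/s at 500 W inputs are illustrative assumptions chosen only because they reproduce the 197 I/S/W figure, not values from the actual submission.

```python
# Minimal sketch of the Inference-per-Second-per-Watt (I/S/W) metric.

def isw(throughput_inf_s: float, power_w: float) -> float:
    """Inferences per second per watt of total measured system power."""
    return throughput_inf_s / power_w

# Illustrative inputs only; the measured 8x-card power figures are in the
# official MLPerf v1.1 power results.
print(isw(98_500, 500))  # 197.0
```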

Figure 1: Most efficient AI inference solution - ResNet-50

All other Qualcomm Technologies submissions for datacenter and Edge also demonstrate leading Inference-per-Second-per-Watt results.

Packing more AI performance into servers and datacenter racks at low power is equally important for total cost of ownership. Qualcomm Technologies’ denser 16x Qualcomm Cloud AI 100 accelerator submission, a compact 2U Gigabyte G292-Z43 rack server powered by dual-socket 3rd Gen AMD EPYC server processors, delivered the highest offline throughput at 342,011 inferences/second and the highest server throughput at 310,064 inferences/second, under 1.84 kW total system power. A single datacenter rack configured with 20 such servers can provide more than 6.84 million inferences per second, while also achieving industry-leading power efficiency at that scale.
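The rack-level claim follows directly from the per-server figures. A quick back-of-the-envelope check, using only the numbers quoted above:

```python
# Rack-level math for the 16x Cloud AI 100 Gigabyte G292-Z43 submission.

offline_inf_s = 342_011    # peak offline throughput per server
system_power_w = 1_840     # total system power (1.84 kW)
servers_per_rack = 20      # rack configuration assumed in the post

rack_inf_s = offline_inf_s * servers_per_rack
efficiency = offline_inf_s / system_power_w  # inferences/second/Watt

print(f"Rack throughput: {rack_inf_s:,} inf/s")      # 6,840,220 inf/s
print(f"System efficiency: {efficiency:.0f} I/S/W")  # 186 I/S/W
```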

Figure 2: Highest offline peak AI performance

Qualcomm Technologies has demonstrated that performance scales linearly from 1x to 5x, 8x, and 16x Qualcomm Cloud AI 100 accelerators across all submitted networks in the Edge and Cloud categories. The scaling applies equally to the server scenario for all datacenter submissions, with very high server/offline ratios (above 90 percent in most cases), showcasing maximum datacenter server efficiency.
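The server/offline ratio mentioned here is simply the query-driven server-scenario throughput divided by the peak offline throughput. Using the 16x-card ResNet-50 figures quoted earlier (the helper function name is ours, for illustration):

```python
# Server/offline ratio: how close the latency-constrained server scenario
# gets to the unconstrained offline peak.

def server_offline_ratio(server_inf_s: float, offline_inf_s: float) -> float:
    return server_inf_s / offline_inf_s

# 16x Cloud AI 100 ResNet-50 figures from the datacenter submission above.
ratio = server_offline_ratio(310_064, 342_011)
print(f"{ratio:.1%}")  # 90.7%
```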

Figure 3: Performance scaling with Cloud AI 100

Qualcomm Technologies has also showcased a very low power Cloud AI 100 AI Edge Development Kit (AEDK) delivering maximum performance per watt. The measured efficiency has now increased to 240 inferences/second/Watt for ResNet-50.

Figure 4: Most efficient AI Edge solutions

Continuing its offline peak-performance leadership into the Edge category, Qualcomm Technologies has delivered the highest offline peak performance on AI Edge servers and devices.

Figure 5: Highest peak performance

Qualcomm Technologies has made a very strong BERT submission with its 75 W accelerator, achieving very high performance per watt at a 75 W TDP compared to many other submissions at similar or double the TDP.

Figure 6: Best efficiency in natural language processing BERT

Qualcomm Technologies worked with partner Krai on the MLPerf v1.1 benchmark submission, and all submissions are powered by the Collective Knowledge v1.55.5 automation suite for MLPerf. The test platforms were configured as follows: the datacenter submissions use Gigabyte R282 and G292 2U server hosts with 8x and 16x Qualcomm Cloud AI 100 PCIe HHHL 75 W TDP accelerators, respectively. The Edge submissions use a Gigabyte R282 server configured with 5x and 1x Qualcomm Cloud AI 100 cards. The AEDKs with the Snapdragon 865 Modular Platform feature the Qualcomm Cloud AI 100 DM.2e configured for 15 W TDP and the Cloud AI 100 DM.2 configured for 20 W TDP.

Qualcomm Technologies’ MLPerf v1.1 submissions covered additional networks and platforms, leading the pack in peak performance for very dense datacenter server solutions and delivering Edge results at the best performance per watt, thanks to the Qualcomm Cloud AI 100.

Performance and power efficiency are crucial metrics for an organization of any size, as every joule spent impacts your bottom line. Datacenters need to not only be powerful but also provide high efficiency for the lowest Total Cost of Ownership (TCO). Products that can deliver the highest performance at the lowest power are most critical. 

The Qualcomm Cloud AI 100 accelerator outperformed the competition on the Inference Datacenter and Edge charts, with the highest inference performance density (at the lowest power) among all MLPerf submissions.

Beyond Qualcomm Technologies’ performance leadership, the Qualcomm Cloud AI 100 platform also takes the lead in one of the most important metrics in Edge AI inferencing: latency performance. Low latency at low energy is vital to achieving the fastest response time to enable the best possible user experience. The Qualcomm Cloud AI 100 platform delivers the most advanced Performance Per Watt capabilities with the lowest latency (ms) at the lowest energy (joules) among all Edge devices with power submissions. 

The MLPerf v1.1 Power submission again demonstrates Qualcomm Cloud AI 100 as the platform of choice for all AI inferencing applications for both the Edge and datacenter categories by delivering the highest number of inferences at lowest latency and lowest energy utilization. The Qualcomm Cloud AI 100 provides a unique blend of high computational performance, low latency and low power utilization and is well suited for a broad range of applications ranging from Edge to Cloud. 


Qualcomm Cloud AI and Snapdragon are products of Qualcomm Technologies, Inc. and/or its subsidiaries.


Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries. The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Mike Vildibill

Vice President, Product Management, Qualcomm Technologies

John Kehrli

Senior Director, Product Management, Qualcomm Technologies