Sep 22, 2021
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
Editor's Note: The included figures were updated 09/24/2021
The latest MLCommons™ MLPerf™ v1.1 Inference results are in, and a new frontrunner has emerged in one of the most comprehensive machine learning performance benchmarks.
With industry-leading advancements in performance density and performance per watt, the Qualcomm Cloud AI 100 platforms lead in all scorecards of the latest benchmark submissions.
MLCommons is an open engineering consortium that develops machine learning benchmarks for training and inference platforms. The inference tests cover data center environments in both offline and server (query-driven) scenarios, and Edge servers and devices in offline and single-stream scenarios.
Qualcomm Technologies has significantly expanded its MLPerf submission, doubling the number of platforms submitted from Edge to Cloud. Network coverage now includes language processing (BERT), and SSD MobileNet v1 has been added to the vision networks. In total, 82 benchmark results were submitted, including 36 power results.
As AI and ML deployments accelerate industrywide, it is becoming evident that solutions must offer a better value proposition in addition to the highest performance. Inference-per-Second-per-Watt (I/S/W) is emerging as the most important benchmark for deployments that provide the best value for service providers and end users. Qualcomm Technologies has reinforced its leadership in power efficiency with its MLPerf v1.1 submission. On servers configured with 8x Qualcomm Cloud AI 100 accelerators, Qualcomm Technologies demonstrated the highest efficiency, 197 I/S/W, for ResNet-50.
All other Qualcomm submissions for datacenter and Edge also demonstrate leading Inference-per-Second-per-Watt results.
Packing more AI performance into servers and datacenter racks at low power is equally important for total cost of ownership. Qualcomm Technologies' denser submission, with 16x Qualcomm Cloud AI 100 accelerators in a compact 2U Gigabyte G292-Z43 rack server powered by dual-socket 3rd Gen AMD EPYC server processors, delivered the highest offline performance at 342,011 inferences/second and the highest server performance at 310,064 inferences/second, at under 1.84 kW total system power. A single datacenter rack configured with 20x such servers can provide more than 6.84 million inferences per second, while also achieving industry-leading power efficiency at that scale.
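The rack-level figure follows from simple arithmetic on the numbers quoted above. The short Python sketch below reproduces it and also derives a system-level inferences-per-second-per-watt value. It makes two assumptions: the quoted "under 1.84 kW" is treated as an upper bound on power, so the derived efficiency is a lower bound, and rack throughput is taken as a straight multiple of per-server throughput. The derived efficiency is an illustration, not an official MLPerf power result.

```python
# Figures quoted in the post for the 16x Cloud AI 100 Gigabyte G292-Z43 server.
OFFLINE_INF_PER_SEC = 342_011
SERVER_INF_PER_SEC = 310_064
SYSTEM_POWER_WATTS = 1_840  # "under 1.84 kW", used here as an upper bound
SERVERS_PER_RACK = 20


def rack_throughput(per_server_inf_per_sec: int, servers: int) -> int:
    """Aggregate throughput, assuming every server performs identically."""
    return per_server_inf_per_sec * servers


def inf_per_sec_per_watt(inf_per_sec: float, watts: float) -> float:
    """Inferences per second per watt (I/S/W) for one system."""
    return inf_per_sec / watts


rack_offline = rack_throughput(OFFLINE_INF_PER_SEC, SERVERS_PER_RACK)
offline_isw_floor = inf_per_sec_per_watt(OFFLINE_INF_PER_SEC, SYSTEM_POWER_WATTS)

print(f"Rack offline throughput: {rack_offline:,} inferences/second")
print(f"Offline efficiency of at least {offline_isw_floor:.1f} I/S/W")
```

Because the power figure is only an upper bound, the true efficiency of this configuration is somewhat higher than the value printed here.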
Qualcomm Technologies has demonstrated that performance scales linearly from 1x to 5x, 8x, and 16x Qualcomm Cloud AI 100 accelerators across all the submitted networks in the Edge and Cloud categories. The scaling applies equally to server scenarios for all datacenter submissions, with very high server/offline ratios (above 90 percent in most cases), showcasing maximum datacenter server efficiency.
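As an illustration of the server/offline ratio, the Python sketch below computes it for the 16x submission using the two throughput figures quoted in this post. The `scaling_efficiency` helper is a hypothetical addition showing how linear scaling could be checked; the post does not give per-card throughput, so the values passed to it are placeholders, not measured results.

```python
# Throughput figures quoted in the post for the 16x Cloud AI 100 server.
OFFLINE_INF_PER_SEC = 342_011
SERVER_INF_PER_SEC = 310_064


def server_offline_ratio(server_ips: float, offline_ips: float) -> float:
    """Fraction of offline (peak) throughput retained in the server scenario."""
    return server_ips / offline_ips


def scaling_efficiency(single_card_ips: float, n_cards: int, n_card_ips: float) -> float:
    """Measured N-card throughput relative to ideal linear scaling (1.0 = linear)."""
    return n_card_ips / (n_cards * single_card_ips)


ratio = server_offline_ratio(SERVER_INF_PER_SEC, OFFLINE_INF_PER_SEC)
print(f"Server/offline ratio: {ratio:.1%}")

# Placeholder inputs only: a card at 100 inf/s and 16 cards at 1,600 inf/s
# would count as perfectly linear scaling.
print(f"Scaling efficiency (placeholder data): {scaling_efficiency(100, 16, 1_600):.2f}")
```

For the 16x configuration the ratio comes out just above 90 percent, consistent with the claim in the text.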
Qualcomm Technologies has showcased a very low-power Cloud AI 100 AI Edge Development Kit (AEDK) delivering maximum performance per watt. The measured efficiency has now increased to 240 inferences/second/watt for ResNet-50.
Extending its offline peak performance lead into the Edge category, Qualcomm Technologies delivered the highest offline peak performance on AI Edge servers and devices.
Qualcomm Technologies made a very strong BERT submission with its 75W accelerator, achieving very high performance per watt at 75W TDP compared with many other submissions at similar or double the TDP.
Qualcomm Technologies worked with partner Krai on the MLPerf v1.1 benchmark submission, and all submissions are powered by the Collective Knowledge v1.55.5 automation suite for MLPerf. The test platforms were configured as follows. The datacenter submissions use Gigabyte R282 and G292 2U server hosts with 8x and 16x Qualcomm Cloud AI 100 PCIe HHHL 75W TDP accelerators, respectively. The Edge submissions use a Gigabyte R282 server configured with 5x and 1x Qualcomm Cloud AI 100 cards. The AEDKs with Snapdragon 865 Modular Platforms feature the Qualcomm Cloud AI 100 DM.2e configured for 15W TDP and the Cloud AI 100 DM.2 configured for 20W TDP.
Qualcomm Technologies’ MLPerf v1.1 submissions delivered additional networks and platforms and led the pack in peak performance in very dense datacenter server solutions along with Edge results at the best performance per Watt, thanks to the Qualcomm Cloud AI 100.
Performance and power efficiency are crucial metrics for an organization of any size, as every joule spent impacts your bottom line. Datacenters need to not only be powerful but also provide high efficiency for the lowest Total Cost of Ownership (TCO). Products that can deliver the highest performance at the lowest power are most critical.
The Qualcomm Cloud AI 100 accelerator outperformed the competition in the Datacenter and Edge inferencing charts, with the highest inference performance density (at the lowest power) among all MLPerf submissions.
Beyond Qualcomm Technologies’ performance leadership, the Qualcomm Cloud AI 100 platform also takes the lead in one of the most important metrics in Edge AI inferencing: latency performance. Low latency at low energy is vital to achieving the fastest response time to enable the best possible user experience. The Qualcomm Cloud AI 100 platform delivers the most advanced Performance Per Watt capabilities with the lowest latency (ms) at the lowest energy (joules) among all Edge devices with power submissions.
The MLPerf v1.1 Power submission again demonstrates Qualcomm Cloud AI 100 as the platform of choice for all AI inferencing applications for both the Edge and datacenter categories by delivering the highest number of inferences at lowest latency and lowest energy utilization. The Qualcomm Cloud AI 100 provides a unique blend of high computational performance, low latency and low power utilization and is well suited for a broad range of applications ranging from Edge to Cloud.