OnQ Blog

Qualcomm Centriq 2400 Processor: Designed for scalability and throughput performance on cloud datacenter workloads

Oct 5, 2017

Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.

When Qualcomm Datacenter Technologies unveiled the details of the Qualcomm Falkor CPU core in August, we discussed the market shift to a cloud-based computing model and how datacenter infrastructure is being optimized to address the demand for scalable performance under the unique characteristics of cloud software and services. Falkor, our fully custom core built specifically for the cloud datacenter market, was designed for optimal throughput performance and efficiency on today’s multi-threaded cloud workloads. Falkor serves as the scalable building block for the Qualcomm Centriq 2400 Processor, the world’s first 10nm server processor, which will begin shipping commercially later this year.

SoC architectures for cloud-based workloads must balance aggregate throughput performance with performance-per-watt efficiency. They must also be designed for compute density and predictable performance so that they perform well in highly loaded, multi-tenant environments. When developing the highly scalable 48-core Qualcomm Centriq 2400 SoC, we applied the same “built for the cloud” design philosophy used for Falkor to all of the SoC's other foundational elements. Preliminary estimates based on internal testing show integer throughput performance comparable to the Intel Xeon Platinum series at significantly lower power.

At the 2017 Linley Processor Conference this week, we will share additional details about the SoC foundational elements and how they address the needs of cloud datacenter workloads:

  • Highly integrated server SoC: The Qualcomm Centriq 2400 SoC uses a scalable architecture designed to maximize efficiency and performance for throughput-oriented workloads. As a single-chip, platform-level solution, it eliminates the board space, power, and cost of a separate I/O chipset. The SoC is ARM SBSA Level 3 compliant, which helps simplify development and deployment for our ecosystem partners and customers.
  • Qualcomm Falkor core as a building block: Our processor design team has a rich history of delivering high-performance yet power-efficient custom ARM CPUs for mobile platforms, and has brought this world-class design expertise to a CPU core architected specifically for the features and performance demands of cloud service providers. Falkor is AArch64-only and fully ARMv8 compliant. Each Falkor core duplex contains two custom Falkor CPUs, a shared 512 KB L2 cache with ECC (SEC/DED), and a shared system bus interface.
  • Scalable on-chip interconnect: The Qualcomm Centriq 2400 SoC includes a high-bandwidth and low-latency bi-directional segmented ring bus that utilizes a Qualcomm proprietary protocol. The multi-ring architecture and interconnect protocol are built for SoC scalability and outstanding throughput performance with capabilities such as full coherency (cache and I/O), shortest path routing, and multicast on read.
  • Distributed L3 Cache: The SoC includes a distributed 60 MB L3 cache (12 × 5 MB) that is non-inclusive/non-exclusive, 20-way set associative, and protected by ECC (SEC/DED). Memory addresses are hashed across all 12 L3 cache blocks to distribute accesses evenly and smooth out access latencies. The memory subsystem also includes innovative shared-resource management techniques, such as L3 Quality of Service (QoS), to improve cache utilization, reduce application latency, and manage cache bandwidth. These resources can be managed per virtual machine, container, or thread group.
  • Scalable Multi-channel DDR: The memory subsystem includes six 64-bit DDR4 memory controllers with ECC (SEC/DED). The SoC supports RDIMMs or LRDIMMs with one or two DIMMs per channel and memory speeds up to 2667 MT/s. The controllers have full out-of-order execution, and memory addresses are hashed across all DDR channels. A proprietary algorithm enhances memory bandwidth through in-line, transparent memory compression. Compression is performed at cache-line granularity and delivers up to 50% compression, and up to 2x memory bandwidth, on highly compressible data.
  • Distributed IOMMUs: Distributed I/O Memory Management Units (IOMMUs) provide address translation and access control with shared/distributed virtual memory support. Each major I/O function (PCIe, DMA, SATA, etc.) has its own dedicated IOMMU instances, which eliminates resource contention and enables concurrent page table lookups and translations for maximum I/O throughput and concurrency.
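
To make “shortest path routing” on a bidirectional ring concrete, here is a minimal Python sketch that picks the shorter direction between two ring stops. The stop count `RING_STOPS` and the function `route` are hypothetical: the post does not say how many stops the Centriq ring has, and this sketch ignores segmentation, the multi-ring topology, and multicast.

```python
RING_STOPS = 16   # HYPOTHETICAL stop count; not given in the post

def route(src: int, dst: int, stops: int = RING_STOPS) -> tuple[str, int]:
    """Pick the shorter direction around a bidirectional ring.

    Returns ("cw", hops) or ("ccw", hops); ties go clockwise.
    """
    cw = (dst - src) % stops    # hops travelling clockwise
    ccw = (src - dst) % stops   # hops travelling counter-clockwise
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)

print(route(1, 3))    # ('cw', 2)
print(route(1, 14))   # ('ccw', 3)
# Worst case is stops // 2 hops, versus stops - 1 on a one-way ring.
print(max(route(0, d)[1] for d in range(RING_STOPS)))   # 8
```

With traffic allowed in both directions, no request on this illustrative 16-stop ring travels more than 8 hops, instead of up to 15 on a unidirectional ring.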
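
The cache figures above can be sanity-checked, and the idea of hashing addresses across L3 slices illustrated, with a short Python sketch. The 64-byte line size, reading "MB" as 2^20 bytes, and the XOR-fold hash in `slice_for` are assumptions for illustration only; the actual Centriq hash is proprietary and is not described in the post. The aggregate L2 figure follows from 48 cores grouped into two-core duplexes with 512 KB each.

```python
from collections import Counter

NUM_SLICES = 12
SLICE_BYTES = 5 * 1024 * 1024   # 5 MB per slice, read as MiB (assumption)
WAYS = 20
LINE_BYTES = 64                  # ASSUMPTION: line size is not given in the post
CORES = 48
L2_PER_DUPLEX_KB = 512           # one shared L2 per two-core duplex

def sets_per_slice(slice_bytes: int, ways: int, line_bytes: int) -> int:
    """Sets in one set-associative slice: capacity / (ways * line size)."""
    return slice_bytes // (ways * line_bytes)

def slice_for(addr: int) -> int:
    """HYPOTHETICAL hash: fold upper line-address bits down, then take mod 12."""
    line = addr // LINE_BYTES                     # all bytes of a line share a slice
    folded = line ^ (line >> 7) ^ (line >> 17)    # mix high bits into low bits
    return folded % NUM_SLICES

print(NUM_SLICES * SLICE_BYTES // (1024 * 1024))        # 60 (MB of L3)
print(sets_per_slice(SLICE_BYTES, WAYS, LINE_BYTES))    # 4096
print((CORES // 2) * L2_PER_DUPLEX_KB)                  # 12288 (KB of L2 in total)

# For the first 12 * 1024 consecutive lines this fold is a permutation of the
# line numbers, so every slice receives exactly 1024 of them.
counts = Counter(slice_for(i * LINE_BYTES) for i in range(12 * 1024))
print(len(counts), set(counts.values()))                # 12 {1024}
```

A plain `line % 12` would also spread sequential lines evenly, but a 4 KB stride (64 lines) would then land on only three slices, since 64 ≡ 4 (mod 12). Mixing higher address bits into the slice index is one common way to break up such patterns.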
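
The memory figures imply a theoretical peak bandwidth that is easy to work out: six channels × 8 bytes per transfer × 2667 MT/s ≈ 128 GB/s. The Python sketch below does that arithmetic and applies the quoted best case of 2x effective bandwidth on highly compressible data. It assumes the 64-bit width counts data bits only, and it models compression as a simple size ratio, which is a simplification and not Qualcomm's algorithm.

```python
CHANNELS = 6
DATA_BITS_PER_CHANNEL = 64   # assumption: data width only, ECC bits excluded
TRANSFER_RATE_MTS = 2667     # mega-transfers per second (DDR4-2667)

def peak_bandwidth_gbs(channels: int, data_bits: int, mts: int) -> float:
    """Theoretical peak in GB/s (10**9 bytes/s): channels * bytes/transfer * rate."""
    bytes_per_transfer = data_bits // 8
    return channels * bytes_per_transfer * mts * 1e6 / 1e9

def effective_bandwidth_gbs(raw_gbs: float, compressed_fraction: float) -> float:
    """Bandwidth seen by software if each line moves at `compressed_fraction`
    of its original size. The post quotes at most 50% compression (2x
    bandwidth), so the fraction is clamped to [0.5, 1.0]."""
    fraction = min(max(compressed_fraction, 0.5), 1.0)
    return raw_gbs / fraction

raw = peak_bandwidth_gbs(CHANNELS, DATA_BITS_PER_CHANNEL, TRANSFER_RATE_MTS)
print(f"{raw:.1f} GB/s peak")                                     # 128.0 GB/s peak
print(f"{effective_bandwidth_gbs(raw, 0.5):.1f} GB/s best case")  # 256.0 GB/s best case
```

Sustained bandwidth in practice sits below the theoretical peak, and data that does not compress gets no benefit, so the second figure is an upper bound rather than an expected result.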
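
The distributed-IOMMU idea, a dedicated translation unit per major I/O function, can be sketched in Python as below. Everything here is hypothetical and for illustration: the `Iommu` class, the single-level page table, the 4 KB page size, and the per-function dictionary are not Qualcomm interfaces. The point is only that each function walks its own table and caches its own translations, so one function's lookups never compete with another's.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096   # assumption: 4 KB pages

class IommuFault(Exception):
    pass

@dataclass
class Iommu:
    # I/O virtual page number -> (physical page number, writable)
    page_table: dict[int, tuple[int, bool]] = field(default_factory=dict)
    # Per-instance translation cache; invalidation is not modeled here.
    tlb: dict[int, tuple[int, bool]] = field(default_factory=dict)

    def map(self, iova: int, pa: int, writable: bool) -> None:
        self.page_table[iova // PAGE_SIZE] = (pa // PAGE_SIZE, writable)

    def translate(self, iova: int, write: bool) -> int:
        vpn, offset = divmod(iova, PAGE_SIZE)
        entry = self.tlb.get(vpn)
        if entry is None:                  # miss: walk this instance's own table
            entry = self.page_table.get(vpn)
            if entry is None:
                raise IommuFault(f"unmapped IOVA {iova:#x}")
            self.tlb[vpn] = entry
        ppn, writable = entry
        if write and not writable:         # access-control check
            raise IommuFault(f"write to read-only IOVA {iova:#x}")
        return ppn * PAGE_SIZE + offset

# One dedicated (hypothetical) instance per major I/O function.
iommus = {fn: Iommu() for fn in ("pcie", "dma", "sata")}
iommus["dma"].map(0x1000, 0x8000_0000, writable=True)
print(hex(iommus["dma"].translate(0x1234, write=True)))   # 0x80000234
```

Because the DMA mapping lives only in the DMA instance, the same IOVA presented by the PCIe instance faults instead of translating.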

The Qualcomm Datacenter Technologies product roadmap is tailored to the emerging demands of highly-scalable, performant, and power-efficient servers that will fuel the next wave of cloud datacenters. We look forward to beginning commercial shipments of the Qualcomm Centriq 2400 — the world’s first 10nm server processor — by the end of 2017.


Qualcomm Falkor and Qualcomm Centriq are products of Qualcomm Datacenter Technologies, Inc.


Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Ram Peddibhotla

Vice President, Product Management, Qualcomm Datacenter Technologies, Inc.

©2021 Qualcomm Technologies, Inc. and/or its affiliated companies.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes Qualcomm's licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm's engineering, research and development functions, and substantially all of its products and services businesses. Qualcomm products referenced on this page are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell any of the components or devices referenced herein.