Data Center

AI Accelerators

Inference‑first architecture built for hyperscale

Qualcomm Technologies is redefining data center infrastructure with rack-scale inference solutions optimized for the agentic AI era. While legacy architectures were built for raw compute speed, the AI era demands a focus on extreme energy efficiency, real-time responsiveness, and total cost of ownership to maximize tokens per watt and tokens per dollar.

Innovative architecture for exponential effective bandwidth

Qualcomm® High Bandwidth Compute (HBC) delivers a generational leap in effective memory bandwidth, addressing the inference decode bottleneck and enabling token generation at lower power, latency, and cost.

18x

Effective memory bandwidth
(AI250 with HBC Gen 1 vs AI200)

54x

Effective memory bandwidth
(AI300 with HBC Gen 2 vs AI200)

Qualcomm High Bandwidth Compute

By fusing memory and compute dies, HBC processes extreme memory bandwidth within the package and skips energy-intensive data movement to the SoC for operations with low arithmetic intensity. HBC eliminates the HBM tax and enables industry-leading tokens per watt and tokens per dollar.
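To make "low arithmetic intensity" concrete, the sketch below estimates FLOPs per byte moved for a matrix multiply. It is an illustration under stated assumptions, not Qualcomm's methodology: the function name, the 4096-wide layer, the 2-byte element size, and the simplification that each matrix crosses the memory interface exactly once are all hypothetical.

```python
def gemm_arithmetic_intensity(m: int, k: int, n: int, bytes_per_element: int = 2) -> float:
    """FLOPs per byte for C[m, n] = A[m, k] @ B[k, n].

    Assumes (hypothetically) that A and B are each read once and C is
    written once, with no cache reuse beyond that.
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


# Decode-like step: a single token's activation vector times a weight matrix.
decode_intensity = gemm_arithmetic_intensity(1, 4096, 4096)

# Prefill-like step: 2048 tokens share the same weight matrix.
prefill_intensity = gemm_arithmetic_intensity(2048, 4096, 4096)

print(f"decode:  {decode_intensity:.2f} FLOPs/byte")
print(f"prefill: {prefill_intensity:.2f} FLOPs/byte")
```

Under these assumptions the single-token case performs roughly one FLOP per byte moved, while the batched case performs about a thousand, which is why operations of the first kind are limited by memory bandwidth rather than by compute.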

Diagram of QCM memory architecture, showing computation happening within memory, and data moving to the SoC only when necessary.

Compared to HBM

The traditional HBM approach transmits high-bandwidth data through the entire path from the DRAM arrays to the SoC, a model that frequently hits the "memory wall." This architecture relies on expensive 2.5D packaging and results in higher cost per gigabyte, higher power consumption, and lower effective memory bandwidth than HBC.

Diagram of HBM memory architecture, showing data moving to the SoC for every operation.

Key ingredients for rack-scale AI inference

Near-memory computing

HBC has a multi-year roadmap to deliver faster, more power-efficient, and more scalable processing at lower TCO compared to HBM.

Purpose-built neural processing

The Qualcomm Hexagon™ NPU technology delivers industry-leading performance and energy efficiency to process the most demanding AI tasks.

Unrivaled compute density

Our custom Qualcomm Oryon™ CPU technology, purpose-built for servers, delivers industry-leading power efficiency, performance, and rack density.

High-speed connectivity & interconnects

A comprehensive portfolio of advanced connectivity solutions optimized for scale‑up, scale‑out, and scale-across.

Advanced nodes

Leading-edge process at 3nm and below for high performance and low power.

Leading performance-per-watt

Our holistic systems approach and co-design across AI, compute, and connectivity enable leading rack-level power efficiency.

Bring innovative efficiency to AI performance

Our holistic full-stack architecture for AI racks leverages decades of expertise in power efficiency, performance optimization, complex systems design, and memory management to deliver scalable AI infrastructure that optimizes total cost of ownership (TCO) at hyperscale.

Orchestration software

Coordinates multi-stage AI inference workflows and aligns them to the right infrastructure resources via the Qualcomm Dragonfly AI Infrastructure Management Suite (AI MS).

Connectivity

The "connectivity tissue" provided by Qualcomm Dragonfly high-speed connectivity protocols.

Core infrastructure

Our efficient and custom-designed AI inference solutions are based on our differentiated IP, including Hexagon NPU and Qualcomm Oryon CPU technology scaled for data center workloads, along with HBC to address the “memory wall.”

AI performance engineered for hyperscaler infrastructure

Inference for hyperscalers is bottlenecked by memory bandwidth, memory capacity, and the energy cost of data movement. Our architecture optimizes for token generation efficiency, prioritizing consistent low latency, power efficiency, and unit economics over theoretical peak performance.
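A common first-order model of the bandwidth bottleneck: for a single request, each generated token requires streaming the model weights from memory once, so the decode rate cannot exceed memory bandwidth divided by model size. The sketch below applies that bound to the 548 GB/s per-card figure this page lists for Cloud AI 100 Ultra. The 8-billion-parameter model at 1 byte per parameter is a hypothetical workload, and the estimate ignores KV-cache traffic, batching, and compute limits.

```python
def max_decode_tokens_per_second(bandwidth_bytes_per_s: float, model_bytes: float) -> float:
    """Upper bound on single-stream decode rate when every token reads all weights once."""
    return bandwidth_bytes_per_s / model_bytes


CARD_BANDWIDTH = 548e9      # bytes/s, per-card figure listed on this page
MODEL_BYTES = 8e9 * 1       # hypothetical: 8B parameters at 1 byte each

decode_bound = max_decode_tokens_per_second(CARD_BANDWIDTH, MODEL_BYTES)
print(f"bandwidth-bound decode rate: {decode_bound:.1f} tokens/s")
```

In this model, doubling effective memory bandwidth doubles the bound, while adding compute leaves it unchanged.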

Tokens / (second * watt)

up to 8x

better than GPU-based systems¹

Memory bandwidth / watt  

6x

higher than HBM-based systems²

Memory capacity / watt

200x

better than SRAM-based systems³

Performance / watt

>2x

better than the latest server CPU systems⁴

Effective memory bandwidth

54x

higher per card (AI300 with HBC Gen 2 vs AI200)

1 With Qualcomm HBC; Qualcomm estimates compared to contemporary GPU-based architectures on decode performance for select models.
2 With Qualcomm HBC; Qualcomm estimates compared to competing published product specifications normalized at card-level.
3 With Qualcomm HBC; Qualcomm estimates compared to competing published product specifications normalized at rack-level.
4 With Qualcomm Dragonfly C1000 CPU; Qualcomm estimates compared to existing product benchmarks for server CPU competitive offerings based on specs.
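The tokens / (second * watt) metric above is dimensionally the same as tokens per joule. As a rough sketch of how such a figure could be computed from a measured run (the function name and the sample numbers are hypothetical, not Qualcomm measurements):

```python
def tokens_per_second_per_watt(tokens_generated: int, elapsed_s: float, avg_power_w: float) -> float:
    """Tokens / (second * watt), i.e. tokens per joule of energy consumed."""
    energy_joules = elapsed_s * avg_power_w
    return tokens_generated / energy_joules


# Hypothetical run: 120,000 tokens generated in 60 s at an average draw of 500 W.
efficiency = tokens_per_second_per_watt(120_000, 60.0, 500.0)
print(f"{efficiency:.1f} tokens per joule")
```

A claim such as "8x better" then corresponds to the ratio of two such values measured on the same model and workload.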

AI Accelerator Products

Qualcomm Cloud AI 100 Ultra

Inference accelerator optimized for generative AI and LLM workloads.

548 GB/s

Memory bandwidth per card

 

128 GB LPDDR4X

Memory capacity per card

Qualcomm Dragonfly AI200

Rack‑scale inference platform with superior memory capacity and low TCO.

414 TB/s

Memory bandwidth per rack

 

43 TB LPDDR5X

Memory capacity per rack

Qualcomm Dragonfly AI250

Introduces Qualcomm High Bandwidth Compute for disaggregated inference workloads.

7.4 PB/s

Effective memory bandwidth per rack with HBC Gen 1

 

43 TB

Memory capacity per rack

Qualcomm Dragonfly AI300

3rd generation rack-level AI inference platform with increased bandwidth and deployment options.

3x AI250

Effective memory bandwidth per rack with HBC Gen 2

 

3x AI250

Memory capacity per rack
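The rack-level figures above can be cross-checked against the generational multipliers quoted earlier on this page. The sketch below is plain arithmetic on the published numbers; it assumes the 18x multiplier applies at rack level as well as per card (that is, a comparable number of cards per rack), and the derived AI300 capacity is an implication of "3x AI250", not a published specification.

```python
AI200_RACK_BW_TB_S = 414.0   # from this page
AI250_VS_AI200 = 18          # effective bandwidth multiplier, HBC Gen 1
AI300_VS_AI250 = 3           # "3x AI250", HBC Gen 2
AI250_RACK_CAPACITY_TB = 43  # from this page

# 414 TB/s * 18 should land near the listed 7.4 PB/s for AI250.
ai250_rack_bw_pb_s = AI200_RACK_BW_TB_S * AI250_VS_AI200 / 1000

# 18x followed by 3x should reproduce the quoted 54x for AI300 vs AI200.
ai300_vs_ai200 = AI250_VS_AI200 * AI300_VS_AI250

# Derived, not published: three times the AI250 rack capacity.
ai300_rack_capacity_tb = AI300_VS_AI250 * AI250_RACK_CAPACITY_TB

print(f"AI250 effective rack bandwidth: {ai250_rack_bw_pb_s:.3f} PB/s")
print(f"AI300 vs AI200 multiplier: {ai300_vs_ai200}x")
print(f"Implied AI300 rack capacity: {ai300_rack_capacity_tb} TB")
```

The computed 7.452 PB/s agrees with the listed 7.4 PB/s to within rounding, and 18 × 3 matches the 54x headline figure.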

Qualcomm Dragonfly AI Infrastructure Management Suite

Orchestrate and scale efficiency

The suite provides provisioning, monitoring, orchestration, and fault handling across rack‑scale deployments. Together, hardware, connectivity, and software form the foundation of a cohesive data center platform approach, one designed to scale with customers as AI workloads evolve.

More software tools


 

Qualcomm® AI Inference Suite

A comprehensive set of ready-to-use AI applications, agents, tools, and libraries for developing and deploying AI inference on premises or via cloud deployments. 


 

Qualcomm® Cloud AI SDK

Optimize trained deep learning models for high-performance inference on Qualcomm Dragonfly AI accelerators.

 


Qualcomm relentlessly innovates to deliver intelligent computing everywhere, helping the world tackle some of its most important challenges. Our leading-edge AI, high performance, low-power computing, and unrivaled connectivity deliver proven solutions that transform major industries. At Qualcomm, we are engineering human progress.


© Qualcomm Technologies, Inc. and/or its affiliated companies.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are licensed by Qualcomm Incorporated.

Note: Certain services and materials may require you to accept additional terms and conditions before accessing or using those items.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell or license any of the services or materials referenced herein.