AI Accelerators
Inference‑first architecture built for hyperscale
Qualcomm Technologies is redefining data center infrastructure with rack-scale inference solutions optimized for the agentic AI era. While legacy architectures were built for raw compute speed, the AI era demands a focus on extreme energy efficiency, real-time responsiveness, and total cost of ownership to maximize tokens per watt and tokens per dollar.
Innovative architecture for exponential effective bandwidth
Qualcomm® High Bandwidth Compute (HBC) delivers a generational leap in effective memory bandwidth, addressing the inference decode bottleneck and enabling token generation at lower power, latency, and cost.
18x
Effective memory bandwidth (AI250 with HBC Gen 1 vs AI200)
54x
Effective memory bandwidth (AI300 with HBC Gen 2 vs AI200)
Qualcomm High Bandwidth Compute
By fusing memory and compute dies, HBC keeps high-bandwidth data processing within the package and avoids energy-intensive data movement to the SoC for operations with low arithmetic intensity. This eliminates the "HBM tax" (the cost and power overhead of moving every byte from DRAM to the SoC) and enables industry-leading tokens per watt and tokens per dollar.
Compared to HBM
The traditional HBM approach transmits high-bandwidth data through the entire path from the DRAM arrays to the SoC, a model that frequently hits the "memory wall". This architecture relies on expensive 2.5D packaging and results in higher cost per gigabyte, higher power consumption, and lower effective memory bandwidth than HBC.
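The "memory wall" and "low arithmetic intensity" mentioned above can be illustrated with the standard roofline model. This is a minimal sketch, not a description of how HBC or any Qualcomm product works internally. The peak compute and bandwidth values are placeholder assumptions chosen only to make the arithmetic easy to follow, and the function names are hypothetical.

```python
def ridge_point(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) at which compute and memory limits meet."""
    return peak_flops / bandwidth_bytes_per_s


def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     arithmetic_intensity: float) -> float:
    """Roofline bound: the lower of the compute peak and bandwidth * intensity."""
    return min(peak_flops, bandwidth_bytes_per_s * arithmetic_intensity)


def is_memory_bound(peak_flops: float, bandwidth_bytes_per_s: float,
                    arithmetic_intensity: float) -> bool:
    """True when the workload sits left of the ridge point (bandwidth-limited)."""
    return arithmetic_intensity < ridge_point(peak_flops, bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Placeholder hardware figures (assumptions, not product specifications).
    PEAK_FLOPS = 100e12   # 100 TFLOP/s
    BANDWIDTH = 1e12      # 1 TB/s

    # Placeholder workload intensities in FLOP per byte read from memory.
    for name, intensity in [("low-intensity (decode-like)", 2.0),
                            ("high-intensity (prefill-like)", 500.0)]:
        bound = attainable_flops(PEAK_FLOPS, BANDWIDTH, intensity)
        kind = "memory-bound" if is_memory_bound(PEAK_FLOPS, BANDWIDTH, intensity) else "compute-bound"
        print(f"{name}: {bound / 1e12:.1f} TFLOP/s attainable, {kind}")
```

In this model, a memory-bound operation gains nothing from a higher compute peak. Only more effective bandwidth moves its ceiling, which is the case the text above argues near-memory processing targets.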
Key ingredients for rack-scale AI inference
Near-memory computing
HBC has a multi-year roadmap to deliver faster, more power-efficient, and more scalable processing at a lower TCO than HBM.
Purpose-built neural processing
The Qualcomm Hexagon™ NPU technology delivers industry-leading performance and energy efficiency to process the most demanding AI tasks.
Unrivaled compute density
Our custom Qualcomm Oryon™ CPU technology, purpose-built for servers, delivers industry-leading power efficiency, performance, and rack density.
High-speed connectivity & interconnects
A comprehensive portfolio of advanced connectivity solutions optimized for scale‑up, scale‑out, and scale-across.
Advanced nodes
Leading-edge process at 3nm and below for high performance and low power.
Leading performance-per-watt
Our holistic systems approach and co-design across AI, compute, and connectivity enable rack-level power efficiency.
Bring innovative efficiency to AI performance
Our holistic full-stack architecture for AI racks leverages decades of expertise in power efficiency, performance optimization, complex systems design, and memory management to deliver scalable AI infrastructure that optimizes total cost of ownership (TCO) at hyperscale.
AI performance engineered for hyperscaler infrastructure
Inference for hyperscalers is bottlenecked by memory bandwidth, memory capacity, and the energy cost of data movement. Our architecture optimizes for token generation efficiency, prioritizing consistent low latency, power efficiency, and unit economics over theoretical peak performance.
Tokens / (second * watt)
up to 8x
better than GPU-based systems¹
Memory bandwidth / watt
6x
higher than HBM-based systems²
Memory capacity / watt
200x
better than SRAM-based systems³
Performance / watt
>2x
better than the latest server CPU systems⁴
Effective memory bandwidth
54x
higher per card (AI300 with HBC Gen 2 vs AI200)
¹ With Qualcomm HBC; Qualcomm estimates compared to contemporary GPU-based architectures on decode performance for select models.
² With Qualcomm HBC; Qualcomm estimates compared to competing published product specifications normalized at card level.
³ With Qualcomm HBC; Qualcomm estimates compared to competing published product specifications normalized at rack level.
⁴ With Qualcomm Dragonfly C1000 CPU; Qualcomm estimates compared to existing product benchmarks for server CPU competitive offerings based on specs.
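The "Tokens / (second * watt)" metric above is dimensionally tokens per joule, since one watt is one joule per second. The sketch below shows how such a figure could be computed from a measured throughput and power draw. The function names are hypothetical and the example inputs are illustrative placeholders, not measurements of any system named on this page.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Tokens / (second * watt), which simplifies to tokens per joule."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


def kwh_per_million_tokens(efficiency_tokens_per_joule: float) -> float:
    """Energy needed to generate one million tokens at a given efficiency."""
    if efficiency_tokens_per_joule <= 0:
        raise ValueError("efficiency must be positive")
    joules = 1_000_000 / efficiency_tokens_per_joule
    return joules / JOULES_PER_KWH


if __name__ == "__main__":
    # Illustrative placeholder inputs only.
    efficiency = tokens_per_joule(tokens_per_second=1200.0, power_watts=300.0)
    print(f"{efficiency:.2f} tokens per joule")
    print(f"{kwh_per_million_tokens(efficiency):.4f} kWh per million tokens")
```

Multiplying the kWh figure by an electricity price gives the energy component of cost per token, which is one input to the tokens-per-dollar comparisons made elsewhere on this page.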
AI Accelerator Products
An innovative roadmap backed by long‑term investment
Qualcomm Cloud AI 100 Ultra
Inference accelerator optimized for generative AI and LLM workloads.
548 GB/s
Memory bandwidth per card
128 GB LPDDR4X
Memory capacity per card
Qualcomm Dragonfly AI200
Rack‑scale inference platform with superior memory capacity and low TCO.
414 TB/s
Memory bandwidth per rack
43 TB LPDDR5X
Memory capacity per rack
Qualcomm Dragonfly AI250
Introduces Qualcomm High Bandwidth Compute for disaggregated inference workloads.
7.4 PB/s
Effective memory bandwidth per rack with HBC Gen 1
43 TB
Memory capacity per rack
Qualcomm Dragonfly AI300
Third-generation rack-level AI inference platform with increased bandwidth and deployment options.
3x AI250
Effective memory bandwidth per rack with HBC Gen 2
3x AI250
Memory capacity per rack
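The roadmap figures above can be cross-checked with simple arithmetic. The sketch below takes the AI200 per-rack values from this page and applies the stated multipliers. It assumes the 18x multiplier applies directly to the per-rack bandwidth (the page quotes it without a rack or card qualifier), so the derived AI300 values are projections under that assumption, not published specifications.

```python
# Figures taken from this page.
AI200_RACK_BANDWIDTH_TBPS = 414.0   # TB/s per rack
AI250_RACK_CAPACITY_TB = 43.0       # TB per rack
AI250_VS_AI200_BANDWIDTH = 18       # HBC Gen 1 vs AI200
AI300_VS_AI250 = 3                  # "3x AI250" for bandwidth and capacity


def project_roadmap() -> dict:
    """Derive per-rack values by composing the stated multipliers (an assumption)."""
    ai250_bw_tbps = AI200_RACK_BANDWIDTH_TBPS * AI250_VS_AI200_BANDWIDTH
    ai300_bw_tbps = ai250_bw_tbps * AI300_VS_AI250
    return {
        "ai250_bandwidth_pbps": ai250_bw_tbps / 1000.0,
        "ai300_bandwidth_pbps": ai300_bw_tbps / 1000.0,
        "ai300_vs_ai200_multiplier": AI250_VS_AI200_BANDWIDTH * AI300_VS_AI250,
        "ai300_capacity_tb": AI250_RACK_CAPACITY_TB * AI300_VS_AI250,
    }


if __name__ == "__main__":
    for key, value in project_roadmap().items():
        print(f"{key}: {value}")
```

Under these assumptions, 414 TB/s times 18 gives about 7.45 PB/s, close to the listed 7.4 PB/s for AI250, and 18 times 3 reproduces the 54x figure quoted for AI300 versus AI200.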
Qualcomm Dragonfly AI Infrastructure Management Suite
Orchestrate and scale efficiency
The suite provides provisioning, monitoring, orchestration and fault handling across rack‑scale deployments. Together, hardware, connectivity and software form the foundation of a cohesive data center platform approach — one designed to scale with customers as AI workloads evolve.
More software tools
Qualcomm® AI Inference Suite
A comprehensive set of ready-to-use AI applications, agents, tools, and libraries for developing and deploying AI inference on premises or via cloud deployments.
Qualcomm® Cloud AI SDK
Optimize trained deep learning models for high-performance inference on Qualcomm Dragonfly AI accelerators.
