Qualcomm Dragonfly AI250
Active

Rack-scale AI inference with breakthrough memory bandwidth, engineered for real-time agentic AI and low TCO.

Qualcomm Dragonfly™ AI250, our second-generation rack-scale AI inference platform, introduces the groundbreaking Qualcomm® High Bandwidth Compute (HBC) Gen 1, enabling 133 TB/s of effective memory bandwidth per card (18x compared to AI200). Combining massive memory bandwidth and capacity enables fast, low-latency inference on SOTA models of up to 10T parameters and context lengths of up to 1M tokens, all within an ORv3-compliant rack with air and direct liquid cooling.

Built for hyperscale agentic AI, it enables disaggregated inferencing at superior tokens per watt and tokens per dollar.

Benefits

Real-Time Token Generation
HBC delivers an industry-leading 133 TB/s of effective memory bandwidth per card,¹ targeting the memory-bound decode phase of AI inference to enable fast token generation.
Low Latency per Token
Massive effective memory bandwidth keeps decode pipelines fed, reducing latency per token for more responsive real-time agentic AI experiences.
Large Context Window
With 768 GB of memory per card, AI250 is designed to minimize context eviction for agentic workloads with long reasoning chains.
Total Cost Advantage at Scale
AI250 is engineered to deliver superior total cost of ownership (TCO) versus competing solutions at equal tokens per second per user, making real-time inference economically viable at scale.
Superior Tokens per Watt
4x to 8x higher memory bandwidth per watt per card than contemporary GPU-based architectures, based on Qualcomm estimates.²
Deploy AI Faster
Qualcomm AI Inference Suite streamlines deployment across bare metal, cloud VMs, and inference-as-a-service, helping teams move from model to production faster.

¹ Compared to published specifications of competing products, normalized at card and rack level.

² Qualcomm estimates, compared to contemporary GPU-based architectures on memory bandwidth per watt per card.
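
The link between memory bandwidth and token generation can be illustrated with a common back-of-the-envelope bound for memory-bound decode: each generated token requires streaming the active model weights from memory once, so the single-stream token rate cannot exceed bandwidth divided by bytes read per token. The sketch below applies that bound to the 133 TB/s per-card figure. The model size and weight precision are hypothetical illustration values, not Qualcomm figures, and the bound ignores KV-cache reads, batching, and compute limits.

```python
def decode_tokens_per_second_bound(bandwidth_tb_s, active_params_billions, bytes_per_param):
    """Upper bound on single-stream decode rate when weight reads dominate.

    Assumes every active parameter is read from memory once per token.
    """
    bandwidth_bytes_per_s = bandwidth_tb_s * 1e12
    bytes_read_per_token = active_params_billions * 1e9 * bytes_per_param
    return bandwidth_bytes_per_s / bytes_read_per_token


CARD_BANDWIDTH_TB_S = 133      # effective per-card bandwidth stated on this page
ACTIVE_PARAMS_BILLIONS = 70    # hypothetical model size, for illustration only
BYTES_PER_PARAM = 1.0          # hypothetical 8-bit weights, for illustration only

bound = decode_tokens_per_second_bound(
    CARD_BANDWIDTH_TB_S, ACTIVE_PARAMS_BILLIONS, BYTES_PER_PARAM
)
print(f"Bandwidth-only decode bound: {bound:,.0f} tokens/s per stream")
```

Real throughput will be lower than this ceiling. The value of the calculation is in showing why decode speed scales with memory bandwidth rather than with peak compute.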

Features

  • Industry-leading effective memory bandwidth¹ to deliver compelling cost per token for real-time inference on SOTA models of up to 10T parameters and context lengths of up to 1M tokens
  • Designed for memory-bound and real-time inference at hyperscale economics: LLM decode, reasoning, agentic AI, and multimodal generation
  • HBC architecture for industry-leading performance per watt²
  • 43 TB memory capacity and 7.4 PB/s effective bandwidth with HBC per rack
  • Over 6 TB of HBC memory per server, capable of supporting models with over 10T parameters and reducing networking dependency
  • PCIe Gen6 scale-up; Ethernet-with-RoCE scale-out
  • Qualcomm AI Inference Suite for bare-metal, VM, or IaaS deployment
  • Rack-scale solution with liquid cooling, storage, network switches, and NICs
  • Air and direct liquid cooling
  • OCP ORv3-compliant rack with cableless backplane
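
The capacity figures in the list above can be sanity-checked with two standard sizing formulas: weight footprint (parameters times bits per parameter) and KV-cache size (two tensors per layer, times KV heads, head dimension, bytes per value, and tokens). The sketch below is a rough estimate only. The weight precisions and the transformer shape (layers, KV heads, head dimension) are hypothetical values chosen for illustration and do not describe any model or configuration named by Qualcomm. The 6 TB server figure is treated as a lower bound, since the page states "over 6 TB".

```python
def weight_footprint_tb(params_trillions, bits_per_param):
    """Memory needed for model weights, in decimal terabytes."""
    total_bytes = params_trillions * 1e12 * bits_per_param / 8
    return total_bytes / 1e12


def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """KV-cache size for one sequence, in decimal gigabytes.

    The factor of 2 accounts for storing both keys and values.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens / 1e9


SERVER_MEMORY_TB = 6.0   # page states "over 6 TB" per server
RACK_MEMORY_TB = 43.0    # page states 43 TB per rack
CARD_MEMORY_GB = 768     # page states 768 GB per card

# Hypothetical precisions for a 10T-parameter model.
for bits in (4, 8):
    size = weight_footprint_tb(10, bits)
    print(f"{bits}-bit weights: {size:.1f} TB | "
          f"fits server: {size <= SERVER_MEMORY_TB} | fits rack: {size <= RACK_MEMORY_TB}")

# Hypothetical transformer shape with a 1M-token context and 16-bit KV values.
kv = kv_cache_gb(tokens=1_000_000, layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)
print(f"KV cache for 1M tokens: {kv:.2f} GB | fits one card: {kv <= CARD_MEMORY_GB}")
```

Under these assumptions, a 10T-parameter model fits within a single server only at low weight precision, while higher precisions would span multiple servers within the rack.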

Specifications

Rack
  • Form Factor: Single Wide, Open Rack v3 (ORv3) compliant
  • Number of Cards: 56

Memory
  • Capacity: 43 TB
  • Effective Bandwidth: 7.455 PB/s¹

Scale-Up
  • Interface Type: PCIe 6.0

Scale-Out
  • Interface Type: Ethernet w/RoCE

Thermal Management
  • Cooling: Direct Liquid Cooling (DLC), Air Cooling
  • Thermal Design Power: 140 kW

¹ With HBC Gen 1 (18x compared to AI200)
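
The rack-level figures follow from the per-card figures quoted earlier on this page, which the short calculation below makes explicit. It assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB). The bandwidth-per-watt value divides rack bandwidth by the full rack thermal design power, which also covers non-card components, so it should be read as a rough rack-level ratio rather than a per-card specification.

```python
CARDS_PER_RACK = 56            # from the specifications
CARD_MEMORY_GB = 768           # per-card memory stated on this page
CARD_BANDWIDTH_TB_S = 133      # per-card effective bandwidth stated on this page
RACK_BANDWIDTH_PB_S = 7.455    # from the specifications
RACK_TDP_KW = 140              # from the specifications

# Totals derived from per-card figures (decimal units assumed).
rack_memory_tb = CARDS_PER_RACK * CARD_MEMORY_GB / 1000
rack_bandwidth_pb_s = CARDS_PER_RACK * CARD_BANDWIDTH_TB_S / 1000

# Per-card bandwidth implied by the quoted rack total.
implied_card_bandwidth_tb_s = RACK_BANDWIDTH_PB_S * 1000 / CARDS_PER_RACK

# Rough rack-level ratio: GB/s of effective bandwidth per watt of rack TDP.
bandwidth_gb_s_per_watt = (RACK_BANDWIDTH_PB_S * 1e6) / (RACK_TDP_KW * 1e3)

print(f"Rack memory from cards:      {rack_memory_tb:.3f} TB")
print(f"Rack bandwidth from cards:   {rack_bandwidth_pb_s:.3f} PB/s")
print(f"Implied per-card bandwidth:  {implied_card_bandwidth_tb_s:.3f} TB/s")
print(f"Bandwidth per watt (rack):   {bandwidth_gb_s_per_watt:.2f} GB/s per W")
```

The small gap between the derived 7.448 PB/s and the quoted 7.455 PB/s suggests the 133 TB/s per-card figure is rounded.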

QUALCOMM DRAGONFLY AI INFRASTRUCTURE MANAGEMENT SUITE

Orchestrate and scale efficiency

The suite provides provisioning, monitoring, orchestration and fault handling across rack‑scale deployments. Together, hardware, connectivity and software form the foundation of a cohesive data center platform approach — one designed to scale with customers as AI workloads evolve. 

Qualcomm relentlessly innovates to deliver intelligent computing everywhere, helping the world tackle some of its most important challenges. Our leading-edge AI, high performance, low-power computing, and unrivaled connectivity deliver proven solutions that transform major industries. At Qualcomm, we are engineering human progress.

© Qualcomm Technologies, Inc. and/or its affiliated companies.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are licensed by Qualcomm Incorporated.

Note: Certain services and materials may require you to accept additional terms and conditions before accessing or using those items.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell or license any of the services or materials referenced herein.