Rack-scale AI inference with breakthrough memory bandwidth, engineered for real-time agentic AI and low TCO.
Qualcomm Dragonfly™ AI250, our second-generation rack-scale AI inference platform, introduces the groundbreaking Qualcomm® High Bandwidth Compute (HBC) Gen 1, enabling 133 TB/s of effective memory bandwidth per card (18x more than AI200). This combination of memory bandwidth and capacity enables fast, low-latency inference on SOTA models of up to 10T parameters with context lengths of up to 1M tokens, all within an ORv3-compliant rack with air and direct liquid cooling.
Built for hyperscale agentic AI, it enables disaggregated inferencing with superior tokens per watt and tokens per dollar.
Features
- Industry-leading effective memory bandwidth [1] for compelling cost per token in real-time inferencing on SOTA models of up to 10T parameters and context lengths of up to 1M tokens
- Designed for memory-bound and real-time inference at hyperscale economics: LLM decode, reasoning, agentic AI, and multimodal generation
- HBC architecture for industry-leading performance per watt [2]
- Per rack: 43 TB of memory capacity and 7.4 PB/s of effective bandwidth with HBC
- Over 6 TB of HBC memory per server, capable of supporting models with over 10T parameters, reducing networking dependency
- PCIe Gen6 scale-up; Ethernet-with-RoCE scale-out
- Qualcomm AI Inference Suite for bare-metal, VM, or IaaS deployment
- Rack-scale solution with liquid cooling, storage, network switches, and NICs
- Air and direct liquid cooling
- OCP ORv3-compliant rack with cableless backplane
- HBC Gen 1 for 18x more effective memory bandwidth than AI200
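
The per-card, per-server, and per-rack figures quoted above can be cross-checked with some simple arithmetic. The Python sketch below is illustrative only and rests on assumptions that the text does not state: that the rack bandwidth is a plain sum of per-card bandwidth, that units are decimal (1 PB = 1000 TB), and that a server's memory holds nothing but model weights. The function names are hypothetical, and the derived values are estimates, not published specifications.

```python
# Figures quoted in the text (decimal units assumed: 1 PB = 1000 TB).
CARD_BW_TBPS = 133.0        # effective memory bandwidth per card, TB/s
RACK_BW_TBPS = 7.4 * 1000   # effective memory bandwidth per rack, TB/s
RACK_MEM_TB = 43.0          # memory capacity per rack, TB
SERVER_MEM_TB = 6.0         # "over 6 TB" per server, used here as a lower bound
MODEL_PARAMS = 10e12        # 10T parameters


def implied_cards_per_rack(rack_bw_tbps: float, card_bw_tbps: float) -> float:
    """Card count IF rack bandwidth is a plain sum of card bandwidth (assumption)."""
    return rack_bw_tbps / card_bw_tbps


def bytes_per_param_budget(mem_tb: float, params: float) -> float:
    """Bytes available per weight IF memory holds only weights (assumption).

    Ignores KV cache, activations, and runtime overhead, so it is an upper bound.
    """
    return mem_tb * 1e12 / params


cards = implied_cards_per_rack(RACK_BW_TBPS, CARD_BW_TBPS)
mem_per_card_tb = RACK_MEM_TB / cards
budget_bytes = bytes_per_param_budget(SERVER_MEM_TB, MODEL_PARAMS)

print(f"implied cards per rack:  {cards:.1f}")
print(f"implied memory per card: {mem_per_card_tb:.2f} TB")
print(f"weight budget per param: {budget_bytes:.2f} bytes ({budget_bytes * 8:.1f} bits)")
```

Under these assumptions the figures imply roughly 55.6 cards per rack, about 0.77 TB of memory per card, and a budget of 0.6 bytes (4.8 bits) per parameter for a 10T-parameter model in 6 TB. Because the text says "over 6 TB", the real budget is somewhat higher, but the estimate suggests that fitting such a model on one server would rely on sub-8-bit weight formats.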
