Purpose-built rack-scale AI inference with the industry's highest memory capacity per accelerator¹ — engineered for generative and agentic AI at scale.
Qualcomm Dragonfly™ AI200 is a purpose-built, rack-scale AI inference accelerator engineered for the generative and agentic AI era. Each AI200 card provides 768 GB of LPDDR memory¹, and a single 140 kW liquid-cooled, ORv3-compliant rack provides 43 TB of memory capacity¹, both industry-leading figures. This capacity enables efficient inference on models of up to 10T parameters with compelling total cost of ownership (TCO).
Features
- Rack-scale solution purpose-built for AI inference economics
- 768 GB LPDDR memory per card, the highest per-accelerator capacity in the industry¹
- Industry-leading 43 TB memory capacity per rack¹
- Supports SOTA models up to 10T parameters with context lengths up to 128K
- PCIe Gen6 scale-up; Ethernet-with-RoCE scale-out
- Qualcomm® AI Inference Suite for bare-metal, VM, or IaaS deployment
- Designed to accelerate a range of model categories: reasoning, vision, agentic, LLM, text-to-image, and video processing
- Air and direct liquid cooling
- OCP ORv3-compliant rack with cableless backplane
- LPDDR5X memory
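
The capacity figures above can be sanity-checked with some simple arithmetic. The Python sketch below is illustrative only: the card count is merely what the two published numbers imply (the brief does not state it), decimal units (1 TB = 1000 GB) are assumed, and the bytes-per-parameter values are common quantization widths chosen for illustration, not vendor-confirmed configurations. The function names are hypothetical.

```python
# Back-of-the-envelope capacity check.
# Assumptions (not from the brief): decimal units, and the listed
# bytes-per-parameter values for 16-bit, 8-bit, and 4-bit weights.
CARD_MEMORY_GB = 768     # per-card LPDDR capacity (from the brief)
RACK_MEMORY_TB = 43      # per-rack memory capacity (from the brief)
MAX_PARAMS = 10e12       # 10T parameters (from the brief)


def implied_cards_per_rack(rack_tb=RACK_MEMORY_TB, card_gb=CARD_MEMORY_GB):
    """Card count implied by dividing rack capacity by per-card capacity."""
    return round(rack_tb * 1000 / card_gb)


def weight_footprint_tb(params, bytes_per_param):
    """Memory for model weights only, in TB (excludes KV cache and activations)."""
    return params * bytes_per_param / 1e12


if __name__ == "__main__":
    print(f"implied cards per rack: ~{implied_cards_per_rack()}")
    for label, bpp in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        tb = weight_footprint_tb(MAX_PARAMS, bpp)
        share = tb / RACK_MEMORY_TB
        print(f"{label} weights: {tb:.0f} TB ({share:.0%} of rack memory)")
```

Under these assumptions, the weights of a 10T-parameter model fit within a single rack's 43 TB at any of the three precisions. The headroom left for KV cache at long context lengths depends on model architecture and batch size, which this sketch does not model.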
