Rack-Scale AI Inference Accelerator AI200

Active

Qualcomm Dragonfly™

AI200

Purpose-built rack-scale AI inference with the industry's highest memory capacity per accelerator¹ — engineered for generative and agentic AI at scale.

Qualcomm Dragonfly™ AI200 is a purpose-built, rack-scale AI inference accelerator engineered for the generative and agentic AI era. Each AI200 card provides 768 GB of LPDDR memory¹, and a single 140 kW, liquid-cooled, ORv3-compliant rack provides 43 TB of memory capacity¹, both industry-leading figures. This capacity enables efficient inference on models of up to 10T parameters with compelling total cost of ownership (TCO).

Best-in-class Memory Capacity¹

With 768 GB per card and 43 TB per rack, AI200 serves models from 7B up to 10 trillion parameters, supporting long-context, RAG, and agentic workloads without compromise.
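The capacity figures above can be related to model size with simple arithmetic. The following is a minimal sketch, not a sizing tool: it counts weights only and ignores KV cache, activations, and runtime overhead. The bytes-per-parameter values and the function names are assumptions for illustration and do not come from this page; units are decimal (1 GB = 10^9 bytes).

```python
# Rough weight-only sizing against the capacities stated on this page.
CARD_CAPACITY_GB = 768   # per-card LPDDR capacity (from this page)
RACK_CAPACITY_TB = 43    # per-rack memory capacity (from this page)

# Assumed storage cost per parameter for common weight formats (not from this page).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Return the weight-only footprint in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits(num_params: float, precision: str) -> dict:
    """Compare the weight footprint with one card and with one rack."""
    gb = weight_footprint_gb(num_params, precision)
    return {
        "footprint_gb": gb,
        "fits_one_card": gb <= CARD_CAPACITY_GB,
        "fits_one_rack": gb <= RACK_CAPACITY_TB * 1000,
    }


if __name__ == "__main__":
    # The two model sizes named in the text: 7B and 10T parameters.
    for name, params in [("7B", 7e9), ("10T", 10e12)]:
        for precision in BYTES_PER_PARAM:
            print(name, precision, fits(params, precision))
```

Under these assumptions, a 7B-parameter model in 16-bit weights needs about 14 GB and fits on one card, while a 10T-parameter model in 16-bit weights needs about 20 TB and has to be sharded across the rack.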

Rack-Scale Deployment

A fully integrated OCP ORv3-compliant, direct-liquid-cooled rack with cableless backplane simplifies deployment, accelerates time-to-production, and improves reliability across enterprise and hyperscale data centers.

Leading Total Cost of Ownership

Purpose-built for inference economics, AI200 delivers compelling performance-per-dollar, driving down cost-per-token for generative and agentic AI workloads.

Complete Inference Software Stack

The Qualcomm® AI Inference Suite provides end-to-end tooling for model deployment as bare-metal, cloud VM, or inference-as-a-service — reducing complexity and accelerating time-to-value.

Direct Liquid-Cooled

A 140 kW direct-liquid-cooled OCP ORv3-compliant rack maximizes compute density and energy efficiency, lowering power usage effectiveness (PUE) and enabling sustainable AI infrastructure at scale.

¹Compared with the published specifications of commercially available competing products.

Features

Rack-scale solution purpose-built for AI inference economics
768 GB LPDDR memory per card — highest per-accelerator capacity in the industry¹
Industry-leading 43 TB memory capacity per rack¹
Supports state-of-the-art (SOTA) models of up to 10T parameters with context lengths of up to 128K tokens
PCIe Gen6 scale-up; Ethernet-with-RoCE scale-out
Qualcomm® AI Inference Suite for bare-metal, cloud VM, or inference-as-a-service deployment
Designed to accelerate a range of model categories: reasoning, vision, agentic, LLM, text-to-image, and video processing
Air and direct liquid cooling
OCP ORv3-compliant rack with cableless backplane

Specifications

Rack
Form Factor: Single Wide, Open Rack v3 (ORv3) compliant
Number of Cards:

Memory
Bandwidth: 0.414 PB/s (with LPDDR5x)
Capacity: 43 TB

Scale-Up
Interface Type: PCIe 6.0

Scale-Out
Interface Type: Ethernet w/RoCE

Thermal Management
Cooling: Direct Liquid Cooling (DLC), Air Cooling
Thermal Design Power: 140 kW
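The rack memory bandwidth listed above gives a simple lower bound on how long it takes to read a set of weights once from memory. The sketch below assumes a purely memory-bound workload, perfect sharding across the rack, and no interconnect or compute cost, so real latency will be higher. The function name and the 10 TB example size (10T parameters at an assumed 1 byte each) are illustrative assumptions, not figures from this page.

```python
# Bandwidth-bound lower limit on one full pass over the model weights.
RACK_BANDWIDTH_PB_S = 0.414  # aggregate rack memory bandwidth (from the specifications)


def min_weight_pass_ms(weight_bytes: float,
                       bandwidth_pb_s: float = RACK_BANDWIDTH_PB_S) -> float:
    """Return the minimum time, in milliseconds, to read weight_bytes once."""
    bandwidth_bytes_s = bandwidth_pb_s * 1e15  # decimal petabytes per second
    return weight_bytes / bandwidth_bytes_s * 1e3


if __name__ == "__main__":
    # Hypothetical example: 10T parameters stored at 1 byte each (10 TB of weights).
    print(f"{min_weight_pass_ms(10e12):.1f} ms per pass")
```

For the hypothetical 10 TB case this comes to roughly 24 ms per pass. Models that touch only a fraction of their weights per token, such as mixture-of-experts designs, would have a lower bound than this.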

QUALCOMM DRAGONFLY AI INFRASTRUCTURE MANAGEMENT SUITE

Orchestrate and scale efficiency

The suite provides provisioning, monitoring, orchestration and fault handling across rack‑scale deployments. Together, hardware, connectivity and software form the foundation of a cohesive data center platform approach — one designed to scale with customers as AI workloads evolve.
