AI On-Prem Appliance

Qualcomm Dragonwing™

The Dragonwing AI On-Prem Appliance delivers cloud-grade AI performance with the efficiency, security, and control of local deployment. Powered by Qualcomm Technologies' cutting-edge hardware technologies and software capabilities, these high-performance systems are engineered to transform operations across industries—without the risks or dependencies of cloud-only infrastructure.

Get started developing with Qualcomm AI On-Prem appliances. From ready-to-use, pre‑compiled models to optimized computer vision and large language models, the resources below help you move from evaluation to production faster—no cloud required.

Product license agreement

Useful Links

AI On-Prem Appliance Product Brief

Use-case with SocratiQ

Forums

Benefits

Cloud-grade AI on your premises

Up to 870 TOPS AI performance, delivered locally for maximum speed, security, and control—no cloud required.

Lower TCO and higher efficiency

More than 3x performance/cost and nearly 2x performance/watt compared to other cloud-based AI products.

Enhanced data control & security

Adopt AI on your terms: private, powerful, scalable, and fully under your control.

Rapid, easy deployment

Supports multiple users and multi-tenancy, enabling broader AI adoption and enterprise-wide deployment with no IT overhaul.

Enterprise intelligence with smarter performance per dollar — at a fraction of the footprint.

Delivers industry-leading performance while optimizing for power efficiency and total cost of ownership, in an easily deployable form factor.

Dragonwing AI On-Prem Appliance

The Dragonwing AI On-Prem Appliance integrates hardware and software into a unified, end-to-end solution.

HARDWARE

AI Accelerator Card

Purpose-built AI accelerators deliver high performance, low latency, and efficient on-site processing—all within a compact, power-optimized footprint.

View the catalog

SOFTWARE

Models & AI Inference Suite

A flexible software layer powers easy deployment, orchestration, and inference for computer vision, LLMs, and other AI models—scalable to meet enterprise needs.

Qualcomm AI Inference Suite

View LLMs

View CV Models

Enterprise-ready AI inference use cases

Dynamic information access

Deliver real-time access to maintenance guides, process documentation, and live records—right at the edge. Workers get instant, AI-powered support to troubleshoot, follow procedures, and boost productivity without relying on cloud connectivity.

Operational diagnostics

Just snap a photo—leverage AI inference combining computer vision and LLMs to analyze machine health, detect early signs of wear, and answer operational questions instantly.

Guided repair execution

Capture images of faulty equipment and receive dynamic, step-by-step repair guidance. Computer vision identifies components, LLMs interpret issues, and agentic AI adapts instructions in real time—enabling faster, more accurate fixes with minimal supervision.

Vision analytics

Deploy computer vision at the edge to monitor operations, detect anomalies, and deliver real-time, scalable insights from visual data across enterprise environments.

Multilingual translation

Run LLMs and other AI models to translate documents across languages and convert scanned data to structured text via OCR—accelerating part identification, procurement, and global supply chain workflows with real-time, on-prem inference.

On-site skill gap support

Democratize AI adoption by empowering every worker with real-time, contextual guidance, bridging skill gaps and enabling consistent execution across roles and experience levels.

Blazing memory bandwidth

544 GB/s memory bandwidth enables ultra-fast inference speeds for demanding workloads.

Large model ready

128 GB memory supports running larger models, including multi-modal and complex LLMs.

Lower power

A 150 W power envelope delivers high compute efficiency with a low energy footprint.

LLM powerhouse

Supports models up to 120B parameters, with 200k+ Hugging Face models validated.
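As a rough sizing aid, the sketch below estimates how much memory a model's weights alone would occupy at different numeric precisions, and compares that to the 128 GB quoted above. This page does not state which precision the 120B figure assumes, so the bit widths here are illustrative assumptions, and the helper names (`weight_footprint_gb`, `fits_in_memory`) are hypothetical. The estimate ignores the KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
def weight_footprint_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes.

    params_billion * 1e9 parameters * (bits / 8) bytes each, divided by
    1e9 bytes per GB, simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_param / 8.0


def fits_in_memory(params_billion: float, bits_per_param: int,
                   memory_gb: float = 128.0) -> bool:
    """True if the weights alone fit in the given memory (default: 128 GB)."""
    return weight_footprint_gb(params_billion, bits_per_param) <= memory_gb


if __name__ == "__main__":
    # Assumed precisions for illustration only; not taken from the product page.
    for bits in (16, 8, 4):
        size = weight_footprint_gb(120, bits)
        print(f"120B parameters at {bits}-bit: {size:.0f} GB, "
              f"fits in 128 GB: {fits_in_memory(120, bits)}")
```

Under these assumptions, the weights of a 120B-parameter model would take about 240 GB at 16-bit, 120 GB at 8-bit, and 60 GB at 4-bit, so only the lower-precision variants would fit within 128 GB.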

Multi-tenant scalability

Multiple users per appliance, enabling collaborative and concurrent AI development.

High throughput inference

LLM: Up to 300 tokens/sec (Llama 3.1 8B)
CV: 23,000+ inferences/sec
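To put the headline figures on a common footing, the sketch below divides each quoted peak rate by the 150 W power envelope. It assumes, without confirmation from this page, that the 870 TOPS, 300 tokens/sec, and 23,000 inferences/sec figures all refer to the same configuration running at the full 150 W. The constant and function names are hypothetical, and the results are illustrative ratios, not measured efficiency.

```python
# Figures quoted on this page; each is a peak ("up to" or "+") value.
PEAK_TOPS = 870.0              # AI performance, TOPS
POWER_ENVELOPE_W = 150.0       # power envelope, watts
LLM_TOKENS_PER_SEC = 300.0     # Llama 3.1 8B throughput
CV_INFERENCES_PER_SEC = 23_000.0


def per_watt(rate: float, power_w: float = POWER_ENVELOPE_W) -> float:
    """Divide a throughput figure by power draw (assumed to be the full envelope)."""
    return rate / power_w


if __name__ == "__main__":
    print(f"Compute: {per_watt(PEAK_TOPS):.1f} TOPS per watt")
    print(f"LLM:     {per_watt(LLM_TOKENS_PER_SEC):.1f} tokens/sec per watt")
    print(f"CV:      {per_watt(CV_INFERENCES_PER_SEC):.1f} inferences/sec per watt")
```

Because actual power draw varies with workload, these ratios are best read as a lower bound on efficiency if the appliance draws less than the full envelope while sustaining the quoted rates.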

Agentic AI capabilities

Supports autonomous agents that can reason, plan, and act across tasks—ideal for RAG, tool use, and workflow orchestration.

MISTRAL.AI

Enabling sovereign agentic software development by deploying Devstral on premises

ADVANTECH

Advantech delivers scalable, high-performance Edge AI with Qualcomm® Cloud AI 100 Ultra

Download PDF

AETINA

Aetina unleashes ultimate AI on-prem solution with Qualcomm Cloud AI 100 Ultra

Read press release

ARAMCO

Elevating worker safety and efficiency with advanced edge AI solutions

Read press release

Get started with these resources

Interested in reducing your inference cost and improving your performance per watt in data centers?

SOFTWARE

Evaluate the Qualcomm Cloud AI 100 Ultra

Try generative AI inference on Qualcomm Cloud AI 100 Ultra in our developer playgrounds from Cirrascale.

Get started

CLOUD

Cirrascale Inference Cloud Powered by Qualcomm Technologies

Scale your offerings efficiently with inference as a service powered by Qualcomm Cloud AI.

Get started

HARDWARE

Run Inference On Premises

Certain applications need low-latency, secure, and private solutions at a low cost not found with current cloud inference providers.

Aetina Box

Helpful Links

Forums

Visit Qualcomm Support forums to ask questions, access resources, learn quick tips, and more. Expand your knowledge by interacting with others in the developer community.

Visit forums
