Developer Blog

Model HQ by LLMWare.ai: Run language models and use AI agents on Snapdragon X Series devices

Written by

Devang Aggarwal

Mar 3, 2025

Running LLMs on device has two issues: memory footprint and efficiency. To address both problems, LLMWare’s Model HQ provides a fast, efficient on-ramp to deploying and managing Gen AI models on local devices. Model HQ makes it easier to use specialized language models locally for use cases like retrieval-augmented generation (RAG) and AI agents.

Now, LLMWare’s Model HQ is available on devices with Snapdragon X Series processors. With Model HQ, AI PCs powered by Snapdragon X Series processors can deliver powerful, secure, ready-to-use AI directly on their devices, including inference on the Qualcomm Hexagon NPU.

Small language models for enterprises. Why and why now?

Did you know that up to 99% of AI workflows can be performed by small language models? SLMs are efficient AI models ranging from 1B to 32B parameters that can run even on laptops. While most of the high-profile uses of AI in the last few years have involved large frontier LLMs with up to 1.8 trillion parameters and tremendous cloud compute, most enterprise use cases are not that demanding. Recent advancements in hardware, such as AI PCs powered by Snapdragon X Series processors, coupled with significant advancements in SLM capabilities (such as Phi-4 and Qwen), allow enterprises to deliver AI productivity enhancements directly to individual users, on devices that can run without Wi-Fi.

LLMWare.ai empowers enterprises with a secure, scalable, cost-effective AI solution designed for local deployments. Unlike traditional AI models that rely on costly and insecure API connections, LLMWare’s Model HQ keeps your AI where your data lives—even on laptops—ensuring maximum security and compliance.

With 30+ proprietary SLMs and 90+ optimized models (including Gemma, Llama, Phi and Mistral), LLMWare’s Model HQ eliminates costly inference fees, reducing AI costs to as low as $0 per token for laptop inferencing.

Model HQ’s no-code, point-and-click client app enables enterprises to easily use AI workflows, automate tasks and enhance productivity with out-of-the-box capabilities that include:

Chat with AI models
Text-to-SQL queries
Image tagging and classification
Document search and analysis with RAG

Model HQ allows business users to access AI for tasks like analyzing complex contracts, performing SQL queries in natural language and answering questions about earnings statements. It significantly enhances productivity, all while ensuring security and privacy with no data leaving the device.
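To make the text-to-SQL idea concrete, here is a minimal Python sketch of the plumbing around such a feature. It is not Model HQ's implementation. The function generate_sql is a hypothetical stand-in for a local model call and simply returns a fixed query, and the table, its rows and the read-only guard are assumptions made for illustration.

```python
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for a local text-to-SQL model call.

    A real model would receive the question plus the schema and return SQL.
    A fixed query is returned here so the sketch runs without a model.
    """
    return (
        "SELECT customer, SUM(amount) AS total "
        "FROM invoices GROUP BY customer ORDER BY total DESC"
    )


def run_read_only(conn: sqlite3.Connection, sql: str) -> list:
    """Run model-generated SQL only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";")
    if ";" in statement or not statement.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()


# Illustrative in-memory database; the data stays inside this process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("Acme", 120.0), ("Globex", 80.0), ("Acme", 30.0)],
)

schema = "invoices(customer TEXT, amount REAL)"
sql = generate_sql("Which customers have spent the most?", schema)
rows = run_read_only(conn, sql)
print(rows)  # [('Acme', 150.0), ('Globex', 80.0)]
```

The guard reflects one possible design choice: because the SQL comes from a model rather than a person, the sketch refuses anything other than a single SELECT before touching the database.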

Figure 1: Using Model HQ to query local documents

Private AI running on a PC or laptop

High-profile, first-gen applications of AI have left the impression that you need to have as much processing power as possible for AI. That’s the kind of assumption that developers and software engineers live to debunk, and LLMWare enables them to do that with Model HQ.

Naturally, if you have a Snapdragon X Series device with the Hexagon NPU, you can offload your AI processing work from the CPU and GPU. But SLMs make it possible to run inference even on a local device that isn’t an AI PC and doesn’t have a GPU or NPU.

With Model HQ, you can use AI models as large as 32 billion parameters on Snapdragon X-powered laptops, even when running on the CPU alone. The GPU and NPU available on Snapdragon X processors improve performance further.
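A rough back-of-the-envelope calculation shows why model size and numeric precision decide what fits on a laptop. The sketch below estimates memory for the weights alone as parameters × bits per weight ÷ 8. The precisions shown (16, 8 and 4 bits) are assumptions for illustration, since this post does not state which formats Model HQ uses, and the estimate ignores the KV cache, activations and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Model sizes taken from the 1B-32B range discussed in the article.
for size in (1, 7, 32):
    row = ", ".join(
        f"{bits}-bit: {weight_memory_gb(size, bits):.1f} GB" for bits in (16, 8, 4)
    )
    print(f"{size}B parameters -> {row}")
```

Under these assumptions, a 32-billion-parameter model needs about 64 GB for its weights at 16 bits per weight but about 16 GB at 4 bits, which is why reduced-precision formats matter for laptop inference.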

Running AI workloads on devices with Snapdragon X Series processors is a big step toward more personalized, secure AI use cases. You enjoy the privacy and security of running on device without shuttling data to and from the cloud. Model HQ equips you for compliance, with logs for auditing and AI explainability.

Model HQ on laptops with Snapdragon X Series

LLMWare makes it easy for users to access more than 90 of the latest SLMs on Snapdragon X Series, including:

Qwen 2.5-32B
Qwen 2.5-7B-Coder
Llama 3-8B
Phi-4
Gemma 2-27B
Gemma 2-9B
Mistral Small-22B
Mistral-7B
Yi 6B
Yi 9B
Dragon RAG-specialized models
SLIM Function-calling models for agent workflows

With the rapid advancements in language models, LLMWare will continually update its model catalog to deliver easy and private access to the best new SLMs to run on devices with Snapdragon X Series, specifically on the Qualcomm Hexagon NPU.

That means you can use Model HQ on devices with Snapdragon X Series for applications like chatbots, text-to-SQL queries, image reading, coding assistants, voice transcription and document analysis (PDF, DOCX, PPTX).

Model HQ uses the Qualcomm AI Stack, which includes the AI libraries, tools and SDK needed to take full advantage of the Qualcomm Hexagon NPU on the Snapdragon X Series. With the Qualcomm AI Stack, you can develop once and deploy on virtually all Qualcomm Technologies products.

LLMWare also plans to continue its work with Qualcomm Technologies to expand the list of models that run on devices with Snapdragon X Series, specifically on the Qualcomm Hexagon NPU.

Video: LLMWare x Qualcomm (Feb 26, 2025 | 2:24)

Try it out

If you’re working on bringing AI in house – especially if you’re in a highly regulated or data-sensitive industry – then you want private, secure AI that you control. LLMWare’s expertise in deploying agentic AI applications and fine-tuning SLMs is a smart avenue to pursue, even more so now that Model HQ supports Snapdragon X Series laptops.

In addition to providing out-of-the-box capabilities for on-device AI, Model HQ makes it easy for in-house developers to create lightweight AI applications using LLMWare’s GitHub repo. In the repo you’ll find a “Getting Started” section with the install command, pip3 install llmware, ready for you to try out on Snapdragon X Series devices. Also have a look at LLMWare’s YouTube channel to get started with agents using function-calling models.
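Once the package is installed, a first script might look like the sketch below. It relies on the ModelCatalog loader and the inference call with add_context as shown in the llmware README. The model name bling-phi-3-gguf and the helper names are illustrative choices rather than anything this post specifies, and the first call downloads model weights. The function is defined but not called, so the file loads even before llmware is installed.

```python
def extract_answer(output: dict) -> str:
    """Pull the generated text out of the model's response dictionary."""
    return str(output.get("llm_response", "")).strip()


def ask_local_model(
    question: str,
    context: str,
    model_name: str = "bling-phi-3-gguf",  # assumption: an entry in the llmware catalog
) -> str:
    """Answer a question from a supplied passage using a locally run model."""
    # Imported lazily so this file loads before `pip3 install llmware` has run.
    from llmware.models import ModelCatalog

    # Load the model from the catalog; weights are fetched on first use.
    model = ModelCatalog().load_model(model_name)

    # The passage is passed as context so the answer is grounded in it.
    output = model.inference(question, add_context=context)
    return extract_answer(output)
```

Calling ask_local_model("What is the termination notice period?", contract_text), where contract_text is any passage you load yourself, would then return the model's answer as a string.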

Planning to attend Mobile World Congress in Barcelona? Look for LLMWare at our booth 3E10 from March 3 through March 6.

You can also join us virtually on March 5 for the webinar "On-Device AI deployment with ONNX Runtime & Copilot+ PCs powered by Snapdragon" to learn more about LLMWare and watch a live demo.

Learn more about the collaboration between LLMWare and Qualcomm on Snapdragon X Series.

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Qualcomm-branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Author

Devang Aggarwal

Product Manager, Senior