On-Device Agentic AI Workflows with Qualcomm Hexagon NPU and LLMWare.ai

Abstract:

Most enterprise AI today is designed around cloud-hosted models and chat-based interactions, which are inherently reactive and poorly suited for real-world business processes that are repetitive, multi-step, and schedule-driven.

These workflows increasingly require execution closer to the data, where latency, privacy, and operational efficiency are critical. However, existing approaches struggle to automate end-to-end workflows on-device, limiting their ability to deliver continuous, proactive value beyond isolated prompts.

This blog introduces a new paradigm for on-device agentic AI, combining LLMWare’s Model HQ with Qualcomm Hexagon NPUs on PCs with Snapdragon X Series processors to enable automated, multi-step workflows powered by Small Language Models (SLMs) running entirely locally.

Using a no-code, drag-and-drop interface, developers can orchestrate agents that ingest enterprise data (e.g., Jira), apply reasoning, generate insights, and trigger actions on a schedule—without cloud dependency.

While products like Claude Cowork and OpenClaw provide autonomous AI assistants for individual users, Model HQ gives teams a private AI workflow building and deployment system that enables reusable automation across shared business operations. The result is fast, private, local and cost-efficient AI automation, unlocking scalable deployment of thousands of workflows that transform enterprise productivity and demonstrate measurable ROI and token cost savings through real-world use cases.

Solution Architecture:

Application Layer (User / Workflow Interface) provides
  • No-code integration to Microsoft Foundry Local and other model repositories
  • No-code agent builder and workflow UI
  • Custom agents, templates, batch processing, and API integrations

Example: Jira ingestion → prioritization → summarization → CSV + notifications

Model & Knowledge Layer allows

  • Ingesting SLMs and local LLMs optimized for Snapdragon NPU

  • Integrated AI knowledge management for:
    • Document parsing, search, and vector databases
    • Prompt management and generation pipelines
    • RAG-style workflows and structured processing

Agent Orchestration Layer to create

  • Agent-based process orchestration engine
  • Multi-step workflow chaining (data → model → logic → action)
  • Scheduling for hands-free, time-based execution
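
The chaining idea above (data → model → logic → action) can be sketched in a few lines of Python. This is only an illustration of the pattern, not Model HQ's implementation: the step names, the sample tickets, and the `run_pipeline` helper are all hypothetical, and the model step is a stub standing in for a local SLM call.

```python
from typing import Callable, Dict, List, Optional

# A step takes the shared workflow state and returns an updated copy.
Step = Callable[[Dict], Dict]

def ingest(state: Dict) -> Dict:
    # Data step: stand-in for a Jira connector (hypothetical sample data).
    tickets = [
        {"key": "OPS-1", "priority": "High", "summary": "Login service returns 500"},
        {"key": "OPS-2", "priority": "Low", "summary": "Typo on settings page"},
    ]
    return {**state, "tickets": tickets}

def summarize(state: Dict) -> Dict:
    # Model step: a real workflow would call a local SLM here; this stub
    # truncates the text so the sketch runs anywhere.
    out = [{**t, "digest": t["summary"][:20]} for t in state["tickets"]]
    return {**state, "tickets": out}

def prioritize(state: Dict) -> Dict:
    # Logic step: keep only high-priority tickets.
    kept = [t for t in state["tickets"] if t["priority"] == "High"]
    return {**state, "tickets": kept}

def report(state: Dict) -> Dict:
    # Action step: produce the lines a notification would carry.
    lines = [f'{t["key"]}: {t["digest"]}' for t in state["tickets"]]
    return {**state, "report": lines}

def run_pipeline(steps: List[Step], state: Optional[Dict] = None) -> Dict:
    """Run each step in order, passing the state from one to the next."""
    state = dict(state or {})
    for step in steps:
        state = step(state)
    return state

result = run_pipeline([ingest, summarize, prioritize, report])
print(result["report"])
```

Because every step has the same signature, reordering or swapping a node only changes the list passed to `run_pipeline`, which is roughly what a drag-and-drop builder does visually.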

Runtime & Inference Layer for flexibility

  • Windows ML + ONNX Runtime / GenAI stack

  • Qualcomm AI Engine with Qualcomm AI Engine Direct SDK execution providers

  • Heterogeneous compute orchestration across:
    • NPU (primary for AI inference)
    • GPU and CPU fallback paths
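
A minimal sketch of the NPU-first, GPU/CPU-fallback ordering, assuming the ONNX Runtime Python package is used directly. How Model HQ and Windows ML select providers internally is not described here, so treat the `pick_providers` helper and the choice of `DmlExecutionProvider` for the GPU path as assumptions. The provider names themselves are ONNX Runtime identifiers.

```python
from typing import List, Sequence

# Preference order mirroring the list above: NPU first, then GPU, then CPU.
PREFERRED = ("QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider")

def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are present, in priority order."""
    chosen = [p for p in PREFERRED if p in available]
    if not chosen:
        raise RuntimeError("no usable execution provider found")
    return chosen

try:
    import onnxruntime as ort  # optional; only needed for a real session
    available = ort.get_available_providers()
except ImportError:
    # Hypothetical list so the sketch still runs without ONNX Runtime installed.
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print(providers)

# With ONNX Runtime installed and a model on disk (the path is hypothetical):
# session = ort.InferenceSession("model.onnx", providers=providers)
```

ONNX Runtime tries providers in the order given, so any operator the first provider cannot handle falls through to the next one in the list.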

Hardware Layer for edge execution on

  • Snapdragon X Series platform with on-device NPU acceleration for:
    • Low latency inference
    • Performance per watt optimized for productivity and speed
    • Fully offline / private execution

Build-along time:

Prerequisites:

  1. Set up LLMWare’s Model HQ for PCs with Snapdragon: LLMWare AI for Complex Enterprises

  2. Install Microsoft Foundry Local: Get started with Foundry Local - Foundry Local | Microsoft Learn

  3. Configure your enterprise email account to receive notifications

  4. Obtain your Jira integration API token

LLMWare’s Model HQ offers an intuitive, no-code UI for building a wide range of enterprise workflows. Let’s walk through one of them.

Use case:

  • Turn Jira ticket overload into a daily insight pipeline using LLMWare’s Model HQ no-code agent platform running locally on Snapdragon X2 Elite

  • Connect to Jira to filter priority issues and generate summaries using an NPU-optimized model through a simple drag-and-drop workflow

  • Establish scheduled runs to produce structured reports and send notifications, delivering timely, actionable insights to stakeholders

Step 1: Integrate Foundry Local into Model HQ and download an SLM

In just a few clicks, integrate Foundry Local and download the model of your choice through Model HQ. 

Step 2: Integrate your enterprise Jira instance

Connecting to your enterprise Jira instance takes a single step: enter your credentials.

Step 3: Create, edit and test custom services (to later integrate into the agent process)

Set up a custom service that uses the Jira instance as the knowledge base: define the service, link it to a project, choose the output format, and optionally update the JSON configuration.
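
For readers who want a feel for what such a service does under the hood, here is a rough sketch that turns a JSON configuration into a Jira issue-search request. The configuration field names are illustrative and are not Model HQ's actual schema; the `example.atlassian.net` URL, the `OPS` project, and the credentials are placeholders. The endpoint path is an assumption as well, so check the REST documentation for your Jira version. The request is built but never sent.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical service definition; the field names are illustrative only.
service_config = json.loads("""
{
  "base_url": "https://example.atlassian.net",
  "project": "OPS",
  "fields": ["summary", "priority", "status"],
  "max_results": 50,
  "output_format": "json"
}
""")

def build_search_request(cfg: dict, email: str, api_token: str) -> Request:
    """Build (but do not send) a Jira issue-search request from the config."""
    jql = f'project = "{cfg["project"]}" ORDER BY priority DESC'
    query = urlencode({
        "jql": jql,
        "fields": ",".join(cfg["fields"]),
        "maxResults": cfg["max_results"],
    })
    # HTTP Basic auth built from the account email and the API token.
    creds = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    return Request(
        f'{cfg["base_url"]}/rest/api/2/search?{query}',
        headers={"Authorization": f"Basic {creds}", "Accept": "application/json"},
    )

req = build_search_request(service_config, "user@example.com", "PLACEHOLDER_TOKEN")
print(req.full_url)
```

Keeping the project, fields, and result limit in the configuration rather than in code is what lets the same service be reused across pipelines by editing only the JSON.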

Build a new service, or edit or load an existing one.

Step 4: Create an agent pipeline

The Model HQ visual UI builder provides an intuitive way to create agentic pipelines—simply drag and drop nodes to connect tasks, configure inputs/outputs, select the model in Agent Global configurations, and your workflow is ready.

Test pipeline execution.

When the pipeline runs, the model is offloaded to the Snapdragon NPU for execution.

The CSV output file now contains rows processed and filtered for relevance through the agentic pipeline.
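
The filter-then-export stage can be approximated with the standard library. This is a sketch under assumptions: the ticket records, the `RELEVANT` priority set, and the column names are hypothetical, and in the real pipeline relevance is decided by the model-driven steps rather than a fixed set.

```python
import csv
import io

# Hypothetical tickets as they might look after the summarization step.
tickets = [
    {"key": "OPS-1", "priority": "Highest", "summary": "Checkout outage"},
    {"key": "OPS-2", "priority": "Low", "summary": "Update footer text"},
    {"key": "OPS-3", "priority": "High", "summary": "Slow report export"},
]

RELEVANT = {"Highest", "High"}

def to_csv(rows, relevant=RELEVANT) -> str:
    """Keep rows whose priority is relevant and render them as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["key", "priority", "summary"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        if row["priority"] in relevant:
            writer.writerow(row)
    return buf.getvalue()

csv_text = to_csv(tickets)
print(csv_text)
```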

Step 5: Set up email integration and a scheduler

Through Model HQ, set up a new integration service to configure emails.

Add a node to the pipeline to route the CSV output to an email distribution list.
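
Conceptually, this node wraps the CSV in an email message addressed to the list. The sketch below uses Python's standard `email` package to show the shape of that message; the addresses, subject line, and file name are placeholders, and nothing is actually sent.

```python
from email.message import EmailMessage
from typing import List

def build_report_email(csv_text: str, sender: str, recipients: List[str]) -> EmailMessage:
    """Build an email carrying the CSV report as an attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Daily Jira priority report"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("The latest filtered Jira report is attached.")
    # Passing a str attaches it as text/<subtype> with the given file name.
    msg.add_attachment(csv_text, subtype="csv", filename="jira_report.csv")
    return msg

message = build_report_email(
    "key,priority,summary\nOPS-1,Highest,Checkout outage\n",
    "reports@example.com",
    ["lead@example.com", "oncall@example.com"],
)
attachments = list(message.iter_attachments())
print(message["To"], attachments[0].get_filename())

# Sending would use smtplib with your own mail server settings, for example:
# import smtplib
# with smtplib.SMTP("smtp.example.com", 587) as smtp:
#     smtp.starttls()
#     smtp.login("user", "password")
#     smtp.send_message(message)
```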

…and set up an automated scheduler to send the emails at the required cadence.
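
The scheduling logic behind a daily cadence is simple to reason about. The following sketch computes the next run time for a weekday schedule; the `next_run` helper and the 08:00 run time are hypothetical and say nothing about how Model HQ's scheduler is implemented.

```python
from datetime import datetime, time, timedelta

def next_run(now: datetime, run_at: time, weekdays_only: bool = True) -> datetime:
    """Return the next datetime strictly after `now` that matches the schedule."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so start from tomorrow.
        candidate += timedelta(days=1)
    # Monday is 0 and Sunday is 6; skip Saturday and Sunday if requested.
    while weekdays_only and candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate

# Friday 2025-01-03 at 09:30 with an 08:00 weekday schedule.
upcoming = next_run(datetime(2025, 1, 3, 9, 30), time(8, 0))
print(upcoming)
```

In this example the Friday slot has already passed and the weekend is skipped, so the next run lands on the following Monday at 08:00.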

Benefits:

LLMWare’s Model HQ lets enterprises quickly integrate Microsoft Foundry Local models into their workflows as a foundation for AI-driven automation. With a no-code agent platform, organizations can deploy fast, private AI workflows that run entirely on-device, accelerated by the Snapdragon NPU, with Windows ML, ONNX Runtime, and Qualcomm AI Engine Direct SDK execution providers in the backend.

The architecture also supports extensibility into pro-code environments, giving developers the flexibility to customize and scale solutions as business requirements evolve.

Learn more and give it a try!

Windows on Snapdragon Developer Portal

LLMWare.ai

What XNNPACK does (and doesn’t) do

When people first hear about XNNPACK, it’s easy to confuse it with a full graph compiler. It isn’t one: XNNPACK lives below the graph level. It does not:

  • Perform cross-op fusion (e.g., Conv+BN+ReLU, long elementwise chains).

  • Do graph rewrites like constant folding or inserting quantize/dequantize nodes.

  • Decide op ordering, device placement, or global scheduling.

  • Choose quantization strategy or layouts on its own.

All of that is the job of the ML framework (TFLite, PyTorch, ONNX Runtime, etc.). The framework partitions the graph and then hands chunks of it to XNNPACK as “subgraphs” to execute.
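
The hand-off can be pictured as splitting the model into runs of operators the library supports and runs it does not. The sketch below is a deliberately simplified illustration: real frameworks partition a DAG rather than a flat list, and the `SUPPORTED` set, the op names, and the `partition` helper are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical set of operators the kernel library can execute.
SUPPORTED = {"conv2d", "relu", "add", "fully_connected", "softmax"}

def partition(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Split a linear op sequence into maximal runs by target backend."""
    parts: List[Tuple[str, List[str]]] = []
    for op in ops:
        target = "xnnpack" if op in SUPPORTED else "framework"
        if parts and parts[-1][0] == target:
            # Same backend as the previous op: extend the current subgraph.
            parts[-1][1].append(op)
        else:
            # Backend changes: start a new subgraph.
            parts.append((target, [op]))
    return parts

model = ["conv2d", "relu", "custom_nms", "fully_connected", "softmax"]
for target, chunk in partition(model):
    print(target, chunk)
```

The unsupported `custom_nms` op splits the model into two XNNPACK subgraphs with a framework-executed op in between, which is why operator coverage affects how much of a model is actually accelerated.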

How XNNPACK works

XNNPACK focuses on turning those subgraphs into fast kernel calls by providing a subgraph API to build a DAG of operators and lowering each subgraph into calls to optimized microkernels for the chosen architecture.

It also provides packing and layout transformations for weights and activations.

Along the way, it performs operator-specific optimizations, such as weight packing (reordering and interleaving for GEMM/conv), selecting the best kernel variant (tile sizes, MR/NR, data type), and using fused operators when available (e.g., SoftmaxFused instead of a chain of ReduceMax → Sub → Exp → ReduceSum → Div).
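
To make the fused-operator point concrete, here is a plain-Python comparison of the five-op softmax chain and a fused version that produces the same result. It illustrates the idea only and is not XNNPACK code; both function names are made up for this sketch.

```python
import math
from typing import List

def softmax_chain(x: List[float]) -> List[float]:
    """Softmax as five separate ops, each materializing an intermediate."""
    m = max(x)                              # ReduceMax
    shifted = [v - m for v in x]            # Sub
    exps = [math.exp(v) for v in shifted]   # Exp
    total = sum(exps)                       # ReduceSum
    return [e / total for e in exps]        # Div

def softmax_fused(x: List[float]) -> List[float]:
    """Softmax with subtract, exponentiate, and accumulate done in one pass."""
    m = max(x)
    out = [0.0] * len(x)
    total = 0.0
    for i, v in enumerate(x):
        e = math.exp(v - m)
        out[i] = e
        total += e
    inv = 1.0 / total
    return [e * inv for e in out]

logits = [2.0, 1.0, 0.1]
print(softmax_chain(logits))
print(softmax_fused(logits))
```

The fused version avoids the separate `shifted` buffer and an extra pass over the data. On real hardware, that reduction in memory traffic is where a fused kernel typically gains over a chain of individual operators.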

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Disclaimer: Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Authors
Meghana Rao, Staff Product Manager at Qualcomm
Namee Oberst, Co-Founder, LLMWare
Darren Oberst, CTO, Co-Founder, LLMWare