Developer Blog

On-Device Agentic AI Workflows with Qualcomm Hexagon NPU and LLMWare.ai

Written by Namee Oberst, Darren Oberst, and Meghana Rao

May 26, 2026

Abstract:

Most enterprise AI today is designed around cloud-hosted models and chat-based interactions, which are inherently reactive and poorly suited for real-world business processes that are repetitive, multi-step, and schedule-driven.

These workflows increasingly require execution closer to the data, where latency, privacy, and operational efficiency are critical. However, existing approaches struggle to automate end-to-end workflows on-device, limiting their ability to deliver continuous, proactive value beyond isolated prompts.

This blog introduces a new paradigm for on-device agentic AI, combining LLMWare’s Model HQ with Qualcomm Hexagon NPUs on PCs with Snapdragon X Series processors to enable automated, multi-step workflows powered by Small Language Models (SLMs) running entirely locally.

Using a no-code, drag-and-drop interface, developers can orchestrate agents that ingest enterprise data (e.g., Jira), apply reasoning, generate insights, and trigger actions on a schedule—without cloud dependency.

While products like Claude Cowork and OpenClaw provide autonomous AI assistants for individual users, Model HQ gives teams a private AI workflow building and deployment system that enables reusable automation across shared business operations. The result is fast, private, local and cost-efficient AI automation, unlocking scalable deployment of thousands of workflows that transform enterprise productivity and demonstrate measurable ROI and token cost savings through real-world use cases.

Solution Architecture:

Application Layer (User / Workflow Interface) provides

- No-code integration to Microsoft Foundry Local and other model repositories
- No-code agent builder and workflow UI
- Custom agents, templates, batch processing, and API integrations

Example: Jira ingestion → prioritization → summarization → CSV + notifications

Model & Knowledge Layer allows

- Ingesting SLMs and local LLMs optimized for Snapdragon NPU
- Integrated AI knowledge management for:
  - Document parsing, search, and vector databases
  - Prompt management and generation pipelines
  - RAG-style workflows and structured processing

Agent Orchestration Layer provides

- Agent-based process orchestration engine
- Multi-step workflow chaining (data → model → logic → action)
- Scheduling for hands-free, time-based execution
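
To make the chaining idea concrete, here is a minimal sketch of a data → model → logic → action chain in plain Python. It is not Model HQ code: `run_pipeline` and the four step functions are hypothetical names, the tickets are made-up sample data, and the "model" step is a simple truncation standing in for an on-device SLM call.

```python
from typing import Any, Callable

# A step takes the previous step's output and returns input for the next one.
Step = Callable[[Any], Any]


def run_pipeline(initial: Any, steps: list[Step]) -> Any:
    """Feed each step's output into the next step, in order."""
    result = initial
    for step in steps:
        result = step(result)
    return result


# Hypothetical stand-ins for the four stages (data -> model -> logic -> action).
def load_tickets(_: Any) -> list[dict]:
    # Data: a real workflow would pull these from Jira; this is sample data.
    return [
        {"key": "OPS-1", "priority": "High", "summary": "Login page times out under load"},
        {"key": "OPS-2", "priority": "Low", "summary": "Update footer copyright year"},
    ]


def summarize(tickets: list[dict]) -> list[dict]:
    # Model: placeholder for an SLM call; here we only truncate the summary.
    return [{**t, "digest": t["summary"][:20]} for t in tickets]


def keep_high_priority(tickets: list[dict]) -> list[dict]:
    # Logic: a plain rule applied after the model step.
    return [t for t in tickets if t["priority"] == "High"]


def to_report_lines(tickets: list[dict]) -> list[str]:
    # Action: format rows for a downstream notification.
    return [f'{t["key"]}: {t["digest"]}' for t in tickets]


report = run_pipeline(None, [load_tickets, summarize, keep_high_priority, to_report_lines])
print(report)
```

Because every step has the same shape (one input, one output), steps can be reordered or swapped without touching the runner, which is roughly what a drag-and-drop node graph offers visually.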

Runtime & Inference Layer for flexibility

- Windows ML + ONNX Runtime / GenAI stack
- Qualcomm AI Engine with Qualcomm AI Engine Direct SDK execution providers
- Heterogeneous compute orchestration across:
  - NPU (primary for AI inference)
  - GPU and CPU fallback paths
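
The NPU-first, GPU/CPU-fallback ordering can be sketched as a small selection function. This is an illustration under assumptions, not a description of how Model HQ or Windows ML choose hardware internally. The strings are the provider names ONNX Runtime uses for QNN (NPU), DirectML (GPU), and CPU; `pick_providers` is a hypothetical helper.

```python
# Preference order mirrors the text: NPU first, then GPU, then CPU.
PREFERRED_PROVIDERS = [
    "QNNExecutionProvider",  # Qualcomm NPU via QNN
    "DmlExecutionProvider",  # GPU via DirectML
    "CPUExecutionProvider",  # always-available fallback in ONNX Runtime
]


def pick_providers(available: list[str], preferred: list[str] = PREFERRED_PROVIDERS) -> list[str]:
    """Return the preferred providers that are actually available, in priority order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"None of {preferred} are available (found: {available})")
    return chosen


# Example: a machine that reports an NPU and a CPU provider, but no DirectML.
print(pick_providers(["CPUExecutionProvider", "QNNExecutionProvider"]))
```

With ONNX Runtime installed, the `available` list would typically come from `onnxruntime.get_available_providers()`, and the returned list could be passed as the `providers` argument when creating an `InferenceSession`.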

Hardware Layer for edge execution on

- Snapdragon X Series platform with on-device NPU acceleration for:
  - Low latency inference
  - Performance per watt optimized for productivity and speed
  - Fully offline / private execution

Build-along time:

Prerequisites:

- Set up LLMWare’s Model HQ for PCs with Snapdragon: see "LLMWare AI for Complex Enterprises"
- Install Microsoft Foundry Local: see "Get started with Foundry Local" on Microsoft Learn
- Configure your enterprise email account to receive notifications
- Obtain your Jira integration API token

LLMWare’s Model HQ offers a no-code UI for building a wide range of enterprise workflows. Let’s walk through one of them.

Use case:

- Automate Jira ticket overload into a daily insight pipeline using LLMWare’s Model HQ no-code agent platform running locally on Snapdragon X2 Elite
- Connect to Jira to filter priority issues and generate summaries using an NPU-optimized model through a simple drag-and-drop workflow
- Establish scheduled runs to produce structured reports and send notifications, delivering timely, actionable insights to stakeholders
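
For readers who want to see what "filter priority issues" might look like outside the no-code UI, here is a small sketch that builds a JQL (Jira Query Language) filter string. The helper name `build_jql`, the project key `OPS`, and the one-day window are assumptions chosen for illustration. Model HQ's Jira service may express the filter differently.

```python
def build_jql(project_key: str, priorities: list[str], updated_within_days: int = 1) -> str:
    """Build a JQL string for recently updated issues at the given priorities."""
    quoted = ", ".join(f'"{p}"' for p in priorities)
    return (
        f'project = "{project_key}" '
        f"AND priority in ({quoted}) "
        f"AND updated >= -{updated_within_days}d "
        "ORDER BY priority DESC"
    )


# Hypothetical project key and priority names; adjust to your Jira instance.
print(build_jql("OPS", ["Highest", "High"]))
```

Keeping the filter as a single string makes it easy to test the same query in Jira's own issue search before wiring it into a scheduled workflow.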

Step 1: Integrate Foundry Local into Model HQ and download an SLM

In just a few clicks, integrate Foundry Local and download the model of your choice through Model HQ.

Step 2: Integrate your enterprise Jira instance

Connect to your enterprise Jira instance by entering your credentials in a single step.

Step 3: Create, edit and test custom services (to later integrate into the agent process)

Set up a custom service that connects to the Jira instance as the knowledge base: define the service, link it to a project, choose the output format, and optionally update the JSON configuration.

Build a new service, or load and edit an existing one.

Step 4: Create an agent pipeline

In the Model HQ visual UI builder, create an agentic pipeline by dragging and dropping nodes to connect tasks, configuring inputs and outputs, and selecting the model in Agent Global configurations.

Test pipeline execution.

When the pipeline runs, the model is offloaded to the Snapdragon NPU for execution.

The CSV output file now contains rows processed and filtered for relevance through the agentic pipeline.
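
As a rough picture of what such a CSV output step does, the sketch below writes filtered ticket rows to CSV text with Python's standard `csv` module. The function name `tickets_to_csv`, the column names, and the sample row are assumptions. The actual columns Model HQ emits depend on how the service and pipeline are configured.

```python
import csv
import io


def tickets_to_csv(tickets: list[dict], columns: list[str]) -> str:
    """Serialize ticket dicts to CSV text, keeping only the listed columns."""
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" drops any keys that are not in `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(tickets)
    return buffer.getvalue()


# Hypothetical filtered row; "internal_score" is dropped because it is not a listed column.
rows = [{"key": "OPS-1", "priority": "High", "summary": "Login slow, users blocked", "internal_score": 0.93}]
print(tickets_to_csv(rows, ["key", "priority", "summary"]))
```

The `csv` module quotes fields that contain commas, so free-text summaries remain a single column when the file is opened in a spreadsheet.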

Step 5: Set up email integration and a scheduler

In Model HQ, set up a new integration service to configure email.

Add a node to the pipeline to route the CSV output to an email distribution list.
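
Conceptually, the email node wraps the CSV in a message addressed to the distribution list. The sketch below builds such a message with Python's standard `email` package. It only constructs the message and does not send it. The helper name, addresses, subject line, and filename are all placeholders, not values Model HQ uses.

```python
from email.message import EmailMessage


def build_report_email(
    sender: str,
    recipients: list[str],
    csv_text: str,
    filename: str = "jira_report.csv",
) -> EmailMessage:
    """Create an email with the CSV report attached (construction only, no sending)."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = "Daily Jira priority report"
    message.set_content("The latest filtered Jira report is attached.")
    # Attach the CSV as bytes with an explicit text/csv content type.
    message.add_attachment(
        csv_text.encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename=filename,
    )
    return message


# Placeholder addresses for illustration.
email = build_report_email(
    "reports@example.com",
    ["lead@example.com", "oncall@example.com"],
    "key,priority\r\nOPS-1,High\r\n",
)
print(email["Subject"], [part.get_filename() for part in email.iter_attachments()])
```

Sending would normally go through `smtplib` or a provider API with the credentials configured for the enterprise account. That part is left out here because it depends on the mail setup.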

Then set up an automated scheduler so that emails are generated at the required cadence.
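
The core of a daily cadence is working out when the next run is due. The sketch below shows one simple way to compute that with the standard `datetime` module. `next_daily_run` is a hypothetical helper, and Model HQ's scheduler may support other cadences and handle time zones differently.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, run_at: time) -> datetime:
    """Return the next occurrence of `run_at` strictly after `now`."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Example: it is 09:30 and the report is scheduled for 08:00 each day.
print(next_daily_run(datetime(2026, 5, 26, 9, 30), time(8, 0)))
```

This uses naive local datetimes for brevity. A production scheduler would usually pin a time zone so that runs do not shift around daylight saving changes.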

Benefits:

LLMWare’s Model HQ lets enterprises integrate Microsoft Foundry Local models into their workflows and build AI-driven automation on top of them. With its no-code agent platform, organizations can deploy fast, private AI workflows that run entirely on-device on the Snapdragon NPU, with Windows ML, ONNX Runtime, and Qualcomm AI Engine Direct SDK execution providers in the backend.

The architecture also supports extensibility into pro-code environments, giving developers the flexibility to customize and scale solutions as business requirements evolve.

Learn more and give it a try!

Windows on Snapdragon Developer Portal

LLMWare.ai

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Disclaimer: Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Authors

Namee Oberst, Co-Founder, LLMWare

Darren Oberst, CTO, Co-Founder, LLMWare

Meghana Rao, Staff Product Manager at Qualcomm