
Build a Simple NPU Chatbot

Simple NPU Chatbot is a lightweight, customizable chatbot that runs locally on Snapdragon™ devices using the Qualcomm® Neural Processing Unit (NPU). It uses the AnythingLLM model server for features such as Retrieval-Augmented Generation (RAG), long-term memory, and workspace management, while keeping all data processing on the device. The chatbot has been tested on both the Qualcomm® Device Cloud and real Snapdragon X-Elite hardware, using the Qualcomm QNN runtime for NPU-accelerated LLM inference with models such as Llama 3.1 8B Chat 8K.

These instructions provide a simple base for building NPU-accelerated AI chat apps, supporting both terminal and Gradio web interfaces.

 

Materials Required / Parts List / Tools

 

Build / Assembly Instructions

1. Setup (10 minutes)

  • Hardware
    • An Ubuntu or Windows host system, when using the Qualcomm Device Cloud.

OR

  • A Windows laptop with the specifications below, when running on a physical device.
    • Machine: Any supported Windows on Snapdragon device
    • Chip: Snapdragon X-Elite
    • OS: Windows 11
    • Memory: 8 GB or higher


2. Deployment options

Users have two options for implementing this project:

  • Using Qualcomm Device Cloud:
    • Users choosing this option must first complete the Qualcomm Device Cloud setup for Snapdragon X-Elite, then proceed to Install the Application.
  • Using a real Snapdragon X-Elite device:
    • Users with a physical Snapdragon X-Elite powered Windows device can begin directly from Install the Application.

 

Project Walkthrough

Qualcomm Device Cloud setup for Snapdragon X-Elite

  1. Visit https://qdc.qualcomm.com/ and sign up for an account to create a new session.
  2. On the home page, click New Interactive Session. We recommend using Remote Desktop rather than a web browser to interact with Compute devices.

3. In the Select a Platform section, select the Compute option and click Next.

4. In the Select Device section, select Snapdragon X Elite, fill in the mandatory fields to unlock your free 1000 minutes, and click Next.

After using the initial 1000 free minutes, developers can request additional free minutes.

5. In Device Configuration, enable the toggle button to install SNPE and configure another Qualcomm® build, then click Next.

6. In Test Configuration, fill in the mandatory fields under Session details, such as a Session name that fits your use case. The maximum time per device is 720 minutes (12 hours). Choose one of the three modes of operation and follow the guide provided for that mode. This use case uses screen mirroring only.

7. This step is optional and not required for running this project. The Package Provider lets you upload assets in zip format. The zip is extracted and pushed to C:\Temp\file\ before the session starts. Once you are done, the Create Session tab is enabled; click it to continue.

8. The session window takes some time to open. Once it does, you can access the Snapdragon X-Elite cloud device.

Install the Application (20 minutes)

1. Visit https://docs.anythingllm.com/installation-desktop/windows and download the AnythingLLM installer for Windows ARM 64-bit. Follow the guide to install the AnythingLLM app.

2. Click the "AnythingLLM" application on your desktop to start AnythingLLM.

a. When prompted to choose an LLM provider, choose Qualcomm QNN to target the NPU. (This is the default LLM provider.)

b. Choose a model when prompted. (The model is not downloaded at this point; it is downloaded when you save your settings.)

c. Create your first workspace by providing the workspace name.


d. Generate an API key.

  1. Click the settings button on the bottom of the left panel 
  2. Open the "Tools" dropdown
  3. Click "Developer API"
  4. Click "Generate New API Key"

e. Check the model download status in settings.

  • Check the model under Settings -> AI Providers -> LLM -> chat model selection in AnythingLLM. If you see "model requires download", choose the model, click save, switch back, then save again. The download progress should appear in the upper right corner of the AnythingLLM window. Once the model is downloaded and installed correctly, the model card shows "uninstall". (This example uses the chat model Llama 3.1 8B Chat 8K.)

f. Open a PowerShell instance and clone the repository.

$ git clone source repository
$ cd source repository
$ python -m venv llm-venv
$ Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
$ ./llm-venv/Scripts/Activate.ps1    
$ pip install -r requirements.txt
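
Before installing further packages, it can help to confirm that the commands above really activated the virtual environment. The following is a minimal sketch using only the Python standard library; the function names in_virtualenv and describe_environment are hypothetical and not part of the project.

```python
import platform
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return prefix != base_prefix


def describe_environment() -> str:
    """Summarise the interpreter version, CPU architecture, and venv state."""
    venv_state = "active" if in_virtualenv() else "NOT active"
    return (
        f"Python {platform.python_version()} on {platform.machine()} "
        f"(virtual environment {venv_state})"
    )


print(describe_environment())
```

If the output reports that the virtual environment is not active, rerun the Activate.ps1 step in the same PowerShell window before running pip install.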

g. Create your config.yaml file with the following variables and save the file.

$ notepad config.yaml

api_key: "your-key-here"

model_server_base_url: "http://localhost:3001/api/v1"

workspace_slug: "your-slug-here"

stream: true

stream_timeout: 60
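
A quick sanity check of these values can catch leftover placeholders before the first request is sent. The sketch below works on a plain dictionary with the same keys as config.yaml. The function name validate_config is hypothetical, and treating stream and stream_timeout as optional with defaults is an assumption, not something the project states.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("api_key", "model_server_base_url", "workspace_slug")
PLACEHOLDERS = {"your-key-here", "your-slug-here"}


def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    for key in REQUIRED_KEYS:
        value = cfg.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} is missing or empty")
        elif value in PLACEHOLDERS:
            problems.append(f"{key} still holds the placeholder value")

    url = cfg.get("model_server_base_url")
    if isinstance(url, str) and url.strip():
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("model_server_base_url is not an http(s) URL")

    # Assumption: both streaming settings are optional and default to true / 60.
    if not isinstance(cfg.get("stream", True), bool):
        problems.append("stream must be true or false")
    timeout = cfg.get("stream_timeout", 60)
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
        problems.append("stream_timeout must be a positive number of seconds")
    return problems


example = {
    "api_key": "your-key-here",
    "model_server_base_url": "http://localhost:3001/api/v1",
    "workspace_slug": "my-workspace",
    "stream": True,
    "stream_timeout": 60,
}
print(validate_config(example))  # ['api_key still holds the placeholder value']
```

In practice the dictionary would come from parsing config.yaml, for example with a YAML library if one is listed in requirements.txt.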


h. Get your workspace slug using the workspaces tool.

$ python src/workspaces.py

Find your workspace and its slug in the output, then add the slug to the workspace_slug variable in config.yaml.
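
The contents of src/workspaces.py are not shown here, so the following is only a sketch of how such a lookup might work. It assumes that the AnythingLLM developer API answers GET <model_server_base_url>/workspaces with a JSON body containing a "workspaces" list whose entries carry "name" and "slug" fields, and that the API key is sent as a bearer token. Check both assumptions against the API documentation for your AnythingLLM version. The names workspace_slugs and fetch_workspaces are hypothetical.

```python
import json
import urllib.request


def workspace_slugs(payload: dict) -> dict[str, str]:
    """Map workspace name -> slug from a response body in the assumed shape."""
    return {
        ws["name"]: ws["slug"]
        for ws in payload.get("workspaces", [])
        if "name" in ws and "slug" in ws
    }


def fetch_workspaces(base_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """GET <base_url>/workspaces with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/workspaces",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Offline example using a hand-written payload in the assumed shape.
sample = {"workspaces": [{"id": 1, "name": "NPU Demo", "slug": "npu-demo"}]}
for name, slug in workspace_slugs(sample).items():
    print(f"{name}: {slug}")
```

With the model server running, workspace_slugs(fetch_workspaces(base_url, api_key)) would produce the same mapping from live data.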


i. Test the model server auth to verify the API key.

$ python src/auth.py
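
For readers who want to see what such an auth test might involve, here is a minimal sketch. It assumes the model server exposes an auth endpoint at <model_server_base_url>/auth that accepts the API key as a bearer token and answers with an error status when the key is invalid. That endpoint, and the names build_auth_request and check_auth, are assumptions and may differ from what src/auth.py actually does.

```python
import urllib.error
import urllib.request


def build_auth_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the assumed /auth endpoint with a bearer token."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/auth",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def check_auth(base_url: str, api_key: str, timeout: float = 10.0) -> str:
    """Return a short verdict: 'ok', 'rejected', or 'unreachable'."""
    request = build_auth_request(base_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "ok" if response.status == 200 else "rejected"
    except urllib.error.HTTPError:
        # The server answered with an error status, for example for an invalid key.
        return "rejected"
    except (urllib.error.URLError, OSError):
        # No answer at all: the server is not running or the base URL is wrong.
        return "unreachable"


request = build_auth_request("http://localhost:3001/api/v1", "your-key-here")
print(request.get_method(), request.full_url)
```

Separating "rejected" from "unreachable" helps tell a wrong API key apart from AnythingLLM simply not running.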

 

Usage Instructions

You can chat through either a terminal or a Gradio chat interface. After completing setup, run the commands below to talk to the bot.

# terminal

$ python src/terminal_chatbot.py
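
With stream set to true in config.yaml, the reply arrives in pieces rather than as one block. The sketch below shows one way a terminal client might assemble those pieces. It assumes the server streams server-sent events as lines of the form "data: {json}", where each JSON object has a "textResponse" fragment and a "close" flag. This event shape and the name iter_text_chunks are assumptions and are not taken from the project source.

```python
import json
from typing import Iterable, Iterator


def iter_text_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from event lines of the form 'data: {...}'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other event fields
        try:
            event = json.loads(line[len("data:"):])
        except json.JSONDecodeError:
            continue  # ignore malformed events in this sketch
        if not isinstance(event, dict):
            continue
        text = event.get("textResponse")
        if text:
            yield text
        if event.get("close"):
            break  # the server signalled the end of the reply


sample_stream = [
    'data: {"textResponse": "Hello", "close": false}',
    "",
    'data: {"textResponse": ", world", "close": false}',
    'data: {"textResponse": "", "close": true}',
]
for chunk in iter_text_chunks(sample_stream):
    print(chunk, end="", flush=True)  # print fragments as they arrive
print()
```

Printing each fragment immediately with flush=True is what makes the reply appear to type itself out in the terminal.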

# gradio

$ python src/gradio_chatbot.py
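
The Gradio script itself is not reproduced here. As a rough sketch of the general pattern, a text-in, text-out backend can be wrapped in the (message, history) function that Gradio's ChatInterface expects. The names make_responder and launch are hypothetical, the backend is a stand-in, and the real src/gradio_chatbot.py may be structured differently.

```python
from typing import Callable


def make_responder(ask: Callable[[str], str]) -> Callable[[str, list], str]:
    """Wrap a text-in/text-out backend in the (message, history) signature."""
    def respond(message: str, history: list) -> str:
        if not message.strip():
            return "Please enter a message."
        try:
            return ask(message)
        except Exception as exc:  # show backend failures in the chat window
            return f"Model server error: {exc}"
    return respond


def launch(ask: Callable[[str], str]) -> None:
    """Start a local chat UI; gradio is imported lazily so the rest runs without it."""
    import gradio as gr

    gr.ChatInterface(fn=make_responder(ask), title="Simple NPU Chatbot").launch()


# Stand-in backend for illustration; a real one would call the model server.
echo = make_responder(lambda text: f"echo: {text}")
print(echo("hi", []))  # echo: hi
```

Calling launch with a function that sends the message to the model server would open the chat page in the browser, while the wrapper keeps server errors visible to the user instead of crashing the UI.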

While a response is being generated in the chatbot interface, open Task Manager to view the NPU utilization.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.


Project Authors
Nick Debeurre