Simple NPU Chatbot is a lightweight, customizable chatbot designed to run locally on Snapdragon™ devices using the Qualcomm® Neural Processing Unit (NPU). It uses the AnythingLLM model server for features such as Retrieval-Augmented Generation (RAG), long-term memory, and workspace management, while keeping all data processing on the device. The chatbot has been tested on both the Qualcomm® Device Cloud and real Snapdragon X-Elite hardware, using the Qualcomm QNN runtime for fast, efficient LLM inference with models such as Llama 3.1 8B Chat 8K.
These instructions provide a simple base for building NPU-accelerated AI chat apps, supporting both terminal and Gradio web interfaces.
Materials Required / Parts List / Tools
- GitHub links
- Qualcomm Device Cloud
Build / Assembly Instructions
1. Setup (10 minutes)
- Hardware
- An Ubuntu/Windows host system (when using Qualcomm Device Cloud)
OR
- A Windows laptop with the following specifications (when running on device):
- Machine: Any supported Windows on Snapdragon device
- Chip: Snapdragon X-Elite
- OS: Windows 11
- Memory: 8GB or higher
- Software and library installation
- AnythingLLM application for Windows - https://docs.anythingllm.com/installation-desktop/windows
- Python Version: 3.12.6 - https://www.python.org/downloads/release/python-3126/
- Git: 2.49 - https://git-scm.com/downloads
- Microsoft C++ Build tools - https://visualstudio.microsoft.com/
- Install Rust on Windows using rustup - https://sh.rustup.rs/
- AnythingLLM LLM Provider: Qualcomm QNN
- AnythingLLM Chat Model: Llama 3.1 8B Chat 8K
- Development environment setup
- Qualcomm Device Cloud account https://qdc.qualcomm.com/
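Before moving on, it can help to confirm that the prerequisites listed above are visible from a shell. The following is a minimal sketch, not part of the project repository: the helper names are hypothetical, the `(3, 12)` minimum is an assumption based on the Python 3.12.6 version listed above, and `cargo` is assumed to be the command that a default rustup installation puts on PATH.

```python
import platform
import shutil
import sys

# Hypothetical tool list mirroring the prerequisites above.
# "cargo" is assumed to be on PATH after a default rustup install.
REQUIRED_TOOLS = ("git", "cargo")


def check_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def python_is_recent(minimum=(3, 12)):
    """True if the running interpreter is at least the given (major, minor)."""
    return sys.version_info[:2] >= minimum


def describe_host():
    """Return the OS name and CPU architecture reported by the platform module."""
    return {"system": platform.system(), "machine": platform.machine()}


if __name__ == "__main__":
    print("Host:", describe_host())
    print("Python >= 3.12:", python_is_recent())
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

This only checks that the commands can be found; it does not verify the Git version, the C++ Build Tools, or the AnythingLLM installation.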
2. Deployment options
Users have two options for running this project:
- Using Qualcomm Device Cloud:
- Users choosing this option must first complete the Qualcomm Device Cloud setup for Snapdragon X-Elite (section 4), then proceed to Install the Application (section 5).
- Using a real Snapdragon X-Elite device:
- Users with a physical Snapdragon X-Elite powered Windows device can begin directly from Install the Application (section 5).
Project Walkthrough
Qualcomm Device Cloud setup for Snapdragon X-Elite
1. Visit https://qdc.qualcomm.com/ and sign up for an account to create a new session.
2. On the home page, click New Interactive Session. We recommend using Remote Desktop instead of a web browser to interact with Compute devices.
3. In the Select a Platform section, select the Compute option and click Next.
4. In the Select Device section, select Snapdragon X Elite, fill in the mandatory fields to unlock your free 1000 minutes, and click Next.
After using the initial 1000 free minutes, developers can request additional free minutes.
5. In Device Configuration, enable the toggle to install SNPE and configure another Qualcomm® build, then click Next.
6. In Test Configuration, fill in the mandatory fields under Session details, such as Session name, according to your use case. The maximum session length per device is 720 minutes (12 hours). Choose one of the three modes of operation and follow the guide provided for that mode. This use case uses screen mirroring only.
7. This step is optional and not required for running this project. The Package Provider lets you upload assets in zip format. The zip is extracted and pushed to C:\Temp\file\ before the session starts. Once you are done, the Create Session button is enabled; click it to continue.
8. The session window takes some time to open. Once it does, you are ready to access the Snapdragon X-Elite cloud device.
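If you do use the optional Package Provider from step 7, the assets have to be in a single zip file. The sketch below shows one way to build such an archive with Python's standard library; the function name and the folder layout are hypothetical, and nothing here is specific to Qualcomm Device Cloud beyond producing a zip.

```python
import zipfile
from pathlib import Path


def zip_assets(source_dir, zip_path):
    """Write every file under source_dir into zip_path, keeping relative paths."""
    source = Path(source_dir)
    target = Path(zip_path)
    # Collect files first and skip the archive itself in case it lives in source_dir.
    files = sorted(
        path
        for path in source.rglob("*")
        if path.is_file() and path.resolve() != target.resolve()
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Store paths relative to source_dir with forward slashes.
            archive.write(path, path.relative_to(source).as_posix())
    return target
```

For example, `zip_assets("assets", "assets.zip")` would package a local `assets` folder (a placeholder name) for upload.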
Install the Application (20 minutes)
1. Visit https://docs.anythingllm.com/installation-desktop/windows and download the AnythingLLM installer for Windows ARM 64-bit. Follow the guide to install the AnythingLLM app.
2. Click the "AnythingLLM" application on your desktop to start AnythingLLM.
a. Choose Qualcomm QNN when prompted to choose an LLM provider to target the NPU. (Default LLM Provider)
b. Choose a model of your choice when prompted. (This model is not downloaded. It will be downloaded when you save your settings)
c. Create your first workspace by providing the workspace name.
d. Generate an API key.
- Click the settings button on the bottom of the left panel
- Open the "Tools" dropdown
- Click "Developer API"
- Click "Generate New API Key"
e. Check the model download status in settings.
- Check the model in Settings -> AI Providers -> LLM -> chat model selection in AnythingLLM. If you see "model requires download" choose the model, click save, switch back, then save. You should see the model download in the upper right corner of the AnythingLLM window. Once the model is downloaded you should see "uninstall" on the model card if it is installed correctly. (Chat Model: Llama 3.1 8B Chat 8K is used in this example)
f. Open a PowerShell instance and clone the repository.
$ git clone source repository
$ cd source repository
$ python -m venv llm-venv
$ Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
$ ./llm-venv/Scripts/Activate.ps1
$ pip install -r requirements.txt
g. Create your config.yaml file with the following variables and save the file.
$ notepad config.yaml
api_key: "your-key-here"
model_server_base_url: "http://localhost:3001/api/v1"
workspace_slug: "your-slug-here"
stream: true
stream_timeout: 60
h. Get your workspace slug using the workspaces tool.
$ python src/workspaces.py
Find your workspace and its slug in the output, then set the workspace_slug variable in config.yaml to that slug.
i. Test the model server auth to verify the API key.
$ python src/auth.py
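To make steps g through i more concrete, here is a rough sketch of how the config values might be turned into authenticated requests against the model server. It is not the code in src/workspaces.py or src/auth.py. Several things are assumptions: the tiny `parse_flat_config` parser stands in for a real YAML library, the API key is assumed to be sent as a Bearer token, the `/workspaces` and `/auth` paths are assumed endpoints under the base URL, and the `{"workspaces": [{"name": ..., "slug": ...}]}` response shape is assumed.

```python
import json
import urllib.request

# Same keys as the config.yaml shown above, with placeholder values.
SAMPLE_CONFIG = '''
api_key: "your-key-here"
model_server_base_url: "http://localhost:3001/api/v1"
workspace_slug: "your-slug-here"
stream: true
stream_timeout: 60
'''


def parse_flat_config(text):
    """Parse flat 'key: value' lines (a stand-in for a real YAML parser)."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")  # split at the first colon only
        key, value = key.strip(), value.strip()
        if value[:1] in ('"', "'"):
            config[key] = value.strip("\"'")
        elif value.lower() in ("true", "false"):
            config[key] = value.lower() == "true"
        elif value.isdigit():
            config[key] = int(value)
        else:
            config[key] = value
    return config


def build_get_request(config, path):
    """Build (but do not send) a GET request with an assumed Bearer auth header."""
    url = config["model_server_base_url"].rstrip("/") + "/" + path.lstrip("/")
    headers = {"Authorization": "Bearer " + config["api_key"]}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (needs the server running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def find_slug(workspaces_response, name):
    """Return the slug of the workspace called name, assuming the shape above."""
    for workspace in workspaces_response.get("workspaces", []):
        if workspace.get("name") == name:
            return workspace.get("slug")
    return None
```

With AnythingLLM running, something like `find_slug(fetch_json(build_get_request(config, "/workspaces")), "My Workspace")` would look up a slug by workspace name, where "My Workspace" is a placeholder.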
Usage Instructions
Users can chat through either a terminal or a Gradio interface. After completing setup, run one of the commands below to talk with the bot.
# terminal
$ python src/terminal_chatbot.py
# gradio
$ python src/gradio_chatbot.py
Open the Task Manager while a response is being generated in the chatbot interface to view the NPU utilization, as shown in the screenshot below.
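For readers curious about what a chat client does with `stream: true`, the sketch below shows one plausible way to build a chat payload and reassemble a streamed reply. It is not the implementation in src/terminal_chatbot.py. The payload fields (`message`, `mode`), the `data: {...}` server-sent-event framing, and the `textResponseChunk`, `textResponse`, and `close` fields are all assumptions about the model server's streaming format.

```python
import json


def build_chat_payload(message, mode="chat"):
    """Encode a chat request body; the field names are assumed, not confirmed."""
    return json.dumps({"message": message, "mode": mode}).encode("utf-8")


def iter_text_chunks(lines):
    """Yield text pieces from 'data: {...}' server-sent-event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "textResponseChunk":
            yield event.get("textResponse") or ""
        if event.get("close"):
            break  # the server signalled the end of this reply


if __name__ == "__main__":
    # Hand-written sample events, used here instead of a live server.
    sample = [
        'data: {"type": "textResponseChunk", "textResponse": "Hel", "close": false}',
        "",
        b'data: {"type": "textResponseChunk", "textResponse": "lo", "close": true}',
    ]
    for piece in iter_text_chunks(sample):
        print(piece, end="", flush=True)
    print()
```

Printing each piece as it arrives is what gives a terminal chatbot its incremental, typed-out feel.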
Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
