Big Performance Boost for llama.cpp and chatglm.cpp with Windows on Snapdragon

If you’ve been developing AI apps for Windows on Snapdragon, you may have seen sub-par performance from llama.cpp and chatglm.cpp. Using build commands from open-source webpages, Qualcomm Technologies engineers and partner OEMs have measured as few as 3 tokens per second on Windows on Snapdragon devices.

But now you can accelerate that processing by enabling two features, NEON (Arm's SIMD instruction set) and FMA_ARM (fused multiply-add), and by using either of two new build procedures, one for LLVM-MinGW and one for MSVC. With these builds, Qualcomm Technologies engineers have measured up to 20 tokens per second on a device with the Snapdragon X Elite Compute Platform.
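To put those two figures side by side, here is a small Python sketch of the arithmetic. The 3 and 20 tokens-per-second values are the measurements quoted in this post; the 256-token reply length and the function names are assumptions chosen purely for illustration. Because the two figures are extremes ("as few as" and "up to"), the ratio is a best case rather than a typical result.

```python
def speedup(before_tps, after_tps):
    """Ratio of the new throughput to the old one."""
    return after_tps / before_tps


def seconds_for(tokens, tps):
    """Time needed to generate `tokens` tokens at `tps` tokens per second."""
    return tokens / tps


BEFORE_TPS = 3      # lowest figure quoted in this post
AFTER_TPS = 20      # highest figure quoted in this post
REPLY_TOKENS = 256  # assumed reply length, for illustration only

print(f"speedup: {speedup(BEFORE_TPS, AFTER_TPS):.1f}x")              # speedup: 6.7x
print(f"before:  {seconds_for(REPLY_TOKENS, BEFORE_TPS):.1f} s")      # before:  85.3 s
print(f"after:   {seconds_for(REPLY_TOKENS, AFTER_TPS):.1f} s")       # after:   12.8 s
```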

In this post, you’ll see how to build llama.cpp and chatglm.cpp with the LLVM-MinGW and MSVC commands to improve performance.

Download code for llama and chatglm

llama.cpp

llama.cpp is designed to run Meta's GPT-3-class large language model (LLM) – known as LLaMA – on local devices, including those powered by Windows on Snapdragon. It enables high performance on a wide variety of hardware with minimal setup.

chatglm.cpp

ChatGLM is an open, bilingual language model optimized for Chinese conversation, based on the General Language Model architecture. chatglm.cpp enables real-time inference on laptops such as those powered by Windows on Snapdragon. It is accelerated by quantization, in a way that is similar to llama.cpp.

Once you have the code for the apps, you’ll follow different procedures depending on whether you build with LLVM-MinGW or Microsoft Visual C++ (MSVC).

Build llama.cpp and chatglm.cpp (using LLVM-MinGW)

MinGW is a native Windows port of the open-source GCC toolchain, and you can use its header files and import libraries to build native Windows applications. LLVM-MinGW is a MinGW-w64 toolchain built on LLVM/Clang instead of GCC. Here is how to use LLVM-MinGW to enable the NEON and FMA_ARM features:

1. Download and install the tools.

2. Note that there is no need to apply any patches for llama.cpp and chatglm.cpp.

3. From the Windows command prompt, change to the “llama.cpp” or “chatglm.cpp” source directory and run the following commands to build the binaries:

mkdir build
cd build
cmake .. -G "MinGW Makefiles"
cmake --build . --config Release
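If you prefer to script these steps, for example to rebuild both projects in one go, the same sequence can be driven from Python. This is a minimal sketch, not part of either project: the function names `build_commands` and `run_build` are hypothetical, and it assumes `cmake` and the toolchain are already on your PATH.

```python
import subprocess
from pathlib import Path


def build_commands(configure_args):
    """Return the two CMake invocations from the steps above as argument lists."""
    configure = ["cmake", "..", *configure_args]
    build = ["cmake", "--build", ".", "--config", "Release"]
    return [configure, build]


def run_build(source_dir, configure_args):
    """Create <source_dir>/build and run both commands inside it."""
    build_dir = Path(source_dir) / "build"
    build_dir.mkdir(exist_ok=True)  # same effect as `mkdir build`
    for command in build_commands(configure_args):
        # cwd=build_dir plays the role of `cd build`;
        # check=True raises an error as soon as a step fails.
        subprocess.run(command, cwd=build_dir, check=True)
```

Calling `run_build` with the path to your checkout and `["-G", "MinGW Makefiles"]` reproduces the commands above. Passing the generator as a list keeps the space in "MinGW Makefiles" intact without any shell quoting.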

Build llama.cpp and chatglm.cpp (using MSVC)

You can also use MSVC to build high-performance binary files and enable the NEON and FMA_ARM features.

1. Download and install the tools.

2. llama.cpp needs no patch, but for chatglm.cpp you must apply the following patch to its bundled ggml library (third_party\ggml):

C:\chatglm.cpp\third_party\ggml>git diff
diff --git a/include/ggml/ggml.h b/include/ggml/ggml.h
index 4b16032..eaa6c79 100644
--- a/include/ggml/ggml.h
+++ b/include/ggml/ggml.h
@@ -286,7 +286,7 @@ extern "C" {

 #if defined(__ARM_NEON) && defined(__CUDACC__)
     typedef half ggml_fp16_t;
-#elif defined(__ARM_NEON)
+#elif defined(__ARM_NEON) && !defined(_MSC_VER)
     typedef __fp16 ggml_fp16_t;
 #else
     typedef uint16_t ggml_fp16_t;
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index b225597..d1a1d66 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -41,8 +41,14 @@ endif()

 if (${CMAKE_SYSTEM_NAME} STREQUAL "Emscripten")
     message(STATUS "Emscripten detected")
-elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "arm" OR ${CMAKE_SYSTEM_PROCESSOR} MATCHES "aarch64")
+elseif ((${CMAKE_SYSTEM_PROCESSOR} MATCHES "arm") OR (${CMAKE_SYSTEM_PROCESSOR} MATCHES "aarch64") OR (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ARM64"))
     message(STATUS "ARM detected")
+       if (MSVC)
+           add_compile_definitions(__ARM_NEON)
+           add_compile_definitions(__ARM_FEATURE_FMA)
+           add_compile_definitions(__ARM_FEATURE_DOTPROD)
+           add_compile_definitions(__aarch64__) # MSVC defines _M_ARM64 instead
+       endif()
     #set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcpu=apple-m1")
 elseif (${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64le" OR ${CMAKE_SYSTEM_PROCESSOR} MATCHES "ppc64")
     message(STATUS "PPC64 detected")

3. From the Windows command prompt, change to the “llama.cpp” or “chatglm.cpp” source directory and run the following commands to build the binaries:

mkdir build
cd build
cmake .. -A ARM64
cmake --build . --config Release
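The CMakeLists.txt part of the patch in step 2 adds an "ARM64" match to the processor check. A likely reason is that CMake's MATCHES operator performs a case-sensitive regular-expression search, so the pattern "arm" does not match an upper-case "ARM64" value. The Python sketch below only emulates that condition with `re.search`; the helper name `is_arm` and the sample processor strings are assumptions for illustration, not part of CMake or of either project.

```python
import re

# Patterns tested by the elseif() line before and after the patch.
OLD_PATTERNS = ["arm", "aarch64"]
NEW_PATTERNS = ["arm", "aarch64", "ARM64"]


def is_arm(processor, patterns):
    """Emulate `X MATCHES p1 OR X MATCHES p2 ...` with case-sensitive searches."""
    return any(re.search(pattern, processor) for pattern in patterns)


for name in ["aarch64", "arm64", "ARM64", "AMD64"]:
    print(f"{name:8} old={is_arm(name, OLD_PATTERNS)} new={is_arm(name, NEW_PATTERNS)}")
# aarch64  old=True new=True
# arm64    old=True new=True
# ARM64    old=False new=True
# AMD64    old=False new=False
```

If ARM64 is what CMake reports in the MSVC build, as the patch suggests, the unpatched check would not take the ARM branch, which is where the patch adds the MSVC compile definitions.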

Now it’s your turn to build

With those easily accessible tools and easy-to-follow instructions, you should see a substantial boost in performance.

Explore our new Windows on Snapdragon page - we’ve made it easier to get your hands on the resources to develop for Windows on Snapdragon and to port existing apps. That includes developer tools, full documentation, support options and our developer blog.

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. 

About the Author
Changgeng Zhong, Staff Engineer