How to run DeepSeek models on Windows on Snapdragon – Ollama tutorial

Written by Devang Aggarwal and Dileep Karpur

Feb 4, 2025

Co-written with Li He, Srinivasa Deevi, Hongqiang Wang, Sai Gayathri Gampa.

DeepSeek-R1 is an open-source reasoning model developed by DeepSeek to handle tasks requiring logical inference, mathematical problem-solving and real-time decision making. One of its standout features is the ability to trace its logic, which makes it easier to understand and, if necessary, challenge its output. This transparency is particularly valuable in fields where explainable outcomes are crucial, such as research and complex decision making.

AI distillation is a process that creates smaller, more efficient models from larger ones, retaining much of their reasoning power while reducing computational demands. DeepSeek has applied this technique to develop a suite of distilled models from R1, using Qwen and Llama architectures. That allows users to take advantage of the capabilities of DeepSeek-R1 on standard laptops.

Developers have several options for running AI models on Windows on Snapdragon. One of the most popular is an LLM platform such as Ollama, which developers favor for its performance and efficiency.

Built on the llama.cpp framework, Ollama adds optimizations that improve inference speed and reduce memory usage, so models can run even on consumer-grade hardware. That makes it a good fit for developers who want to integrate large language models into their applications without expensive or specialized hardware.

Furthermore, Ollama's user-friendly features, such as automatic model templating and on-demand loading, streamline the development process, allowing developers to focus more on innovation and less on technical hurdles.

This tutorial shows you how to run DeepSeek-R1 models on the CPU of a Windows on Snapdragon device using Ollama. You can follow the steps below on Snapdragon X Series laptops.

Running on CPU – Ollama how-to guide

1. Visit the Ollama website. Open your browser and go to https://ollama.com.
2. Download and install. Click the Download button, select the Windows installer and follow the prompts to complete the installation.
3. Run your model. Open a command prompt or terminal and enter the following command:

ollama run <model_name>

Replace <model_name> with the tag of any DeepSeek model available in Ollama:

Figure 1: DeepSeek models in Ollama.

4. Start using the application. The following are sample results for three DeepSeek models:

ollama run deepseek-r1:1.5b

The 1.5b model responds quickly and gives good results, as shown below:

Figure 2: Results from ollama run deepseek-r1:1.5b (Part 1)

Figure 3: Results from ollama run deepseek-r1:1.5b (Part 2)

ollama run deepseek-r1:7b

The 7b model gives similarly good and accurate results, with an elaborate explanation. However, inference is slower than with the 1.5b model:

Figure 4: Results from ollama run deepseek-r1:7b
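To put a number on "slower", one option is to compare generation throughput between models. The sketch below assumes the timing fields that Ollama's local REST API returns with a non-streaming /api/generate response: eval_count (tokens generated) and eval_duration (nanoseconds spent generating). The sample values are made up for illustration and are not measurements from this tutorial.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama /api/generate response.

    Assumes `eval_count` (tokens generated) and `eval_duration`
    (nanoseconds spent generating) are present in the response.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical responses for two models, NOT measured on real hardware.
sample_small = {"eval_count": 300, "eval_duration": 10_000_000_000}
sample_large = {"eval_count": 300, "eval_duration": 30_000_000_000}

print(f"smaller model: {tokens_per_second(sample_small):.1f} tokens/s")
print(f"larger model:  {tokens_per_second(sample_large):.1f} tokens/s")
```

For a quick check without any code, running ollama run with the --verbose flag prints timing statistics after each response.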

ollama run deepseek-r1:8b

Running this model produced an error:

Figure 5: Results from ollama run deepseek-r1:8b (Error)
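Beyond the interactive prompt, a model served by Ollama can also be called from application code. The following is a minimal sketch, assuming Ollama's local REST API is listening at its default address (http://localhost:11434) and that the model has already been downloaded. The helper names build_request and ask are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.

    Requires Ollama to be running and the model to be available locally.
    """
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["response"]
```

With Ollama running, a call such as ask("deepseek-r1:1.5b", "What is 17 * 23?") would return the model's reply as a string. Setting "stream" to False trades incremental output for a single JSON response, which keeps the sketch short.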

Next steps

We’ll share more details about running on the NPU shortly.

Meanwhile, Microsoft is bringing NPU-optimized versions of DeepSeek-R1 directly to Copilot+ PCs, starting with Qualcomm Snapdragon X Series devices. The company also announced that the distilled DeepSeek-R1 models, optimized using ONNX, are now available on Snapdragon-powered Copilot+ PCs. These models offer a time to first token of less than 70 ms for short prompts (<64 tokens) and a throughput of 25-40 tokens/s, with longer responses achieving higher throughput. Get started today by downloading the AI Toolkit extension in VS Code.
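To get a feel for what those figures mean in practice, the sketch below estimates how long a response of a given length would take. It assumes a simple linear model (time to first token, then steady generation), which is a simplification introduced here and not something stated in the announcement. The function name is hypothetical.

```python
def estimate_response_seconds(output_tokens: int,
                              ttft_ms: float = 70.0,
                              tokens_per_s: float = 25.0) -> float:
    """Rough response time: time to first token plus steady generation.

    Defaults use the 70 ms bound on time to first token (quoted for short
    prompts) and the low end of the quoted 25-40 tokens/s range. The
    linear model itself is a simplifying assumption.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    # The first token arrives after ttft_ms; the rest stream at tokens_per_s.
    return ttft_ms / 1000.0 + (output_tokens - 1) / tokens_per_s


# Compare both ends of the quoted throughput range for a 200-token reply.
for rate in (25.0, 40.0):
    seconds = estimate_response_seconds(200, tokens_per_s=rate)
    print(f"200 tokens at {rate:.0f} tokens/s: about {seconds:.1f} s")
```

Under these assumptions, a 200-token reply lands at roughly 5 to 8 seconds. Real timings depend on prompt length, model size and system load.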

Want to find out more about DeepSeek on Windows on Snapdragon? Join our Developer Discord for more insights and real-time conversations with fellow developers and our technical experts.

About the Authors

Devang Aggarwal, Product Manager, Senior