Developer Blog

GenAI for personalized education: SocratiQ with on-prem processing

Written by

Amrita Moturi

Nov 26, 2025

Sign up for Developer monthly newsletter

Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.

Sign up

Come for support, stay for the community

Get support from experts, connect with like-minded developers, and access exclusive virtual events.

Join Developer Discord

Co-written with Morteza Hosseini

Background

Earlier this year, Professor Vijay Janapa Reddy and Kai Kleinbard at Harvard University introduced SocratiQ, an AI-powered educational assistant designed to tackle the challenges of personalized learning at scale. SocratiQ helps students engage with the Machine Learning Systems textbook by offering tailored explanations, adaptive assessments, and interactive conversations.

These features are powered by Large Language Models (LLMs) served via available cloud-based inference services.

SocratiQ offers several features designed to enhance engagement and personalize learning:

Level-based personalization: Adapts explanations and quizzes to each student’s prior experience with the content.
Interactive chat: Enables conversational learning.
Bounded learning: Uses in-context prompting to provide responses based on textbook material.
Knowledge graph: Tracks student progress and concept mastery.
Gamification: Encourages engagement through streaks, badges, and quiz milestones.
Engagement heatmap: Visualizes user activity and learning patterns.

As educational content continues to shift toward digital formats and students increasingly rely on online resources, integrating AI tools directly into textbooks is becoming a practical and impactful reality. This shift presents an exciting opportunity for the Qualcomm Dragonwing AI On-Prem Appliance solution to help educational applications like SocratiQ to move beyond cloud dependency and bring near-edge inferencing into the classroom – AI tutor in the box.

Why near-edge matters for education

Transitioning to near-edge solutions offers several key benefits for students and educators:

Offline access: Students can continue learning without an internet connection, especially important in remote or bandwidth-constrained environments.
Enhanced privacy: Local inference eliminates the need to transmit student data to the cloud.
Low latency: Real-time responses without network delays.
Reliability: Less risk of downtime due to connectivity issues or data center availability.
Total cost of ownership: Lower TCO compared to cloud-based services over the long term.

System architecture

To support near-edge deployment and eliminate reliance on cloud-based models, we restructured the SocratiQ architecture to run entirely on premises. This involved replacing all API calls to external cloud-hosted LLMs with local inference using the vLLM backend, which hosts opensource models directly on the Qualcomm Cloud AI 100 Ultra hardware in the Dragonwing On-Prem Appliance.

Web interface: JavaScript app embedded via Shadow DOM
Local database: Stores user preferences and quiz progress using IndexedDB
Local FastAPI server: Routes prompts and communicates with the vLLM backend
vLLM backend: Powers AI content generation using speculative decoding for high-throughput, memory-efficient inference
Model flexibility: Depending on the coursebook's content and language, hosted models can easily be switched to best support the educational objectives.

SocratiQ Cloud AI

Nov 25, 2025 | 2:16

Video Player is loading.

Current Time 0:00

Duration 2:15

Loaded: 4.38%

Stream Type LIVE

Remaining Time 2:15

Dragonwing AI On-Prem appliance

At the heart of this solution is the Qualcomm AI 100 Ultra Card, which seamlessly integrates into Aetina’s MegaEdge AIP-FR68 AI workstation. This setup dramatically boosts performance for generative AI (GenAI), natural language processing, and computer vision workloads.

By hosting LLMs locally, educational institutions can reduce costs, improve performance, and optimize power consumption while maintaining full control over student data. Educators also benefit from real-time analytics with sub-millisecond latency, enabling them to monitor student progress and adapt instruction accordingly.

This hardware setup supports:

Up to 120B parameter models
More than 40 concurrent users delivering the best user experience
Compatibility with OpenAI APIs for online inference
Retrieval-augmented generation (RAG) workflows for multi-textbook contextual learning

Figure 1. Qualcomm Cloud AI 100 Ultra + Aetina Box Hardware Setup

What's Next?

We’re working toward creating solutions that can be generalized across a broader ecosystem of online, open-source textbooks, making it easier for educators to integrate generative AI into diverse curricula. This initiative goes beyond a single tool or classroom, as we aim to build a scalable foundation that can support a wide variety of learning environments. By designing solutions that are adaptable across multiple languages and compatible with open-source digital textbooks, we aim to make generative AI accessible to learners globally, regardless of their location or native language.

We see this as a critical stepping stone toward democratizing access to generative AI in education, especially in rural and underserved regions where internet connectivity, computing infrastructure, and access to educators may be limited. By embracing a "local-first" framework, we’re not only reducing dependency on cloud infrastructure but also enabling privacy-preserving and cost-effective AI deployments that can operate reliably in constrained environments.

Connect with fellow developers, get the latest news and prompt technical support by joining our Developer Discord

Creating a microservice using the Qualcomm AI Inference Suite

Using the Qualcomm AI Inference Suite directly from a web page

Learn to optimize Stable Diffusion on Qualcomm Cloud AI 100

Using the Qualcomm AI Inference Suite from Google Colab

Qualcomm AI Inference Suite: getting started is easy

Cloud AI

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Author

Amrita MoturiIntern, Cloud AI Systems and Solutions