Back to All
Developer Blog

GenAI for personalized education: SocratiQ with on-prem processing

Sign up for Developer monthly newsletter-image

Sign up for Developer monthly newsletter

Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.

Sign up
Come for support, stay for the community-image

Come for support, stay for the community

Get support from experts, connect with like-minded developers, and access exclusive virtual events.

Join Developer Discord

Co-written with Morteza Hosseini

Background

Earlier this year, Professor Vijay Janapa Reddy and Kai Kleinbard at Harvard University introduced SocratiQ, an AI-powered educational assistant designed to tackle the challenges of personalized learning at scale. SocratiQ helps students engage with the Machine Learning Systems textbook by offering tailored explanations, adaptive assessments, and interactive conversations.

These features are powered by Large Language Models (LLMs) served via available cloud-based inference services.

SocratiQ offers several features designed to enhance engagement and personalize learning:

  • Level-based personalization: Adapts explanations and quizzes to each student’s prior experience with the content.
  • Interactive chat: Enables conversational learning.
  • Bounded learning: Uses in-context prompting to provide responses based on textbook material.
  • Knowledge graph: Tracks student progress and concept mastery.
  • Gamification: Encourages engagement through streaks, badges, and quiz milestones.
  • Engagement heatmap: Visualizes user activity and learning patterns.

As educational content continues to shift toward digital formats and students increasingly rely on online resources, integrating AI tools directly into textbooks is becoming a practical and impactful reality. This shift presents an exciting opportunity for the Qualcomm Dragonwing AI On-Prem Appliance solution to help educational applications like SocratiQ to move beyond cloud dependency and bring near-edge inferencing into the classroom – AI tutor in the box.

Why near-edge matters for education

Transitioning to near-edge solutions offers several key benefits for students and educators:

  • Offline access: Students can continue learning without an internet connection, especially important in remote or bandwidth-constrained environments.
  • Enhanced privacy: Local inference eliminates the need to transmit student data to the cloud.
  • Low latency: Real-time responses without network delays.
  • Reliability: Less  risk of downtime due to connectivity issues or data center availability.
  • Total cost of ownership: Lower TCO compared to cloud-based services over the long term.

System architecture

To support near-edge deployment and eliminate reliance on cloud-based models, we restructured the SocratiQ architecture to run entirely on premises. This involved replacing all API calls to external cloud-hosted LLMs with local inference using the vLLM backend, which hosts opensource models directly on the Qualcomm Cloud AI 100 Ultra hardware in the Dragonwing On-Prem Appliance.

  • Web interface: JavaScript app embedded via Shadow DOM
  • Local database: Stores user preferences and quiz progress using IndexedDB
  • Local FastAPI server: Routes prompts and communicates with the vLLM backend
  • vLLM backend: Powers AI content generation using speculative decoding for high-throughput, memory-efficient inference
  • Model flexibility: Depending on the coursebook's content and language, hosted models can easily be switched to best support the educational objectives.

SocratiQ Cloud AI

Nov 25, 2025 | 2:16

Video Player is loading.
Current Time 0:00
Duration 2:15
Loaded: 4.38%
Stream Type LIVE
Remaining Time 2:15
 
1x
  • Chapters
  • descriptions off, selected
  • captions off, selected
  • en (Main), selected

Dragonwing AI On-Prem appliance

At the heart of this solution is the Qualcomm AI 100 Ultra Card, which seamlessly integrates into Aetina’s MegaEdge AIP-FR68 AI workstation. This setup dramatically boosts performance for generative AI (GenAI), natural language processing, and computer vision workloads.

By hosting LLMs locally, educational institutions can reduce costs, improve performance, and optimize power consumption while maintaining full control over student data. Educators also benefit from real-time analytics with sub-millisecond latency, enabling them to monitor student progress and adapt instruction accordingly.

This hardware setup supports:

  • Up to 120B parameter models
  • More than 40 concurrent users delivering the best user experience
  • Compatibility with OpenAI APIs for online inference
  • Retrieval-augmented generation (RAG) workflows for multi-textbook contextual learning
Qualcomm Cloud AI 100 Ultra + Aetina Box Hardware Setup
Figure 1. Qualcomm Cloud AI 100 Ultra + Aetina Box Hardware Setup

What's Next?

We’re working toward creating solutions that can be generalized across a broader ecosystem of online, open-source textbooks, making it easier for educators to integrate generative AI into diverse curricula. This initiative goes beyond a single tool or classroom, as we aim to build a scalable foundation that can support a wide variety of learning environments. By designing solutions that are adaptable across multiple languages and compatible with open-source digital textbooks, we aim to make generative AI accessible to learners globally, regardless of their location or native language.

We see this as a critical stepping stone toward democratizing access to generative AI in education, especially in rural and underserved regions where internet connectivity, computing infrastructure, and access to educators may be limited. By embracing a "local-first" framework, we’re not only reducing dependency on cloud infrastructure but also enabling privacy-preserving and cost-effective AI deployments that can operate reliably in constrained environments.

Connect with fellow developers, get the latest news and prompt technical support by joining our Developer Discord

Read more:

Unlocking affordable, scalable AI: fine-tuning on AI accelerators

Creating a microservice using the Qualcomm AI Inference Suite

Using the Qualcomm AI Inference Suite directly from a web page

Learn to optimize Stable Diffusion on Qualcomm Cloud AI 100

Using the Qualcomm AI Inference Suite from Google Colab

Qualcomm AI Inference Suite: getting started is easy

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

About the Author
Amrita Moturi
Amrita MoturiIntern, Cloud AI Systems and Solutions
Qualcomm relentlessly innovates to deliver intelligent computing everywhere, helping the world tackle some of its most important challenges. Our leading-edge AI, high performance, low-power computing, and unrivaled connectivity deliver proven solutions that transform major industries. At Qualcomm, we are engineering human progress.

Stay connected

Get the latest Qualcomm and industry information delivered to your inbox.

Subscribe
Manage your subscription

© Qualcomm Technologies, Inc. and/or its affiliated companies.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are licensed by Qualcomm Incorporated.

Note: Certain services and materials may require you to accept additional terms and conditions before accessing or using those items.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell or license any of the services or materials referenced herein.