GenAI for personalized education: SocratiQ with on-prem processing
Sign up for Developer monthly newsletter
Join thousands of developers around the globe who receive latest news and updates from our monthly curated newsletter.
Sign upCome for support, stay for the community
Get support from experts, connect with like-minded developers, and access exclusive virtual events.
Join Developer DiscordCo-written with Morteza Hosseini
Background
Earlier this year, Professor Vijay Janapa Reddy and Kai Kleinbard at Harvard University introduced SocratiQ, an AI-powered educational assistant designed to tackle the challenges of personalized learning at scale. SocratiQ helps students engage with the Machine Learning Systems textbook by offering tailored explanations, adaptive assessments, and interactive conversations.
These features are powered by Large Language Models (LLMs) served via available cloud-based inference services.
SocratiQ offers several features designed to enhance engagement and personalize learning:
- Level-based personalization: Adapts explanations and quizzes to each student’s prior experience with the content.
- Interactive chat: Enables conversational learning.
- Bounded learning: Uses in-context prompting to provide responses based on textbook material.
- Knowledge graph: Tracks student progress and concept mastery.
- Gamification: Encourages engagement through streaks, badges, and quiz milestones.
- Engagement heatmap: Visualizes user activity and learning patterns.
As educational content continues to shift toward digital formats and students increasingly rely on online resources, integrating AI tools directly into textbooks is becoming a practical and impactful reality. This shift presents an exciting opportunity for the Qualcomm Dragonwing AI On-Prem Appliance solution to help educational applications like SocratiQ to move beyond cloud dependency and bring near-edge inferencing into the classroom – AI tutor in the box.
Why near-edge matters for education
Transitioning to near-edge solutions offers several key benefits for students and educators:
- Offline access: Students can continue learning without an internet connection, especially important in remote or bandwidth-constrained environments.
- Enhanced privacy: Local inference eliminates the need to transmit student data to the cloud.
- Low latency: Real-time responses without network delays.
- Reliability: Less risk of downtime due to connectivity issues or data center availability.
- Total cost of ownership: Lower TCO compared to cloud-based services over the long term.
System architecture
To support near-edge deployment and eliminate reliance on cloud-based models, we restructured the SocratiQ architecture to run entirely on premises. This involved replacing all API calls to external cloud-hosted LLMs with local inference using the vLLM backend, which hosts opensource models directly on the Qualcomm Cloud AI 100 Ultra hardware in the Dragonwing On-Prem Appliance.
- Web interface: JavaScript app embedded via Shadow DOM
- Local database: Stores user preferences and quiz progress using IndexedDB
- Local FastAPI server: Routes prompts and communicates with the vLLM backend
- vLLM backend: Powers AI content generation using speculative decoding for high-throughput, memory-efficient inference
- Model flexibility: Depending on the coursebook's content and language, hosted models can easily be switched to best support the educational objectives.
SocratiQ Cloud AI
Nov 25, 2025 | 2:16

Dragonwing AI On-Prem appliance
At the heart of this solution is the Qualcomm AI 100 Ultra Card, which seamlessly integrates into Aetina’s MegaEdge AIP-FR68 AI workstation. This setup dramatically boosts performance for generative AI (GenAI), natural language processing, and computer vision workloads.
By hosting LLMs locally, educational institutions can reduce costs, improve performance, and optimize power consumption while maintaining full control over student data. Educators also benefit from real-time analytics with sub-millisecond latency, enabling them to monitor student progress and adapt instruction accordingly.
This hardware setup supports:
- Up to 120B parameter models
- More than 40 concurrent users delivering the best user experience
- Compatibility with OpenAI APIs for online inference
- Retrieval-augmented generation (RAG) workflows for multi-textbook contextual learning
What's Next?
We’re working toward creating solutions that can be generalized across a broader ecosystem of online, open-source textbooks, making it easier for educators to integrate generative AI into diverse curricula. This initiative goes beyond a single tool or classroom, as we aim to build a scalable foundation that can support a wide variety of learning environments. By designing solutions that are adaptable across multiple languages and compatible with open-source digital textbooks, we aim to make generative AI accessible to learners globally, regardless of their location or native language.
We see this as a critical stepping stone toward democratizing access to generative AI in education, especially in rural and underserved regions where internet connectivity, computing infrastructure, and access to educators may be limited. By embracing a "local-first" framework, we’re not only reducing dependency on cloud infrastructure but also enabling privacy-preserving and cost-effective AI deployments that can operate reliably in constrained environments.
Connect with fellow developers, get the latest news and prompt technical support by joining our Developer Discord
Read more:
Unlocking affordable, scalable AI: fine-tuning on AI accelerators
Creating a microservice using the Qualcomm AI Inference Suite
Using the Qualcomm AI Inference Suite directly from a web page
Learn to optimize Stable Diffusion on Qualcomm Cloud AI 100

