LLMs, MoE and NLP take center stage: Key insights from Qualcomm's AI Summit 2023 on the future of AI

Experts at Microsoft, Duke and Stanford weigh in on the advancements and challenges of AI

Qualcomm’s annual internal artificial intelligence (AI) Summit brought industry experts and Qualcomm employees from around the world to San Diego in December 2023 to discuss the future of AI and its potential impact on various industries. We heard a range of perspectives from experts at Microsoft, Duke University and Stanford University, who also gave us insights into the work being done on large language models (LLMs), mixture-of-experts (MoE) models, natural language processing (NLP) and more. Read on for the key takeaways from each talk.

Marc Tremblay

Mar 19, 2024 | 45:16

How can we deploy LLMs successfully?

Key takeaways:

  1. Model size matters, and larger models can produce better results.
  2. The amount of compute needed to train top models is increasing, doubling every four to five months.
  3. Solutions are needed to better deal with the memory constraints of devices.

Marc Tremblay, vice president and distinguished engineer of silicon technology and strategy at Microsoft, discusses the challenges and advancements in building LLMs. He explains the general architecture of LLMs and why factors like access to the web, responsible AI and compliance checks matter when deploying them as commercial products. He also discusses the exponential growth in the compute required to train these models and the significant capital expenditures needed to support that growth. Tremblay says,

"This is the amount of compute needed to train the model of the day. It's kind of growing at Moore's law’s rate of two times every two years roughly, then it hits a deep learning era. And now the compute that we need doubles every four to five months."

However, he remains optimistic about the potential for continued advancements in LLMs and the exciting possibilities they offer. He also delves into the specifics of generative AI inference, focusing on the challenges and optimizations involved in the token generation phase. Tremblay emphasizes the importance of identifying bottlenecks, exploiting parallelism and leveraging hardware assets to improve performance and efficiency.
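To put the two doubling rates from Tremblay's quote side by side, the short Python sketch below converts a doubling period into a yearly growth multiple. Only the doubling periods (roughly two years, then four to five months) come from the quote; the function name `annual_growth_factor` and the 12-month and 24-month comparison windows are illustrative assumptions.

```python
def annual_growth_factor(doubling_months: float, window_months: float = 12.0) -> float:
    """Growth multiple over `window_months` for a quantity that doubles
    every `doubling_months` (simple exponential growth, an assumption)."""
    return 2.0 ** (window_months / doubling_months)


# Doubling periods taken from the quote; the labels are ours.
scenarios = {
    "Moore's-law pace (doubling every 24 months)": 24.0,
    "Current pace, slow end (doubling every 5 months)": 5.0,
    "Current pace, fast end (doubling every 4 months)": 4.0,
}

for label, months in scenarios.items():
    per_year = annual_growth_factor(months)
    per_two_years = annual_growth_factor(months, window_months=24.0)
    print(f"{label}: {per_year:.2f}x per year, {per_two_years:.1f}x per two years")
```

Under these assumptions, a two-year doubling period works out to about 1.4x per year, while doubling every four to five months works out to roughly 5.3x to 8x per year.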

As the field of AI evolves, it is transforming numerous industrial sectors. We are observing a continuous increase in both the scale and complexity of cutting-edge models. This evolution poses substantial challenges in deploying these advanced models on edge devices, which often have limited computing and storage resources. 

Yiran Chen

Mar 19, 2024 | 34:15

Reconciling large models and memory-constrained devices

Key takeaways:

  1. The gap between model size and device capabilities is a significant challenge in AI.
  2. Leveraging smaller models for training can improve efficiency.
  3. Collaborative AI can help overcome the limitations of on-device AI.

Yiran Chen, professor of electrical and computer engineering at Duke University, introduces recent work called Sparsity-inspired Data-Aware (SiDA), which facilitates the deployment of large-scale mixture-of-experts (MoE) LLMs on memory-constrained devices. He also speaks about collaborative work with Qualcomm Technologies on AutoML, which enables the deployment of advanced recommendation systems on Qualcomm Technologies’ systems on chip (SoCs). Chen says,

"Collaborative AI is attractive, because if we cannot handle these things in the local, we need to handle them in a collaborative way. The fact that the computational resources and model are offered by different parties is another reason to consider this."

Chen presents several of his team’s federated learning projects that promote private, more accurate distributed intelligence by addressing the resource limitations in on-device applications of foundation models. 
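As a rough illustration of why sparsity matters for MoE models on memory-constrained devices, the Python sketch below routes a small batch of tokens with generic top-k gating and counts how many experts are actually touched. This is not SiDA's algorithm, which the post does not describe; the expert count, `TOP_K`, the batch size and the random gate scores are all hypothetical values chosen for illustration.

```python
import random


def top_k_experts(gate_scores, k):
    """Indices of the k experts with the highest gate scores for one token."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]


def active_experts(batch_gate_scores, k):
    """Union of the experts selected by any token in the batch."""
    active = set()
    for scores in batch_gate_scores:
        active.update(top_k_experts(scores, k))
    return active


# Hypothetical configuration, not taken from the talk.
NUM_EXPERTS = 64
TOP_K = 2
NUM_TOKENS = 8

random.seed(0)  # fixed seed so the random stand-in gate scores are repeatable
batch = [[random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)] for _ in range(NUM_TOKENS)]

needed = active_experts(batch, TOP_K)
print(f"{len(needed)} of {NUM_EXPERTS} experts are needed for this batch "
      f"({len(needed) / NUM_EXPERTS:.0%} of the expert weights)")
```

With these numbers, at most `NUM_TOKENS * TOP_K = 16` of the 64 experts can be selected for the batch, so the remaining expert weights would not need to be in fast memory for this step. How a real system decides which experts to keep resident is a design choice this sketch does not address.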

Christopher Potts

Mar 18, 2024 | 37:21

How can we further improve natural language processing pipelines?

Key takeaways:

  1. AI system design needs to move beyond manual prompt engineering and embrace data-driven optimization.
  2. The use of the DSPy library enables modular composable systems and data-driven optimization.
  3. The future of AI system design may lie in local models for reproducible research.

Christopher Potts, chair of linguistics and computer science professor at Stanford University, talks about how language models are enabling researchers to build NLP systems at higher levels of abstraction and with lower data requirements than ever before. However, these systems are typically built around long, complex, hand-crafted prompt templates, which is akin to setting the weights of a classifier by hand rather than learning them from data.

Toward a more systematic approach, Potts and his team introduce DSPy, a programming model that abstracts language model pipelines as imperative computation graphs where language models are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn how to apply compositions of prompting, finetuning, augmentation and reasoning techniques. The DSPy compiler will optimize any DSPy pipeline to maximize a given metric. Even quite simple DSPy programs, once compiled, routinely outperform standard pipelines with hand-created prompts and allow the development of performant systems using small language models. Potts says,

"I think the future lies in us being able to get a lot of juice out of models that can run on our own devices inexpensively."

DSPy is an active, fully open-source tool for data-driven optimization in NLP tasks.
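The idea of learning a pipeline's prompt from data instead of writing it by hand can be shown with a toy Python sketch. It does not use the DSPy API: `toy_lm` is a hypothetical stand-in for a language model, `compile_demos` is a made-up brute-force optimizer, and the tiny sentiment dataset is invented for illustration. The only point is that the few-shot demonstrations are chosen to maximize a metric rather than picked manually.

```python
from itertools import combinations


def toy_lm(demos, question):
    """Hypothetical stand-in for a language model: it copies the label of
    the demonstration that shares the most words with the question."""
    words = set(question.lower().split())
    best = max(demos, key=lambda demo: len(words & set(demo[0].lower().split())))
    return best[1]


def accuracy(demos, dataset):
    """Metric to maximize: fraction of examples labeled correctly."""
    return sum(toy_lm(demos, text) == label for text, label in dataset) / len(dataset)


def compile_demos(train, dev, num_demos):
    """Brute-force 'compiler': try every demo subset, keep the best on dev."""
    return max(combinations(train, num_demos), key=lambda demos: accuracy(demos, dev))


# Invented toy data. A real setup would also hold out a separate test set.
train = [
    ("the battery life is great", "positive"),
    ("the screen is great", "positive"),
    ("the battery died quickly", "negative"),
    ("the screen cracked quickly", "negative"),
]
dev = [
    ("great camera", "positive"),
    ("it died", "negative"),
    ("battery is great", "positive"),
    ("cracked quickly", "negative"),
]

hand_picked = train[:2]  # a manual choice that happens to be all-positive
learned = compile_demos(train, dev, num_demos=2)

print("hand-picked demos accuracy:", accuracy(hand_picked, dev))
print("learned demos accuracy:   ", accuracy(learned, dev))
```

On this toy data the hand-picked demonstrations score 0.5 and the learned ones score 1.0, because the search finds a subset that covers both labels. Real optimizers search far larger spaces with much cheaper strategies than exhaustive enumeration.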


Looking towards the future of AI and machine learning

We’re looking forward to further collaborating with these AI experts on successfully deploying machine learning on devices around the world. Whether it’s optimizing NLP tasks by making them less data-hungry, running generative AI in a more memory-efficient manner, or scaling LLMs while reducing compute, there are exciting challenges to tackle in AI in 2024.

Explore more topics, insights and trends in on-device generative AI within our AI on the Edge series.


About the Author
Armina Stepan, Sr. Marketing Comms Coordinator, Qualcomm Technologies Netherlands B.V.
