OnQ Blog

Multimodal AI is having its moment in the sun. Here’s why it’s so important

Multimodal AI takes in different inputs like text, images or video, allowing digital assistants to better understand the world and you, and gets supercharged when it’s able to run on your device

As smart as generative artificial intelligence (AI) can be, its capabilities are limited by how well it understands everything around it. That’s where large multimodal models (LMMs) come in. They allow AI to analyze voice queries, text, images, videos and even radio frequency and sensor data to provide more accurate and relevant answers.


Multimodal AI is a critical part of the evolution of generative AI, following the now-popular large language models (LLMs), such as the one behind the original version of ChatGPT, which could handle only text. That enhanced ability to understand what you see and hear will supercharge devices like your smartphone or PC, and make digital assistants and productivity apps much more useful. And being able to handle these operations on the device will make the process faster, more private and more power-efficient.

Qualcomm Technologies is committed to enabling on-device multimodal AI. Back in February, we were the first to show Large Language and Vision Assistant (LLaVA), a community-driven LMM with more than 7 billion parameters, running on an Android phone powered by the Snapdragon 8 Gen 3 Mobile Platform. In this demo, the phone was able to “recognize” images, such as a dog in an open landscape or a platter of fruits and vegetables, and engage in a conversation about them. A person could ask it to create a recipe using the items from the platter, and even to estimate the total calories in the recipe.

The future of AI is multimodal

This work is critical as the noise around multimodal AI has gotten louder. Last week, Microsoft introduced its Phi-3.5 family of models, which includes multilingual and visual support. That followed Google talking up LMMs at its Made by Google event, which featured Gemini Nano, a model for multimodal inputs. In May, OpenAI introduced its own multimodal model, GPT-4 Omni. These announcements follow similar work from Meta and community-developed models such as LLaVA.

Taken together, these advances shine a light on the path that AI is heading down, one which goes beyond you typing out questions at a prompt. We are committed to bringing these AI experiences to billions of handsets around the world.

Qualcomm Technologies’ efforts span models from a wide array of companies developing LMMs and LLMs, including Meta’s Llama series, and the company is working with Google to enable the next generation of Gemini on Snapdragon. These models run smoothly on Snapdragon, and along with our partners, we are looking forward to delighting consumers with new on-device AI features throughout this year and next.

And while an Android phone is a natural starting point to take advantage of multimodal inputs, the benefits will quickly extend to other categories, from smart glasses able to scan what you eat and provide nutritional information, to cars being able to understand your voice commands and assist you on the road. 

 

Multimodal AI can tackle a lot of complex tasks for you

These are just the first steps for multimodal AI, which could help automobiles recognize those bored passengers in the back during a road trip and suggest fun activities to pass time, using a combination of cameras, microphones and vehicle sensors. It could also enable a pair of smart glasses to recognize gym equipment at a health club and create a customized workout plan for you.

The level of accuracy enabled by multimodal AI will be critical for helping a field technician troubleshoot what is wrong with your home appliances, or helping a farmer identify the cause of problems with a crop.

The idea is that these devices — starting with phones, PCs, cars and smart glasses — can take advantage of cameras, microphones and other sensors to let the AI assistant “see” and “hear” so it offers more useful contextual answers.

The importance of on-device AI

All those extra capabilities work better if the AI operations happen on the device, meaning your phone or car needs to be powerful enough to handle those requests. Running things on your phone means trillions of operations have to run fast and efficiently, because the battery still needs to last all day. Doing things on the device also means you don’t need to ping the cloud and wait for servers that may be too busy to respond. And it is more private: your questions and the answers stay with you and your device.

That’s been a priority for Qualcomm Technologies. Its Snapdragon 8 Gen 3 processor with its Hexagon NPU enables handsets to handle much of the processing on the phone itself. Likewise, more than 20 Copilot+ PCs on the market today can handle sophisticated AI features on the device thanks to the Snapdragon X Elite and Snapdragon X Plus Platforms. 

And we are not standing still. The world of AI is evolving quickly, and your next best opportunity to see where it’s all going will be at Snapdragon Summit in October.

Explore more topics, insights and trends in on-device generative AI within our AI on the Edge series.



About the Author
The OnQ Team
Qualcomm relentlessly innovates to deliver intelligent computing everywhere, helping the world tackle some of its most important challenges. Our leading-edge AI, high performance, low-power computing, and unrivaled connectivity deliver proven solutions that transform major industries. At Qualcomm, we are engineering human progress.
