OnQ Blog

How can Qualcomm Aqstic support dynamic voice UI experiences? [video]

Cutting-edge audio system teams with Qualcomm Snapdragon 845 Mobile Platform to support popular voice-driven smart assistants, including Amazon Alexa and Google Assistant

How often do you access your smartphone’s voice UI? Maybe you’ve used it recently to check the weather or get driving directions. Your smartphone interactions may become more intuitive and more frequent as early as next year: Gartner predicts that 20 percent of all smartphone interactions will occur via voice UI, also known as virtual personal assistants (VPAs). And according to IHS Markit, the number of AI-powered digital assistants in devices will surge from 4 billion last year to more than 7 billion by 2020, with smartphones driving this growth.

Last December, Qualcomm Technologies announced the Snapdragon 845 Mobile Platform, engineered with cutting-edge AI processing. It supports voice UI and can learn and recognize your voice as it boosts the accuracy of always-on keyword and voice-assistant commands.

In addition to the platform, there’s another technology at work behind the scenes: the Qualcomm Aqstic audio codec. The codec is purpose-built to work hand-in-hand with Snapdragon 845 to engineer amazing voice UI experiences. Our audio codec is also a true audio powerhouse, designed to support high-resolution standards that recording studios use to master tracks, creating a listening experience designed to appeal to audiophiles.

Three components, one solution, high-fidelity sound

Interacting with your mobile device has never been easier thanks to the Qualcomm Aqstic audio codec’s low-power, always-on digital signal processor (DSP) and Snapdragon voice UI technology. Working together, these two systems are engineered to support voice-driven smart assistants, including Google Assistant, Amazon Alexa, and Duer OS.

Our codec was also created to deliver the best audio experience possible. It’s equipped with an analog-to-digital converter (ADC), as well as a Hi-Fi-grade digital-to-analog converter (DAC) that supports native Direct Stream Digital (DSD) and pulse code modulation (PCM) playback up to 384 kHz/32-bit. This level of playback is a first for our codec. All of this gives Snapdragon 845 the ability to play audio at more than eight times the 44.1 kHz sampling rate of CD sound. How excellent is this sound quality? These are high-resolution standards that major recording artists use to create their albums, producing a pure listening experience that can satisfy even the most sophisticated ears. The ADC and DAC components are also instrumental for supporting cutting-edge voice UI.
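For a rough sense of scale, the comparison with CD audio can be worked out in a few lines. This is a back-of-the-envelope sketch, not a specification: it assumes stereo playback and raw, uncompressed PCM, and it uses the standard CD format of 44.1 kHz/16-bit.

```python
# Back-of-the-envelope comparison of CD audio with 384 kHz/32-bit PCM.
# Assumptions: two channels (stereo) and raw, uncompressed PCM.

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=2):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

cd = pcm_bit_rate(44_100, 16)
hi_res = pcm_bit_rate(384_000, 32)

sample_rate_ratio = 384_000 / 44_100
bit_rate_ratio = hi_res / cd

print(f"CD:               {cd / 1e6:.3f} Mbit/s")
print(f"384 kHz/32-bit:   {hi_res / 1e6:.3f} Mbit/s")
print(f"Sample-rate ratio: {sample_rate_ratio:.2f}x")
print(f"Bit-rate ratio:    {bit_rate_ratio:.2f}x")
```

The sampling rate alone is roughly 8.7 times that of a CD. Once the doubled bit depth is counted, the raw data rate is about 17 times higher.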

How AI works and a look inside our voice UI system

In the past, to activate your smartphone’s voice UI, you needed to pick it up, unlock it, and press a button to give it a command. It couldn’t be always-on because that would drain its battery. Now, because of our codec and new voice UI algorithms, your device can remain on all-day standby, sipping very little battery power.

How does it all work? The Qualcomm Aqstic audio codec is virtually always on and has been trained to listen for certain keywords. For example, it uses the microphones within your phone to pick up a question you might ask such as, “Alexa, what’s the weather like today?” Within the codec, the ADC converts the microphone signal carrying your words to digital and forwards this information to the audio codec DSP. The DSP has algorithms that perform keyword detection, recognizing keywords like “Alexa” and “weather.”

The audio codec DSP then loops in the Snapdragon 845’s sophisticated low-power audio subsystem (LPASS), which includes a Qualcomm Hexagon 685 Scalar Processor, architecturally designed for audio uses. LPASS is the heart of the audio and voice experience on your Snapdragon-powered phone, playing a vital part in voice UI processing. In this role, LPASS performs voice verification to confirm your identity. Why’s this so important? It’s designed to prevent others from using their voices to buy things on the Internet, or otherwise access your device without permission.
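The hand-off described so far (keyword detection first, voice verification second) can be pictured as a chain of gates, where each stage only wakes the next one when its own check passes. The sketch below is purely illustrative: every name in it is hypothetical, and it works on text and a speaker label as stand-ins, whereas real keyword detection and voice verification operate on audio features.

```python
from dataclasses import dataclass

# Hypothetical stand-ins only. Real systems analyze audio, not text,
# and compare voiceprints rather than a simple speaker label.
KEYWORDS = {"alexa"}

@dataclass
class Utterance:
    speaker_id: str   # stand-in for a voiceprint
    words: list       # stand-in for the captured audio

def keyword_detected(utt):
    """Stage 1 (the codec DSP in the article): cheap, always-on keyword check."""
    return any(w.lower().strip(",.?!") in KEYWORDS for w in utt.words)

def voice_verified(utt, enrolled_speaker):
    """Stage 2 (LPASS in the article): confirm the enrolled user is speaking."""
    return utt.speaker_id == enrolled_speaker

def handle(utt, enrolled_speaker="owner"):
    if not keyword_detected(utt):
        return "ignored"      # later stages never wake up
    if not voice_verified(utt, enrolled_speaker):
        return "rejected"     # keyword heard, but not from the enrolled user
    return "accepted"

print(handle(Utterance("owner", "Alexa, what's the weather like today?".split())))
print(handle(Utterance("guest", "Alexa, buy a new phone".split())))
print(handle(Utterance("owner", "What time is it?".split())))
```

Putting the cheapest check first is what lets the later, more power-hungry stages stay asleep most of the time.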

After LPASS validates your identity, it’s designed to immediately beamform (or aim) your smartphone’s microphones toward you and automatically complete tasks to help the voice interaction run smoothly. It’s also engineered to perform echo cancellation and noise suppression, and to isolate voice signals to better distinguish your voice, even in noisy environments.
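Beamforming can sound abstract, so here is a minimal sketch of its simplest textbook form, delay-and-sum. It is not a description of how LPASS implements beamforming. The two-microphone layout, the 1 kHz tone, and the four-sample arrival delay are all assumptions chosen to keep the example small.

```python
import math

def delay(signal, n):
    """Delay a signal by n samples, padding the start with zeros."""
    return [0.0] * n + signal[:len(signal) - n]

def delay_and_sum(channels, steering_delays):
    """Delay each channel by its steering delay, then average the channels."""
    aligned = [delay(ch, d) for ch, d in zip(channels, steering_delays)]
    return [sum(samples) / len(aligned) for samples in zip(*aligned)]

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

# Toy scene: a 1 kHz tone sampled at 16 kHz reaches mic 1
# four samples later than mic 0.
fs, tone_hz, n = 16_000, 1_000, 320
source = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(n)]
mic0 = source
mic1 = delay(source, 4)

# Aimed at the talker: delaying mic 0 by four samples lines the channels up.
on_target = delay_and_sum([mic0, mic1], [4, 0])
# Aimed elsewhere: the opposite delays push the channels further apart.
off_target = delay_and_sum([mic0, mic1], [0, 4])

print(f"power when aimed at the talker: {power(on_target):.4f}")
print(f"power when aimed elsewhere:     {power(off_target):.4f}")
```

When the steering delays match the direction the sound comes from, the channels add up in phase. When they don't, the channels partly cancel, which is how an array favors one direction over others.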

LPASS is also equipped with a new feature called “barge-in” to boost the intelligence of your Qualcomm Aqstic voice UI. Barge-in is designed to allow your device to listen for your voice and automatically cancel out any audio that your device’s speaker (using its speaker amplifier) is playing. So whether you’re playing music or watching videos online, it easily detects your voice. As a result, your device hears your voice commands loud and clear, with no interference from any other audio emanating from your device.
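Cancelling the device’s own playback is usually framed as acoustic echo cancellation: the device knows what it is sending to the speaker, so it can estimate how that sound arrives at the microphone and subtract it. The sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach swapped in for illustration rather than the method barge-in actually uses. The echo path, signal levels, and filter settings are all made up.

```python
import math
import random

def nlms_echo_canceller(reference, mic, taps=8, mu=0.1, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`."""
    w = [0.0] * taps          # running estimate of the speaker-to-mic path
    history = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate                 # mic signal with echo removed
        norm = sum(h * h for h in history) + eps
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out

def power(signal):
    return sum(x * x for x in signal) / len(signal)

random.seed(0)
n, fs = 4000, 16_000
playback = [random.uniform(-1, 1) for _ in range(n)]   # stand-in for music
echo_path = [0.6, 0.3, -0.2, 0.1]                      # hypothetical room response
echo = [sum(echo_path[k] * playback[i - k]
            for k in range(len(echo_path)) if i - k >= 0)
        for i in range(n)]
# The user starts talking halfway through (a 200 Hz tone stands in for a voice).
voice = [0.5 * math.sin(2 * math.pi * 200 * i / fs) if i >= n // 2 else 0.0
         for i in range(n)]
mic = [e + v for e, v in zip(echo, voice)]

cleaned = nlms_echo_canceller(playback, mic)

tail = range(n - 1000, n)
err_before = power([mic[i] - voice[i] for i in tail])      # echo left in the mic
err_after = power([cleaned[i] - voice[i] for i in tail])   # echo left after cancelling
print(f"residual echo power before: {err_before:.4f}")
print(f"residual echo power after:  {err_after:.4f}")
```

Production echo cancellers typically add safeguards this sketch leaves out, such as slowing adaptation while the user is talking.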

Next, the LPASS and Kryo CPU on Snapdragon 845 are engineered to perform local or embedded automatic speech recognition (ASR) and natural language processing (NLP) to generate a “speech-to-text” conversion, allowing your device to understand your question or command and respond accurately with “text-to-speech”. The LPASS and Kryo CPU can support small or medium-sized dialogs or interactions on the device and don’t need to access the cloud for the answer.

The LPASS and Kryo CPU also include specialized hardware for high-quality audio processing, which is an area of circuitry that serves as a bridge to the digital-to-analog converter (DAC) on the audio codec. As the LPASS arrives at your answer, it’s engineered to direct it to the DAC, which converts the signal back to analog and pipes it to the output speaker for you to hear. Now, imagine these heavily engineered processes happening in just a fraction of a second to answer your voice query, essentially “in the moment.”

As the demand for voice UI grows, we keep pace with the Qualcomm Aqstic audio codec coupled with Snapdragon 845. Together, these systems are engineered to support zero-touch UI and highly intuitive smartphone interactions to complete increasingly complex tasks. And looking to the future, as you stream hi-fi music through our platform, you’ll soon be doing that over a 5G backbone, with the ability to download tunes in virtually the blink of an eye.

Qualcomm Snapdragon, Qualcomm Aqstic and Qualcomm Hexagon are products of Qualcomm Technologies, Inc. and/or its subsidiaries.