Jul 9, 2018
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
Artificial intelligence (AI) is increasingly enriching our lives in seamless ways, from superior photography and enhanced security to more natural user interfaces (UIs). At Qualcomm AI Research, power efficiency and personalization are core research areas since both are essential for making next-gen intelligent user experiences ubiquitous. In this post, I will focus on why on-device AI is required for a virtual assistant, with special emphasis on personalization as a major differentiator between today’s virtual assistants and a true personal assistant.
Transformative voice UIs are increasingly being powered by AI
Voice is the transformative UI we’ve been waiting for because it provides a natural way to interact with devices hands-free. Advanced voice UIs are being designed to be always-on, conversational, personalized, and private so that we can have productive and natural conversations. Voice UI is proliferating across product categories, from smartphones and smart speakers to XR headsets and automotive.
There are many components required for the end-to-end voice UI experience (Figure 1). The sequence starts with voice signal acquisition from the microphone(s), followed by speech pre-processing, voice activation, speech recognition, and natural language understanding, and ends with speech synthesis, where the device responds to you. Every one of these stages must run well for the overall experience to feel natural, which is what makes the pipeline complex.
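To make the sequence concrete, here is a toy sketch of those stages. Every function is a hypothetical stub (a real system replaces these with trained models and signal-processing code); nothing below is Qualcomm's implementation.

```python
# Toy sketch of the voice UI pipeline stages: pre-processing, voice
# activation, speech recognition, natural language understanding, and
# speech synthesis. All stages are hypothetical placeholder stubs.

def preprocess(samples):
    """Speech pre-processing stub: remove the DC offset from the mic signal."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def voice_activated(samples, threshold=0.1):
    """Voice activation stub: crude energy-based wake check."""
    energy = sum(s * s for s in samples) / len(samples)
    return energy > threshold ** 2

def recognize(samples):
    """Automatic speech recognition stub: would transcribe voice to text."""
    return "turn on the lights"

def understand(text):
    """Natural language understanding stub: map the transcript to an intent."""
    return {"intent": "lights_on"} if "lights" in text else {"intent": "unknown"}

def synthesize(intent):
    """Speech synthesis stub: would render the response as audio."""
    return "OK, turning on the lights." if intent["intent"] == "lights_on" else "Sorry?"

def voice_ui(samples):
    """Run the end-to-end pipeline on one buffer of microphone samples."""
    samples = preprocess(samples)
    if not voice_activated(samples):
        return None  # stay in low-power always-on listening
    return synthesize(understand(recognize(samples)))
```

With a loud buffer such as `voice_ui([0.9, -0.8, 0.7, -0.9])`, the pipeline runs end to end and returns a synthesized response; a near-silent buffer returns `None`, which mirrors why voice activation sits early in the chain as a low-power gate.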
Why is voice UI becoming so popular now? Machine learning has recently ignited the voice UI revolution as machine speech-recognition accuracy approaches 95%, roughly on par with human accuracy. Below the human-accuracy threshold, the user experience significantly degrades since the interface is no longer conversational, productive, or natural. Interestingly enough, machine learning techniques can be applied to just about all of the voice UI components; more on this later.
On-device voice UI is key
The AI that is powering key voice UI components, such as automatic speech recognition and natural language processing, has traditionally run in the cloud due to computing, storage, and power constraints. However, on-device processing provides unique benefits, such as much faster response, increased reliability, and better privacy protection. For privacy in particular, people want to own their data and have a choice whether it is being processed or stored in the cloud. As a result, the trend is toward more processing on the device to complement the cloud capabilities. The key challenge is to run the complex voice UI functionality within the power and thermal envelope of a mobile device.
Let me walk through a couple of examples of how we are addressing this challenge and applying low-power on-device AI to components of the voice UI — tune into my webinar for more details.
- Speech denoising takes noisy speech as input and outputs clean speech, which is especially essential in noisy environments. Qualcomm AI Research is developing a deep learning model for denoising that significantly outperforms traditional methods and is efficient enough to run within the processing, storage, power, and thermal constraints of a hearing aid.
- Automatic speech recognition transcribes voice to text. Testing shows that deep learning can give state-of-the-art accuracy on a mobile device. Qualcomm AI Research has trained a deep learning acoustic model to adapt to each user’s accent and environment, adding personalization.
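To illustrate the denoising idea, here is a minimal sketch of mask-based spectral denoising, one common deep-learning formulation of the problem. In a real system a trained network predicts the per-frequency mask; the `predict_mask` stub below is a hypothetical stand-in, not Qualcomm's model.

```python
# Minimal sketch of mask-based speech denoising: a network predicts a
# per-frequency-bin gain in [0, 1], which is applied to the noisy
# magnitude spectrum. predict_mask is a hypothetical stand-in for the
# trained model, using an assumed fixed noise floor.

def predict_mask(noisy_mag, noise_floor=0.2):
    """Stand-in for the denoising network: soft mask per frequency bin."""
    return [min(1.0, max(0.0, (m - noise_floor) / m)) if m > 0 else 0.0
            for m in noisy_mag]

def denoise(noisy_mag):
    """Apply the predicted mask to the noisy magnitude spectrum."""
    mask = predict_mask(noisy_mag)
    return [g * m for g, m in zip(mask, noisy_mag)]

# Strong bins are kept (slightly attenuated); weak bins are suppressed.
clean = denoise([1.0, 0.1, 0.5, 0.05])
```

The appeal of the mask formulation is that the network's output is bounded and per-bin, which keeps the model small, a property that matters under the hearing-aid-class constraints mentioned above.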
On-device voice UI is already a reality. Qualcomm AI Research has implemented the entire end-to-end voice UI processing on the device, and at CES 2018 showcased the experience through a smart home demo. The demo achieved >95% on-device intent accuracy for domain-specific command sets when adapted to accent and environmental conditions.
A true virtual assistant — adding personalization
A true virtual assistant is a “digital me” sitting on the device. It does personal profiling and learns preferences over time to act as a digital assistant that really knows you. Contextual intelligence is required for personalization and can be achieved by fusing multiple on-device sensor inputs, such as mics, camera, and accelerometers, as well as other on-device and off-device data (Figure 3).
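As a toy illustration of contextual fusion, the sketch below scores candidate contexts by a weighted sum of per-sensor evidence. The sensors, weights, and context labels are hypothetical illustrations; a real AI agent would learn them from data rather than hard-code them.

```python
# Toy sketch of contextual fusion: combine evidence from multiple
# on-device sensors into a single context estimate. All sensors,
# weights, and context labels here are hypothetical.

def fuse_context(signals, weights):
    """Score each candidate context by a weighted sum of sensor evidence."""
    scores = {}
    for context, evidence in signals.items():
        scores[context] = sum(weights[sensor] * value
                              for sensor, value in evidence.items())
    return max(scores, key=scores.get)

# Assumed per-sensor trust weights (illustrative only).
weights = {"mic": 0.3, "camera": 0.3, "accelerometer": 0.4}

# Per-context evidence: each sensor reports a confidence in [0, 1].
signals = {
    "commuting":  {"mic": 0.6, "camera": 0.2, "accelerometer": 0.9},
    "in_meeting": {"mic": 0.8, "camera": 0.7, "accelerometer": 0.1},
}

context = fuse_context(signals, weights)
```

Here the strong accelerometer evidence outweighs the mic and camera, so the fused estimate is the motion-dominated context; the same structure extends to any number of sensors or off-device data sources.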
The big payoff from this enhanced personalization is that it brings us closer to a true personal assistant that is both responsive and proactive. It would not only provide customized answers when asked questions, but also make decisions and start conversations on its own, based on contextual analysis.
For this level of personalization, Qualcomm AI Research is developing an on-device “AI agent” that continuously learns personal knowledge from all available sensor inputs in order to take intuitive actions. The AI agent includes key functionality, such as contextual fusion and learning, that ultimately allows for personalized responses. Initial research is showing promising results for on-device personalization. I’m excited to see how personalization will enhance everyday experiences, not just through personal assistants but across our daily lives more broadly. This technology will truly transform industries and enable our customers to innovate.