Mar 26, 2020
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
Technology has made it possible to connect with almost anyone, anywhere, instantly, from the palm of our hands. Friends, family, and businesses are no longer bound by distance and time; we’re able to communicate with ease. But one barrier has remained until now: language.
Even though we can message, call, or video chat across the world, we haven’t been able to talk seamlessly with those who speak other languages. However, Youdao, a search engine by Chinese internet company NetEase, is using artificial intelligence (AI) to solve this. For the past year and a half, we’ve worked with Youdao to power real-time mobile translation that can allow you to have a
Picture this: you’re on a work conference call with your clients in China. You’re speaking English, but they hear your words in perfect Mandarin right as you say them, and vice versa. It’s so seamless that it feels like magic. This type of translation was too computationally demanding for previous technologies, but it’s feasible with the Qualcomm Snapdragon 865 5G Mobile Platform and its integrated 5th Generation Qualcomm AI Engine, which packs 2x the AI performance of its predecessor. In fact, we demoed part of this use case live on stage during our Snapdragon Tech Summit in December 2019.
How it works
Translation functions typically run on the CPU, a general-purpose block that isn’t optimized for specialized workloads like this one. We are working with Youdao to move translation onto more appropriate processing blocks, specifically the Qualcomm Hexagon digital signal processor, to optimize end-to-end latency and performance. This reduces the time each step of translation takes, so the whole process can happen in real time.
Whether you’re making a voice call or a Voice over Internet Protocol (VoIP) call, Youdao’s real-time translation begins when your speech enters the microphone of a Snapdragon 865-powered device and travels to the 5th Generation Qualcomm AI Engine inside the mobile platform. There, it wakes up the Qualcomm Sensing Hub, which removes noise and echo and then launches the Hexagon processor, where the three main stages of neural network processing occur:
- Automatic Speech Recognition (ASR) - Your English speech is transcribed into English text, using convolutional neural networks (CNNs) running on the Hexagon processor.
- Neural Machine Translation (NMT) - The English text is then translated into Mandarin text, for example. This isn’t just word-for-word translation, but translation at the construct level. The translation model running on the Hexagon processor accounts for the fact that sentences are constructed differently in the two languages, and that words can have different meanings in different contexts.
- Text-to-Speech (TTS) - Lastly, the Mandarin text is converted into Mandarin speech.
The best part is that all of this happens on the device and in real time, making cross-language conversation truly effortless.
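To make the three stages above a little more concrete, here is a minimal Python sketch of an ASR, NMT, and TTS chain with per-stage timing. It is only an illustration under stated assumptions, not Youdao’s or Qualcomm’s implementation: the function names (recognize_speech, translate_text, synthesize_speech, run_pipeline), the toy phrase table, and the treatment of audio as UTF-8 bytes are all hypothetical stand-ins for real neural networks running on the Hexagon processor.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy phrase table standing in for a neural translation model (assumption).
PHRASE_TABLE = {"hello": "你好", "thank you": "谢谢"}


def recognize_speech(audio: bytes) -> str:
    """ASR stand-in: treats the 'audio' bytes as UTF-8 text of the utterance."""
    return audio.decode("utf-8")


def translate_text(text: str) -> str:
    """NMT stand-in: looks the whole phrase up in the toy table."""
    return PHRASE_TABLE.get(text.strip().lower(), "<unknown>")


def synthesize_speech(text: str) -> bytes:
    """TTS stand-in: encodes the translated text as the 'audio' output."""
    return text.encode("utf-8")


@dataclass
class StageResult:
    name: str        # stage label: "ASR", "NMT", or "TTS"
    output: object   # whatever the stage produced
    seconds: float   # wall-clock time spent in the stage


def run_pipeline(audio: bytes) -> Tuple[bytes, List[StageResult]]:
    """Runs the three stages in order and records how long each one takes."""
    stages: List[Tuple[str, Callable]] = [
        ("ASR", recognize_speech),
        ("NMT", translate_text),
        ("TTS", synthesize_speech),
    ]
    data = audio
    results: List[StageResult] = []
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)  # each stage consumes the previous stage's output
        results.append(StageResult(name, data, time.perf_counter() - start))
    return data, results


if __name__ == "__main__":
    speech_out, timings = run_pipeline(b"hello")
    for result in timings:
        print(f"{result.name}: {result.seconds * 1000:.3f} ms")
    print("end-to-end:", f"{sum(r.seconds for r in timings) * 1000:.3f} ms")
```

Because the stages run back to back, the end-to-end delay is roughly the sum of the per-stage times, which is why shortening each individual step matters for a conversation that feels live.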
The future of real-time translation
Youdao’s real-time translation technology currently supports English, Mandarin and many other languages.
With 5G, real-time translation in general can come to life in even more immersive ways. For example, the ultra-low latency enabled by this next generation of wireless and Snapdragon 5G devices could allow face mapping over video chat, providing realistic lip sync to make it truly look — not just sound — as if the other person is speaking your language.
At Qualcomm Technologies, we want to help the world connect, compute, and communicate. It’s why we’re especially excited to be working with Youdao to create technologies that can help us all overcome language barriers. Together with our partners, we look forward to enriching user experiences and getting the world talking.