Oct 15, 2015
Qualcomm products mentioned within this post are offered by Qualcomm Technologies, Inc. and/or its subsidiaries.
Ever since Qualcomm Technologies first announced always-on device activation in early 2013, we have seen industry-wide adoption of always-on voice services. These services wake the device when the user says a preset phrase such as “Hey, Snapdragon,” and then send the raw speech to the cloud, where the actual recognition is done and the response is formulated.
This raises a number of legitimate privacy questions:
- How long is the voice recording stored in the cloud?
- Is the recorded voice available to a human listener, such as a software developer, who needs to use it for legitimate work on the application?
- Are there sufficient safeguards to prevent somebody from tracing the recorded voice back to a specific user?
- Apart from the wake-up phrase spoken by the user, is any background audio also stored in the cloud, and if so, how is that data used?
- Is the voice data encrypted in the cloud?
Several companies have been trying to answer these and similar privacy questions. I believe that performing speech recognition and response formation on the device, instead of in the cloud, is one way to mitigate these privacy and security concerns.
The Qualcomm Hexagon DSP SDK is designed to allow OEMs and ISVs to write fully capable, low-power speech recognition and synthesis applications that remove the need to send voice samples to the cloud. Registered developers for the Hexagon SDK also get access to a text-to-speech framework to accelerate application development. This gives OEMs and ISVs the option of performing speech recognition on the device, in some cases without needing the cloud at all.
Some of the most common queries can be answered with information already available on the device, without any network connectivity. For instance, questions about the user’s personal schedule can be answered from the on-device calendar. The Hexagon SDK also provides the capability to convert many spoken queries into text on the device before they are sent over the network. The privacy benefit is that no voice recording is collected in the cloud.
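To make the local-first idea concrete, here is a minimal sketch of the routing logic described above, under stated assumptions. Every name in it (`Event`, `answer_from_calendar`, `handle_query`, the `transcribe` and `send_text` callbacks) is hypothetical and is not part of the Hexagon SDK; `transcribe` stands in for whatever on-device speech-to-text engine is used, and the keyword check stands in for real intent detection. The point it illustrates: a schedule question is answered from the on-device calendar, and anything else leaves the device only as text, never as audio.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class Event:
    """A hypothetical on-device calendar entry."""
    day: date
    title: str


def answer_from_calendar(text: str, calendar: List[Event], today: date) -> Optional[str]:
    """Answer simple schedule questions from local data; return None otherwise.

    Assumption: a keyword match stands in for a real intent classifier.
    """
    if "schedule" not in text.lower():
        return None
    todays = [e.title for e in calendar if e.day == today]
    if not todays:
        return "You have nothing scheduled today."
    return "Today: " + ", ".join(todays)


def handle_query(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical on-device speech-to-text
    calendar: List[Event],
    today: date,
    send_text: Callable[[str], str],  # hypothetical network call that accepts text only
) -> str:
    # Speech recognition runs locally, so the raw audio never leaves the device.
    text = transcribe(audio)
    local = answer_from_calendar(text, calendar, today)
    if local is not None:
        return local  # answered entirely on the device, no network needed
    # Only the transcript is sent over the network, not the voice recording.
    return send_text(text)


if __name__ == "__main__":
    sent: List[str] = []

    def fake_cloud(text: str) -> str:
        sent.append(text)
        return "cloud reply"

    cal = [Event(date(2015, 10, 15), "Team sync"), Event(date(2015, 10, 16), "Dentist")]
    # For this demo the "audio" is just UTF-8 text and transcription is a decode.
    print(handle_query(b"What's my schedule today?", lambda a: a.decode("utf-8"),
                       cal, date(2015, 10, 15), fake_cloud))
    print(handle_query(b"What's the weather?", lambda a: a.decode("utf-8"),
                       cal, date(2015, 10, 15), fake_cloud))
    print(sent)
```

Passing the speech-to-text engine and the network call in as callbacks keeps the privacy property easy to check: the only argument `send_text` ever receives is a string, so the audio bytes have no path to the network.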
Thoughts? Let me know what you think in the comments below.