Broadcasted residual learning for efficient keyword spotting
Byeonggeun Kim (formerly Qualcomm AI Research)
Simyung Chang (Qualcomm AI Research)
Jinkyu Lee (Qualcomm AI Research)
Dooyong Sung (formerly Qualcomm AI Research)
INTERSPEECH 2021
Summary
Keyword spotting is an important research field because it plays a key role in device wake-up and in the user’s experience with smart devices. However, minimizing errors while operating efficiently on low-power edge devices is challenging. We present a broadcasted residual learning method that achieves high accuracy with a small model size and a low computational load. Our method configures most of the residual functions as 1D temporal convolutions while still allowing 2D convolutions, using a broadcasted-residual connection that expands the temporal output to the frequency-temporal dimension. This residual mapping enables the network to represent useful audio features effectively with much less computation than conventional convolutional neural networks. We also propose a novel network architecture, the Broadcasting-residual network (BC-ResNet), which can be scaled up according to the target device’s resources.
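To make the broadcasted-residual connection concrete, here is a minimal NumPy sketch of one block acting on a single (channels, frequency, time) feature map. It assumes a simplified residual form y = x + f2(x) + BC(f1(avg_F(f2(x)))), where f2 is a per-channel convolution along frequency, avg_F averages over frequency to get a 1D temporal signal, f1 is a per-channel temporal convolution followed by Swish, and BC broadcasts the 1D result back over every frequency bin. The helper names, kernel sizes, and the illustrative 40 x 101 input shape are hypothetical; any normalization, pointwise channel mixing, or dropout a full network would use is omitted, so this is not the authors' implementation.

```python
import numpy as np


def swish(z: np.ndarray) -> np.ndarray:
    # Swish (SiLU) activation: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))


def depthwise_conv1d(x: np.ndarray, kernels: np.ndarray, axis: int) -> np.ndarray:
    """Per-channel 1D cross-correlation with 'same' zero padding along `axis`.

    x: (C, F, T) feature map; kernels: (C, K) with odd K (hypothetical helper).
    """
    c, k_size = kernels.shape
    pad = k_size // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width)
    n = x.shape[axis]
    out = np.zeros(x.shape, dtype=float)
    for k in range(k_size):
        # Window of length n shifted by k along the convolved axis
        window = np.take(xp, np.arange(k, k + n), axis=axis)
        out += kernels[:, k].reshape(c, *([1] * (x.ndim - 1))) * window
    return out


def broadcasted_residual_block(x, freq_kernels, time_kernels):
    # f2: frequency-wise depthwise conv, output stays 2D (C, F, T)
    f2_out = depthwise_conv1d(x, freq_kernels, axis=1)
    # Reduction: average over frequency -> 1D temporal features (C, 1, T)
    pooled = f2_out.mean(axis=1, keepdims=True)
    # f1: temporal depthwise conv + Swish, computed only on the 1D features
    f1_out = swish(depthwise_conv1d(pooled, time_kernels, axis=2))
    # BC: NumPy broadcasting expands (C, 1, T) over all F bins in the sum
    return x + f2_out + f1_out


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 40, 101))  # illustrative (channels, freq bins, frames)
y = broadcasted_residual_block(x, rng.standard_normal((4, 3)), rng.standard_normal((4, 3)))
print(y.shape)  # prints (4, 40, 101)
```

The temporal convolution runs on T values per channel instead of F x T, which is where the savings come from, while the frequency-wise f2 path keeps some 2D structure in the output.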
Citation
@inproceedings{kim21l_interspeech,
  author={Byeonggeun Kim and Simyung Chang and Jinkyu Lee and Dooyong Sung},
  title={{Broadcasted Residual Learning for Efficient Keyword Spotting}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4538--4542},
  doi={10.21437/Interspeech.2021-383}
}
Results
Broadcasted residual learning repeatedly averages 2D frequency-temporal features over frequency to obtain 1D temporal features and then expands these 1D features back to 2D. Leveraging broadcasted residual learning and simple scaling by width, we design a family of networks called BC-ResNets that surpasses the state of the art on benchmark speech command datasets.
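A rough multiply-accumulate (MAC) count helps show why the broadcasted design stays cheap as the width grows. The sketch below compares a standard 3 x 3 convolution (C to C channels) on an F x T map against the simplified broadcasted block from above plus a hypothetical 1 x 1 channel-mixing convolution applied on the 1D temporal map, for several width multipliers tau. The base width of 8 channels, the 40 x 101 map, and the exact layer mix are illustrative assumptions, not the published BC-ResNet configuration; additions from averaging and broadcasting are ignored.

```python
def conv2d_macs(c: int, f: int, t: int, k: int = 3) -> int:
    """MACs of a standard k x k 2D conv mapping c -> c channels on an F x T map."""
    return k * k * c * c * f * t


def broadcast_block_macs(c: int, f: int, t: int, k: int = 3) -> int:
    """MACs of the simplified broadcasted-residual sketch (assumed layer mix)."""
    freq_dw = k * c * f * t  # frequency-wise depthwise conv, still on the 2D map
    temporal_dw = k * c * t  # temporal depthwise conv on the frequency-averaged 1D map
    pointwise = c * c * t    # hypothetical 1x1 channel mixing, also on the 1D map
    return freq_dw + temporal_dw + pointwise


BASE_CHANNELS = 8  # hypothetical base width
F, T = 40, 101     # illustrative frequency bins x time frames

for tau in (1, 2, 4):  # width multiplier
    c = BASE_CHANNELS * tau
    full = conv2d_macs(c, F, T)
    bc = broadcast_block_macs(c, F, T)
    print(f"tau={tau}: 3x3 conv {full:,} MACs, broadcast sketch {bc:,} MACs, ratio {full / bc:.1f}x")
```

Under these assumptions the quadratic channel-mixing term grows with width but is multiplied by T rather than F x T, so the gap to a full 2D convolution widens as tau increases.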
* Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
