Keyword Speech Dataset

Keyword spotting (KWS) is widely used to detect specific keywords on personal devices such as mobile phones and home appliances. A keyword may consist of multiple words; “Hey, Siri,” “OK, Google,” and “Hi, Bixby” are well-known examples.

Qualcomm Technologies, Inc. has published a keyword dataset for the Snapdragon® mobile platform. The Hey Snapdragon keyword dataset contains 4,270 audio clips of four English keyword classes spoken by 50 people.


The four keyword classes are:

 

“Hey, Android”

“Hi, Galaxy”

“Hi, Lumina”

“Hey, Snapdragon”

 

Find out more about the keyword speech dataset and see how you can use it in your research projects.

For any questions or technical support, please contact us at [email protected]

Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.

Dataset details

 

Keyword spotting (KWS) is widely used to detect specific keywords on personal devices such as mobile phones and home appliances. A keyword may consist of multiple words; “Hey Siri”, “Ok Google”, and “Hi Bixby” are well-known examples.
 

Many keywords like these are branded by specific companies, and those companies have shown great interest in the KWS task for their own products. They have proposed various KWS approaches, but these rely on proprietary keyword datasets that are not accessible to others. As a result, the approaches cannot be reproduced by others and are hard to compare with one another.
 

To address this issue, we are publishing a keyword dataset for our Snapdragon® mobile platform, named the ‘Hey Snapdragon Keyword Dataset’.
 

The ‘Hey Snapdragon Keyword Dataset’ was used to support the experimental results in our ASRU 2019 paper, “Query-by-example on-device keyword spotting”. We hope that this new dataset will be helpful for reproducible KWS research.

Dataset composition

 

The dataset has 4,270 utterances of four English keywords spoken by 50 people. The four keywords are Hey Android, Hey Snapdragon, Hi Galaxy, and Hi Lumina. The following table shows the details for each keyword. Each WAV file was recorded at a sampling rate of 16 kHz, with a single (mono) channel and a bit depth of 16 bits.

Keyword           Number of speakers   Number of utterances
Hey Android       50                   1112
Hey Snapdragon    50                   1112
Hi Galaxy         50                   934
Hi Lumina         50                   1112
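
As a quick sanity check after downloading, the recording format described above (16 kHz, mono, 16 bits) can be verified with Python's standard `wave` module. The following is a minimal sketch, not an official tool. The function names `describe_clip` and `matches_dataset_format` are hypothetical helpers introduced here for illustration, and the layout of the downloaded files is not specified on this page, so the path you pass in is an assumption you will need to adapt.

```python
import wave

# Expected recording format, taken from the dataset description above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1          # mono
EXPECTED_BITS_PER_SAMPLE = 16

# Utterance counts per keyword, copied from the table above.
UTTERANCES = {
    "Hey Android": 1112,
    "Hey Snapdragon": 1112,
    "Hi Galaxy": 934,
    "Hi Lumina": 1112,
}


def describe_clip(path):
    """Return (sample_rate_hz, channels, bits_per_sample, duration_s) for a WAV file.

    `path` is a string path to a PCM WAV file (hypothetical; adapt to your download).
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8       # sample width is reported in bytes
        duration = wav.getnframes() / rate  # frames divided by frames per second
    return rate, channels, bits, duration


def matches_dataset_format(path):
    """True if the clip has the sampling rate, channel count, and bit depth stated above."""
    rate, channels, bits, _ = describe_clip(path)
    return (rate, channels, bits) == (
        EXPECTED_RATE_HZ,
        EXPECTED_CHANNELS,
        EXPECTED_BITS_PER_SAMPLE,
    )


# The per-keyword counts in the table add up to the stated total of 4,270 utterances.
assert sum(UTTERANCES.values()) == 4270
```

Running `matches_dataset_format` over every downloaded clip is a cheap way to catch files that were resampled or converted before they reach a training pipeline.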

Dataset license

 

The Qualcomm Keyword Speech Dataset is available for research purposes.

 

Data License Agreement - Research Use

Dataset Citation Instructions


The dataset is intended for research purposes only. Please cite our paper if you use this dataset in your research:

@misc{1910.05171,
  Author = {Byeonggeun Kim and Mingu Lee and Jinkyu Lee and Yeonseok Kim and Kyuwoong Hwang},
  Title = {Query-by-example on-device keyword spotting},
  Year = {2019},
  Eprint = {arXiv:1910.05171},
}

Qualcomm AI Research


At Qualcomm AI Research, we are advancing AI to make its core capabilities – perception, reasoning, and action – ubiquitous across devices. Our mission is to make breakthroughs in fundamental AI research and scale them across industries. By bringing together some of the best minds in the field, we’re pushing the boundaries of what’s possible and shaping the future of AI.

 

Qualcomm AI Research continues to invest in and support deep-learning research. The publication of the Keyword Speech Dataset for use by the AI research community is one of our many initiatives.

 

Find out more about Qualcomm AI Research.

 

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
