
Whisper ASR Model Compilation and Integration on Qualcomm® Robotics RB5

Whisper-Base-En is a state-of-the-art automatic speech recognition (ASR) model that transcribes spoken English into written text. It performs robustly in realistic, noisy environments, which makes it well suited to real-world applications. Each inference pass transcribes an audio clip of up to 30 seconds; longer recordings must be split into 30-second segments.

Objective

The objective is to deploy and evaluate Qualcomm® AI Hub’s Whisper ASR model on the Qualcomm® Robotics RB5 platform so that it accurately transcribes spoken language into written text in real-world, noisy environments. The project aims to integrate the model into a user-friendly application, test its performance, optimize it for accuracy, and demonstrate its practical use in a range of scenarios.

 

 

Equipment:

  • Qualcomm Robotics RB5: Qualcomm Robotics RB5 Development Kit
  • Power adapter: 12 V, 2500 mA, as required by the 96Boards specification
  • USB to Micro USB cable: for the serial console interface, to view logs
  • USB to USB Type-C cable: for connecting to the board's USB 3.0 Type-C port to flash images and to use adb and fastboot

Introduction:

Whisper-Base-En is an English-only automatic speech recognition (ASR) model that performs robustly in realistic, noisy environments. Beyond deploying it on the RB5, this project aims to:

  • Identify any limitations or challenges encountered during implementation and testing.
  • Explore optimizations that improve transcription accuracy and processing speed.
  • Investigate techniques that improve the model's performance in particularly noisy or challenging environments.

Technical Details:

  • Model checkpoint: base.en
  • Input resolution: 80 x 3000 (80 log-Mel bins x 3000 frames, covering 30 seconds of audio)
  • Mean decoded sequence length: 112 tokens
  • Number of parameters (WhisperEncoder): 23.7M
  • Model size (WhisperEncoder): 90.6 MB
  • Number of parameters (WhisperDecoder): 48.6M
  • Model size (WhisperDecoder): 186 MB
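The 80 x 3000 input shape follows from Whisper's audio front end: audio is resampled to 16 kHz and converted to an 80-bin log-Mel spectrogram with a 10 ms (160-sample) hop, so a 30-second window yields 3,000 frames. The short Python sketch below reproduces that arithmetic and also checks that the listed model sizes are roughly what float32 weights would occupy. Storing weights as float32 and reading "MB" as MiB are assumptions made for this check, not details stated for this model.

```python
# Whisper audio front-end constants (16 kHz audio, 10 ms hop, 80 mel bins)
SAMPLE_RATE = 16_000   # samples per second after resampling
HOP_LENGTH = 160       # samples between successive spectrogram frames (10 ms)
N_MELS = 80            # log-Mel bins per frame
CHUNK_SECONDS = 30     # fixed input window of the model

n_samples = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples
n_frames = n_samples // HOP_LENGTH        # 3,000 frames
input_shape = (N_MELS, n_frames)          # matches the 80 x 3000 input resolution


def float32_size_mib(n_params):
    """Approximate weight storage in MiB.

    Assumption: every parameter is a 4-byte float32 and "MB" means MiB (2**20 bytes).
    """
    return n_params * 4 / 2**20


encoder_mib = float32_size_mib(23.7e6)   # WhisperEncoder parameter count
decoder_mib = float32_size_mib(48.6e6)   # WhisperDecoder parameter count

print(input_shape)
print(f"encoder ~{encoder_mib:.1f} MiB, decoder ~{decoder_mib:.1f} MiB")
```

The estimates (about 90.4 MiB and 185.4 MiB) land within about 1 MiB of the listed 90.6 MB and 186 MB; the small gap is plausibly due to the rounded parameter counts and per-file metadata.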

Whisper-Base-En TFLite Model Inference on Qualcomm Robotics RB5

Prerequisites:
  1. Ubuntu 20.04
  2. Conda environment
  3. Python 3.8
  4. TFLite
  5. Qualcomm Robotics RB5

Steps to Execute:

Step 1: Install Conda on WSL2 using this link.

  • After installation, create a Conda environment:
$ conda create --name <env_name> python=3.8
  • List the available Conda environments:
$ conda env list
  • Activate the environment:
$ conda activate <env_name>

Step 2: Install the dependencies required to run TFLite inference on the Qualcomm Robotics RB5.

  • Install the TFLite runtime library on the Qualcomm Robotics RB5.
  • Install OpenCV-Python on the Qualcomm Robotics RB5.
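Before moving on, it can help to confirm on the RB5 that the packages from Step 2 import correctly. This is a minimal sketch: it assumes the runtime is installed as the `tflite_runtime` package (falling back to full TensorFlow's `tf.lite.Interpreter` if that is present instead) and that OpenCV imports as `cv2`. The helper names `find_tflite_interpreter` and `check_dependencies` are illustrative, not part of the project.

```python
import importlib


def find_tflite_interpreter():
    """Return a TFLite Interpreter class, preferring the lightweight tflite_runtime package."""
    try:
        from tflite_runtime.interpreter import Interpreter
        return Interpreter
    except ImportError:
        pass
    try:
        import tensorflow as tf  # full TensorFlow also ships the TFLite interpreter
        return tf.lite.Interpreter
    except ImportError:
        return None


def check_dependencies(modules=("tflite_runtime", "cv2", "numpy")):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
    print("TFLite interpreter found:", find_tflite_interpreter() is not None)
```

Running it with `python3` on the board should report OK for each package before you continue with the inference steps.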

Step 3: Download the whisper_base.tflite model from this link and save it in the model directory.
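Once the model file is in place, the TFLite `Interpreter` API can load it and report the tensor shapes it expects. The sketch below assumes the file is saved as `model/whisper_base.tflite` and that `tflite_runtime` is installed; `load_interpreter` and `run_once` are hypothetical helpers, not functions from the project's `inference.py`. Because the technical details above list separate encoder and decoder components, the downloaded assets may be split into more than one `.tflite` file, each loaded with its own interpreter.

```python
import os

import numpy as np

# Assumed location, following Step 3 ("save it in the model directory")
MODEL_PATH = os.path.join("model", "whisper_base.tflite")


def load_interpreter(model_path=MODEL_PATH):
    """Load a TFLite model and allocate its tensors; return None if the file is missing."""
    if not os.path.exists(model_path):
        return None
    from tflite_runtime.interpreter import Interpreter  # assumes tflite_runtime from Step 2
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def run_once(interpreter, inputs):
    """Feed one array per input tensor, run the model once, and return every output tensor."""
    details = interpreter.get_input_details()
    if len(inputs) != len(details):
        raise ValueError(f"model expects {len(details)} inputs, got {len(inputs)}")
    for detail, array in zip(details, inputs):
        # Cast to the dtype the model declares for this input
        interpreter.set_tensor(detail["index"], np.asarray(array, dtype=detail["dtype"]))
    interpreter.invoke()
    return [interpreter.get_tensor(d["index"]) for d in interpreter.get_output_details()]


if __name__ == "__main__":
    interp = load_interpreter()
    if interp is None:
        print(f"Model not found at {MODEL_PATH}")
    else:
        for d in interp.get_input_details():
            print("input:", d["name"], d["shape"], d["dtype"])
        for d in interp.get_output_details():
            print("output:", d["name"], d["shape"], d["dtype"])
```

Printing the input details first is a quick way to confirm the encoder really expects an 80 x 3000 log-Mel input before wiring up the full pipeline.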

Step 4: Run Whisper-Base-En TFLite inference on the Qualcomm Robotics RB5 with a recorded audio file (audio.wav) as input:

$ python3 inference.py
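The Whisper encoder expects exactly 30 seconds of 16 kHz audio, so a recorded clip such as `audio.wav` has to be converted to mono float32 samples and zero-padded or truncated to 480,000 samples before feature extraction. This sketch uses only Python's `wave` module and NumPy; it assumes a 16-bit PCM file already sampled at 16 kHz (resampling is not shown), and `load_wav_16k_mono` and `pad_or_trim` are illustrative names rather than the project's own code.

```python
import wave

import numpy as np

SAMPLE_RATE = 16_000
N_SAMPLES = SAMPLE_RATE * 30  # one 30-second Whisper window


def load_wav_16k_mono(path):
    """Read a 16-bit PCM WAV file (path or file object) and return mono float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wf.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz audio, got {wf.getframerate()} Hz")
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV samples are little-endian signed 16-bit integers
    audio = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Frames are interleaved; average the channels to downmix to mono
        audio = audio.reshape(-1, channels).mean(axis=1)
    return audio


def pad_or_trim(audio, length=N_SAMPLES):
    """Zero-pad at the end or truncate so the clip fills exactly one window."""
    if audio.shape[0] >= length:
        return audio[:length]
    return np.pad(audio, (0, length - audio.shape[0]))
```

For example, `pad_or_trim(load_wav_16k_mono("audio.wav"))` yields a 480,000-sample array ready for log-Mel feature extraction.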

Step 5: To transcribe live speech instead of a recorded file, run Whisper-Base-En TFLite inference with audio captured from an external microphone connected to the Qualcomm Robotics RB5:

$ python3 inference_mic.py
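For live microphone input, one common pattern is to stream audio into a buffer that always holds the most recent 30 seconds and to transcribe that window on demand. This is a sketch under assumptions: it uses the third-party `sounddevice` package for capture (the project's `inference_mic.py` may use a different audio library), records 16 kHz mono float32 audio, and `RollingAudioBuffer` and `stream_to_buffer` are hypothetical names.

```python
import numpy as np

SAMPLE_RATE = 16_000
WINDOW_SAMPLES = SAMPLE_RATE * 30  # one 30-second Whisper window


class RollingAudioBuffer:
    """Keeps only the most recent `capacity` mono float32 samples."""

    def __init__(self, capacity=WINDOW_SAMPLES):
        self.capacity = capacity
        self.data = np.zeros(0, dtype=np.float32)

    def append(self, chunk):
        chunk = np.asarray(chunk, dtype=np.float32).reshape(-1)
        # Concatenate (which copies the chunk) and drop the oldest samples beyond capacity
        self.data = np.concatenate([self.data, chunk])[-self.capacity:]

    def window(self):
        """Return exactly `capacity` samples, zero-padded at the end until the buffer fills."""
        missing = self.capacity - self.data.shape[0]
        if missing > 0:
            return np.pad(self.data, (0, missing))
        return self.data.copy()


def stream_to_buffer(buffer, seconds, device=None):
    """Capture `seconds` of microphone audio into `buffer` via a sounddevice input stream."""
    import sounddevice as sd  # assumption: installed on the RB5 (e.g. pip3 install sounddevice)

    def callback(indata, frames, time_info, status):
        # indata has shape (frames, channels); keep the single mono channel
        buffer.append(indata[:, 0])

    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32",
                        device=device, callback=callback):
        sd.sleep(int(seconds * 1000))  # sd.sleep takes milliseconds
```

For example, `buf = RollingAudioBuffer(); stream_to_buffer(buf, 10); window = buf.window()` captures ten seconds of speech and returns a zero-padded 480,000-sample window that can then follow the same path as a recorded file.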

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries ("Qualcomm"). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.

Project Authors
Capgemini
Avdhesh Chouhan, Professional II
Ankitha M, Associate I – Engineer

