NeurIPS 2020 Demo: Neural network quantization with AdaRound

Neural-network models can be very large and compute-intensive, which makes them challenging to run on edge devices. Model quantization provides significant benefits in power efficiency, memory footprint, and latency, but quantizing a 32-bit floating-point model down to an 8-bit or 4-bit integer model often results in accuracy loss. Qualcomm AI Research introduces state-of-the-art quantization techniques that minimize bit precision while preserving model accuracy. In this demo, we present AdaRound (Adaptive Rounding), a post-training quantization technique that preserves accuracy while requiring only a small amount of unlabeled data and no model fine-tuning.
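
The core idea behind AdaRound is that rounding every weight to its nearest quantization grid point is not necessarily the best choice; instead, a per-weight up-or-down rounding decision can be learned by minimizing a layer's output reconstruction error on a small batch of unlabeled data. The sketch below illustrates that idea for a single linear layer in PyTorch. It is a simplified toy version (plain sigmoid relaxation, symmetric per-tensor int8 quantizer, made-up regularizer weight and schedule), not the implementation used in the demo, and all function and variable names are illustrative.

```python
import torch

def quantize_nearest(w, scale):
    # Baseline: round-to-nearest uniform quantization to a signed 8-bit grid.
    return torch.clamp(torch.round(w / scale), -128, 127) * scale

def adaround_layer(w, x, scale, iters=1000, lr=1e-2):
    # Toy AdaRound-style optimization: learn, per weight, whether to round
    # down or up so that the quantized layer reproduces the float layer's
    # outputs on a small batch of unlabeled activations x.
    w_floor = torch.floor(w / scale)
    frac = (w / scale - w_floor).clamp(0.01, 0.99)
    # Soft rounding variable, initialized near round-to-nearest.
    v = torch.log(frac / (1 - frac)).detach().requires_grad_(True)
    opt = torch.optim.Adam([v], lr=lr)
    y_ref = x @ w.t()                           # float reference outputs
    for step in range(iters):
        h = torch.sigmoid(v)                    # soft "round up" decision in (0, 1)
        w_q = torch.clamp(w_floor + h, -128, 127) * scale
        rec_loss = ((x @ w_q.t() - y_ref) ** 2).mean()
        # Annealed regularizer that pushes h toward a hard 0/1 decision.
        beta = 2 + 18 * (1 - step / iters)
        reg = (1 - (2 * h - 1).abs().pow(beta)).sum()
        loss = rec_loss + 0.01 * reg
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Freeze the learned rounding into a hard up/down choice.
    h_hard = (torch.sigmoid(v) > 0.5).float()
    return torch.clamp(w_floor + h_hard, -128, 127) * scale

# Usage: quantize one 64x128 layer using only a handful of unlabeled activations.
w = torch.randn(64, 128)
x = torch.randn(32, 128)
scale = w.abs().max() / 127
w_nearest = quantize_nearest(w, scale)
w_adaround = adaround_layer(w, x, scale)
print("nearest error: ", ((x @ w_nearest.t() - x @ w.t()) ** 2).mean().item())
print("adaround error:", ((x @ w_adaround.t() - x @ w.t()) ** 2).mean().item())
```

Because the optimization needs only the layer's inputs and the float model's own outputs as targets, no labels and no end-to-end fine-tuning are required, which is what makes the approach practical as a post-training step.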

Publish Date: Dec 12, 2020

Length: 4:04

©2021 Qualcomm Technologies, Inc. and/or its affiliated companies.

References to "Qualcomm" may mean Qualcomm Incorporated, or subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes Qualcomm's licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm's engineering, research and development functions, and substantially all of its products and services businesses. Qualcomm products referenced on this page are products of Qualcomm Technologies, Inc. and/or its subsidiaries.

Materials that are as of a specific date, including but not limited to press releases, presentations, blog posts and webcasts, may have been superseded by subsequent events or disclosures.

Nothing in these materials is an offer to sell any of the components or devices referenced herein.