NeurIPS 2020 Demo: Neural network quantization with AdaRound
Neural network models can be very large and compute-intensive, which makes them challenging to run on edge devices. Model quantization provides significant benefits in power efficiency, memory efficiency, and latency. However, quantizing a 32-bit floating-point model to an 8-bit or 4-bit integer model often results in accuracy loss. Qualcomm AI Research introduces state-of-the-art quantization techniques that minimize bit precision while preserving model accuracy. In this demo, we present AdaRound (Adaptive Rounding), a post-training quantization technique that requires only a small amount of unlabeled data and no model fine-tuning, yet preserves model accuracy.
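To make the idea of adaptive rounding concrete, here is a minimal Python sketch. It is not the AdaRound algorithm from the demo: it assumes a single toy linear layer, a fixed quantization step (`scale`), and a brute-force search over rounding each weight down or up, where a practical method would use an optimization procedure instead. All names, weights, and calibration inputs are hypothetical. The point it illustrates is that rounding every weight to the nearest grid value is only one of many possible rounding choices, and a few unlabeled inputs are enough to compare those choices by how closely the quantized layer reproduces the original layer's output.

```python
import itertools
import math


def quantize_nearest(weights, scale):
    """Baseline: round each weight to the nearest point on the grid."""
    return [scale * round(w / scale) for w in weights]


def layer_output(weights, x):
    """Output of a toy linear layer with one output unit (a dot product)."""
    return sum(w * xi for w, xi in zip(weights, x))


def output_error(weights, q_weights, calib_inputs):
    """Squared difference between original and quantized layer outputs."""
    return sum(
        (layer_output(weights, x) - layer_output(q_weights, x)) ** 2
        for x in calib_inputs
    )


def quantize_best_rounding(weights, scale, calib_inputs):
    """Try every round-down/round-up combination and keep the best one.

    Exhaustive search is only feasible for a handful of weights; it stands
    in here for a real optimization procedure (an assumption of this sketch).
    """
    floors = [math.floor(w / scale) for w in weights]
    best_q, best_err = None, float("inf")
    for ups in itertools.product((0, 1), repeat=len(weights)):
        q = [scale * (f + u) for f, u in zip(floors, ups)]
        err = output_error(weights, q, calib_inputs)
        if err < best_err:
            best_q, best_err = q, err
    return best_q, best_err


# Hypothetical weights and unlabeled calibration inputs (no labels needed).
weights = [0.26, 0.74, -0.38, 0.12]
scale = 0.25
calib_inputs = [
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 1.0, 0.8, 1.2],
    [1.0, 0.9, 1.1, 0.7],
]

nearest = quantize_nearest(weights, scale)
nearest_err = output_error(weights, nearest, calib_inputs)
best, best_err = quantize_best_rounding(weights, scale, calib_inputs)

print("round-to-nearest:", nearest, "error:", nearest_err)
print("searched rounding:", best, "error:", best_err)
```

Because round-to-nearest is itself one of the combinations the search visits, the searched rounding can never do worse on the calibration inputs; whether it does better depends on the weights and data.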
Publish Date: Dec 12, 2020