Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort (Qualcomm AI Research)
EMNLP 2021
Summary
A transformer is a neural network architecture developed for language understanding tasks such as translation and text classification. The paper examines the feasibility of running transformers on low-power edge devices such as phones. Running the model on the device rather than in the cloud has security advantages and does not depend on an internet connection, which makes this a worthwhile problem to solve. The quantization methods described in the paper can make a transformer efficient enough to perform well on device.
Citation
@inproceedings{bondarenko-etal-2021-understanding,
  title = "Understanding and Overcoming the Challenges of Efficient Transformer Quantization",
  author = "Bondarenko, Yelysei and Nagel, Markus and Blankevoort, Tijmen",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
  month = nov,
  year = "2021"
}
Results
The researchers demonstrate the effectiveness of their methods on the GLUE benchmark using BERT, establishing state-of-the-art results for post-training quantization. They also show that transformer weights and embeddings can be quantized to ultra-low bit-widths, which yields significant memory savings with minimal accuracy loss.
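To make the idea of quantizing weights to a lower bit-width concrete, here is a minimal sketch of plain uniform symmetric quantization in pure Python. It is a generic textbook scheme, not the method proposed in the paper, and the names quantize_symmetric, dequantize, and memory_bytes are hypothetical helpers introduced only for this illustration.

```python
def quantize_symmetric(weights, num_bits):
    """Map floats to signed integers on a uniform grid (illustrative only)."""
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    qmax = 2 ** (num_bits - 1) - 1  # largest integer level, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest grid point and clamp to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer levels."""
    return [v * scale for v in q]


def memory_bytes(num_values, num_bits):
    """Storage needed for num_values numbers at num_bits each."""
    return num_values * num_bits / 8


# Toy weights, chosen arbitrarily for the example (not taken from the paper).
weights = [0.8, -0.31, 0.05, -1.0, 0.42]
q8, s8 = quantize_symmetric(weights, 8)
recovered = dequantize(q8, s8)
max_error = max(abs(a - b) for a, b in zip(weights, recovered))

print("8-bit levels:", q8)
print("max round-trip error:", max_error)
print("FP32 bytes per 1000 weights:", memory_bytes(1000, 32))
print("INT8 bytes per 1000 weights:", memory_bytes(1000, 8))
```

In this scheme the rounding error per weight is bounded by half the scale, and storage shrinks in proportion to the bit-width, so going from 32-bit floats to 8-bit integers cuts weight memory by a factor of four. Lower bit-widths save more memory at the cost of a coarser grid.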
* Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
