Model Quantization

What is model quantization?

Model quantization reduces the memory footprint and computation requirements of deep neural network models. Weight quantization is a common quantization technique that converts a model's weights from the standard floating-point data type (e.g., 32-bit floats) to a lower-precision data type (e.g., 8-bit integers), saving memory and speeding up inference through reduced computational complexity. Model quantization can make large models, such as LLMs, practical for real-world applications on edge devices.
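As a minimal sketch of the idea (not tied to any particular framework), the NumPy example below performs symmetric per-tensor weight quantization to 8-bit integers: a single scale maps the largest weight magnitude to 127, weights are rounded to int8, and dequantization recovers an approximation of the original values. The helper names `quantize_int8` and `dequantize` are illustrative, not a standard API.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8.

    Illustrative sketch: one scale maps the largest magnitude to 127.
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale

# Quantize random float32 weights and inspect the rounding error and savings.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("max abs error:", np.max(np.abs(w - w_hat)))
print("memory:", w.nbytes, "bytes ->", q.nbytes, "bytes")  # 4x smaller
```

The 4x memory reduction comes directly from storing one byte per weight instead of four; the printed error shows the precision traded away by rounding.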

