SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs
(arxiv.org)
The quantization of large language models (LLMs) is crucial for deploying them on devices with limited computational resources.