BitNet b1.58, a 1-bit LLM variant with ternary {-1, 0, 1} weights, matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption.

#machinelearning

https://arxiv.org/abs/2402.17764
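
For intuition, here is a minimal sketch of quantizing a weight matrix to the ternary set {-1, 0, +1} with an absmean scale, in the spirit of the paper's 1.58-bit weights. This is an illustration, not the paper's code; the function and variable names are my own.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} using a per-tensor absmean scale.

    Returns the ternary weights and the scale needed to approximate the
    original weights as w_ternary * scale.
    """
    scale = np.mean(np.abs(w)) + eps                 # absmean scaling factor
    w_scaled = w / scale                             # normalize by mean magnitude
    w_ternary = np.clip(np.round(w_scaled), -1, 1)   # round, then clip to {-1, 0, +1}
    return w_ternary.astype(np.int8), scale

# Usage: quantize a random weight matrix and check the reconstruction error.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
    w_q, s = absmean_ternary_quantize(w)
    print("unique values:", np.unique(w_q))              # [-1, 0, 1]
    print("mean abs error:", np.abs(w - w_q * s).mean())
```

Because every weight is one of three values, matrix multiplication reduces to additions and subtractions plus a single rescale, which is where the latency, memory, and energy savings claimed above come from.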
