bitsandbytes
PulseAugur coverage of bitsandbytes — every cluster mentioning bitsandbytes across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 4 days
-
Fixing local LLM OOM errors by optimizing KV cache and quantization
Running large open-source language models locally can lead to out-of-memory errors, even if the model's weights seem to fit within the available VRAM. This is primarily due to the significant memory required for the KV …
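As a rough illustration of why the cache can matter as much as the weights, the sketch below estimates KV cache size from a model's shape. The function name and the example dimensions (32 layers, 32 KV heads, head size 128, fp16 cache) are assumptions chosen for illustration and do not come from the linked article; real runtimes may add allocator overhead or use a quantized cache.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    Each layer stores one key tensor and one value tensor (hence the
    factor of 2), each of shape [batch, n_kv_heads, seq_len, head_dim].
    bytes_per_elem=2 corresponds to an fp16/bf16 cache.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape, 4096-token context, single sequence.
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(f"KV cache: {size / 2**30:.2f} GiB")
```

Because the estimate scales linearly with `seq_len` and `batch_size`, doubling the context window doubles the cache, which is why a model whose weights fit can still run out of memory at long contexts.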
-
Quantization study enables smaller, more accurate Whisper-small ASR
A new study published on arXiv evaluates various post-training quantization (PTQ) techniques for the Whisper-small automatic speech recognition model. The research, which tested libraries like PyTorch, Optimum-Quanto, H…
-
4-bit quantization is the practical sweet spot for local LLMs
For most users running large language models locally, 4-bit quantization offers a practical balance between performance and quality, significantly reducing VRAM requirements compared to 8-bit. While 4-bit models may sho…
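The VRAM difference can be approximated with simple arithmetic. The sketch below is a back-of-the-envelope estimate only: the helper name and the 7-billion-parameter example are assumptions, and it ignores the extra storage that real quantization formats need for scales and other per-block constants.

```python
def weight_bytes(n_params, bits_per_weight):
    """Approximate weight storage in bytes, ignoring quantization metadata."""
    return n_params * bits_per_weight // 8


n_params = 7_000_000_000  # hypothetical 7B-parameter model

for bits in (16, 8, 4):
    gb = weight_bytes(n_params, bits) / 1e9
    print(f"{bits:>2}-bit weights: about {gb:.1f} GB")
```

Under these assumptions, moving from 8-bit to 4-bit halves the weight footprint; the KV cache and activations still come on top of that figure.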
-
Developers fine-tune LLMs on 3GB GPUs using QLoRA
Developers can fine-tune large language models like TinyLlama on consumer hardware with as little as 3 GB of GPU memory using techniques such as QLoRA and NF4 quantization. This process involves training only a small fr…
-
Quantization impacts LLM factual recall, with varied effects across models and methods
A new paper investigates how quantization, a technique used to compress large language models, affects their ability to recall factual knowledge. Researchers found that while quantization generally leads to some informa…
-
Hugging Face introduces advanced quantization techniques for efficient LLMs
Researchers are developing advanced quantization techniques to make large language models (LLMs) more efficient. New methods like AutoRound, LATMiX, and GSQ aim to reduce model size and computational requirements, enabl…