llama-server
PulseAugur coverage of llama-server — every cluster mentioning llama-server across labs, papers, and developer communities, ranked by signal.
Sentiment data available for 1 day
-
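LocalLLaMA users seek to integrate MTP into llama-bench
Users on the r/LocalLLaMA subreddit are looking for a way to integrate MTP with llama-bench, because the standard approach that works for llama-server does not appear to work. The core issue seems to be compatibility, and some speculate that llama-bench may not support speculative decoding.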
-
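LocalLLaMA users discuss preferred frontends for local LLMs
Users on the r/LocalLLaMA subreddit are discussing their preferred frontends for interacting with local large language models. One user shared an unconventional setup built on Vim and a custom text-completion plugin, and also pointed out limitations of llama-server. The discussion aims to gather insight into the tools and interfaces the community uses to deploy and run local LLMs.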
-
Quantized Qwen3.6-27B model achieves 100k context on 16GB VRAM
A user on Reddit's r/LocalLLaMA has detailed a method for running the Qwen3.6-27B model on a system with 16GB of VRAM, achieving a context length of 100,000 tokens. The process involves creating a custom GGUF quantizati…
-
Qwen3.6-27B model offers flagship coding performance in a smaller package
Qwen has released Qwen3.6-27B, an open-weight model that reportedly matches flagship-level coding performance. This new model significantly outperforms its predecessor, Qwen3.5-397B-A17B, while being substantially small…