Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
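
A minimal sketch of how a 0–100 score like this could be composed. The brief does not publish its formula, so the weights, the 24-hour half-life, and the function name `score_item` are all assumptions made for illustration.

```python
# Hypothetical weights: the brief names the signals but not how they are combined.
WEIGHTS = {"authority": 0.4, "cluster_strength": 0.3, "headline_signal": 0.3}
HALF_LIFE_HOURS = 24.0  # assumed: an item loses half its score every 24 hours


def _clamp01(x):
    """Keep a signal inside the [0, 1] range."""
    return max(0.0, min(1.0, x))


def score_item(authority, cluster_strength, headline_signal, age_hours):
    """Blend three signals in [0, 1], apply exponential time decay, return 0-100."""
    base = (
        WEIGHTS["authority"] * _clamp01(authority)
        + WEIGHTS["cluster_strength"] * _clamp01(cluster_strength)
        + WEIGHTS["headline_signal"] * _clamp01(headline_signal)
    )
    decay = 0.5 ** (max(0.0, age_hours) / HALF_LIFE_HOURS)
    return round(100 * base * decay, 1)
```

With these assumed settings, an item with perfect signals scores 100 when fresh and half of that after one half-life; a real scorer would likely tune the weights and decay per source type.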

TOOL · arXiv cs.AI · English (EN) · 1w

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Researchers have developed Zero-Expert Self-Distillation Adaptation (ZEDA), a framework for making existing Mixture-of-Experts (MoE) language models more efficient. ZEDA lets post-trained static MoE models dynamically skip over half of their experts during inference with minimal accuracy loss. The method was tested on Qwen3-30B-A3B and GLM-4.7-Flash, where it delivered significant inference speedups and outperformed existing dynamic MoE baselines.

IMPACT Enables significant inference speedups for MoE models, potentially lowering serving costs and increasing accessibility.
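
A toy sketch of how skipping experts per token can work in general. It assumes the router scores a few no-op "zero" slots alongside the real experts, so any top-k pick that lands on a zero slot costs no compute. The summary does not describe ZEDA's actual mechanism, and `route_token` and its parameters are hypothetical.

```python
def route_token(router_logits, num_real_experts, top_k):
    """Return the indices of the real experts to run for one token.

    router_logits holds one score per real expert, followed by one score
    per assumed no-op "zero" slot. Picks that land on a zero slot are dropped.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:top_k]
    # Only indices below num_real_experts refer to experts that do real work.
    return [i for i in chosen if i < num_real_experts]


# Three real experts (indices 0-2) followed by two zero slots (indices 3-4).
easy_token = route_token([2.0, 0.1, 1.5, 3.0, 2.5], num_real_experts=3, top_k=2)
hard_token = route_token([5.0, 0.1, 4.0, 3.0, 0.2], num_real_experts=3, top_k=2)
```

In this sketch `easy_token` runs no real experts and `hard_token` runs two, so compute varies per token while the top-k budget stays fixed.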
COMMENTARY · r/LocalLLaMA · English (EN) · 12h

Is Qwen3.6 current king for local agentic use?

A user on r/LocalLLaMA asks whether the Qwen3.6 35B A3B model is currently the best choice for local agentic tasks. They report that it performs exceptionally well, outperforming Gemma4 and GLM 4.7 Flash at avoiding loops and producing accurate tool calls. They are also looking for other Mixture-of-Experts (MoE) models of similar size that might match or beat it in applications such as Hermes Agent and Pi.

IMPACT Offers first-hand user experience with local LLMs that can guide model selection for agentic tasks.
- Pi
- Unsloth
- Hermes Agent
- Qwen3.6 35B A3B
- GLM 4.7 Flash
- Gemma4