Qwen3-1.7B
PulseAugur coverage of Qwen3-1.7B — every cluster mentioning Qwen3-1.7B across labs, papers, and developer communities, ranked by signal.
2 days with sentiment data
-
Reinforcement learning trains small models for text-to-SPARQL generation
Researchers have explored using reinforcement learning to train smaller language models for zero-shot Text-to-SPARQL generation, a task crucial for knowledge graph question answering. They applied Group-Relative Policy …
-
Clinical AI fine-tuned on AMD hardware, bypassing CUDA dependency
A project has successfully fine-tuned a clinical AI model, MedQA, using AMD hardware and ROCm, demonstrating that advanced AI development is possible without NVIDIA's CUDA. The fine-tuning process utilized the Qwen3-1.7…
-
New S-trace method improves RLVR efficiency and credit assignment
Researchers have introduced Selective Eligibility Traces (S-trace), a novel method designed to enhance the reasoning capabilities of large language models within the Reinforcement Learning with Verifiable Rewards (RLVR)…
-
New methods enhance on-policy distillation for LLMs
Researchers have developed new methods to improve the efficiency and stability of on-policy distillation (OPD) for large language models. One approach, vOPD, uses a control variate baseline derived from the reverse KL d…
-
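New Balanced Aggregation method improves GRPO training for LLMs
Researchers have identified aggregation bias in GRPO-style training, a method used to strengthen reasoning and code generation in large language models, and proposed a fix. Their study shows that the two standard GRPO aggregation schemes, sequence aggregation and token aggregation, each introduce a different optimization bias. To counter this, they introduce Balanced Aggregation (BA), a plug-and-play replacement that improves training stability and performance. Experiments with the Qwen2.5-Math-7B and Qwen3-1.7B models demonstrate BA's effectiveness across a range of reasoning and coding benchmarks.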