Brief

last 24h

[3/3] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
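As a rough illustration of how a 0–100 score might combine these four signals, here is a minimal sketch. The brief does not publish its formula, so everything here is an assumption: the weights, the 24-hour half-life, the choice to apply time decay as a multiplier on the other three signals, and the names `WEIGHTS`, `time_decay`, and `score_story`.

```python
# Hypothetical weights for the three non-temporal signals (assumed, sum to 1).
WEIGHTS = {
    "authority": 0.40,
    "cluster_strength": 0.35,
    "headline_signal": 0.25,
}
HALF_LIFE_HOURS = 24.0  # assumed: a story loses half its score every 24 hours


def time_decay(age_hours, half_life_hours=HALF_LIFE_HOURS):
    """Exponential decay factor in (0, 1]; equals 1.0 for a brand-new story."""
    return 0.5 ** (max(age_hours, 0.0) / half_life_hours)


def score_story(authority, cluster_strength, headline_signal, age_hours):
    """Combine three signals in [0, 1] with a decay multiplier into a 0-100 score."""
    components = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    base = sum(WEIGHTS[name] * value for name, value in components.items())
    return round(100.0 * base * time_decay(age_hours), 1)
```

Under these assumptions, a story with maximal signals scores 100 when fresh and 50 after one half-life; a real ranking system could just as plausibly treat recency as a fourth weighted term instead.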

TOOL · arXiv cs.AI English(EN) · 1d

Accelerating Long-Tail Generation in Synchronous RLHF Training via Adaptive Tensor Parallelism

Researchers have developed a method called PAT to accelerate synchronous Reinforcement Learning from Human Feedback (RLHF) training. The technique dynamically adjusts tensor parallelism during the generation stage, targeting the long-tail responses that bottleneck the process. By reconfiguring parallelism and managing decoding states, PAT is reported to significantly reduce both generation and end-to-end training latency for models such as LLaMA3.1-8B and Qwen3-14B.

IMPACT Accelerates RLHF training, potentially enabling faster iteration and deployment of aligned AI models.
- Qwen3-14B
- SGLang
- RLHF
- LLaMA3.1-8B
- PAT
- DeepScaleR
RESEARCH · arXiv cs.LG English(EN) · 5d · [2 sources]

Atom-anchored LLMs speak Chemistry: A Retrosynthesis Demonstration

Researchers have developed new benchmarks and methods to evaluate and improve Large Language Models (LLMs) on chemistry tasks. One approach, Speak-to-Structure (S^2-Bench), focuses on open-domain molecule generation, moving beyond simple one-to-one mappings to assess creative and diverse molecular design. Another method introduces atom-anchored LLMs, which use unique atomic identifiers to anchor chain-of-thought reasoning about molecular transformations and achieve high success rates in tasks such as retrosynthesis without task-specific training.

IMPACT New benchmarks and methods are emerging to push LLMs towards more complex scientific reasoning in chemistry.
RESEARCH · arXiv cs.CL English(EN) · 1mo · [16 sources]

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

A new arXiv study investigated the hallucination tendencies of four popular LLMs (ChatGPT, Grok, Gemini, and Copilot) when used for academic writing. The research introduced a "Hallucination Index" (HI) and found that Grok and Copilot performed better at reference generation but struggled with abstract prompts, while Gemini and ChatGPT showed better tone control but higher factual hallucination risk. The study concluded that hallucination behavior depends on task type and prompting conditions, not solely on model architecture. Separately, Gary Marcus highlighted multiple studies indicating that current LLMs are unreliable for medical advice, often providing inaccurate or fabricated information with high confidence, and should not be used for unsupervised clinical decision-making.

IMPACT LLM hallucinations in academic and medical contexts pose risks of misinformation and unreliable decision-making, highlighting the need for caution and further research.
- SQLite
- Ollama
- Claude
- DeepSeek
- Copilot
- llama3.1:8b
- Glia
- Eshaan Nair
- Cursor
- Grok
- ChatGPT
- Gemini
- Nvidia
- Palantir
- Gary Marcus
- CoreWeave
- arXiv
- Large Language Models
- Nature Medicine
- JAMA Network Open
