Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG English(EN) · 4d

MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels

MemReward is a graph-based framework for improving reinforcement learning of large language models (LLMs) when labeled data is scarce. It uses a graph neural network (GNN) to propagate reward signals from a small set of labeled examples to a larger pool of unlabeled data. In experiments, MemReward comes close to the performance of an oracle (fully labeled data) with only 20% of the data labeled, across tasks including mathematics, question answering, and code generation.

IMPACT Enables more efficient fine-tuning of LLMs in data-scarce environments, potentially accelerating development across various AI applications.
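
The propagation idea in the summary above can be illustrated with a much simpler stand-in. The sketch below is not MemReward and does not use a GNN: it is classic label propagation over a k-nearest-neighbour similarity graph, written with NumPy only. The function name `propagate_rewards`, the feature vectors, the parameter `k`, and the neutral starting value of 0.5 are all assumptions made for illustration.

```python
import numpy as np

def propagate_rewards(features, rewards, labeled_mask, k=3, n_iters=50):
    """Spread known rewards to unlabeled nodes over a k-nearest-neighbour graph.

    Hypothetical helper for illustration; plain label propagation, not a GNN.
    """
    n = features.shape[0]
    # Cosine similarity between every pair of experiences.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-edges

    # Connect each node to its k most similar neighbours, then symmetrize.
    adj = np.zeros((n, n))
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)

    # Row-normalize so each update is an average over a node's neighbours.
    transition = adj / adj.sum(axis=1, keepdims=True)

    # Unlabeled nodes start at a neutral guess (an assumption of this sketch).
    pred = np.where(labeled_mask, rewards, 0.5)
    for _ in range(n_iters):
        pred = transition @ pred
        pred[labeled_mask] = rewards[labeled_mask]  # clamp the known labels
    return pred

# Toy data: two clusters of five experiences, one labeled example in each (20%).
features = np.array([
    [1.0, 0.0], [1.0, 0.1], [1.0, 0.2], [1.0, 0.05], [1.0, 0.15],
    [0.0, 1.0], [0.1, 1.0], [0.2, 1.0], [0.05, 1.0], [0.15, 1.0],
])
rewards = np.full(10, np.nan)
rewards[0], rewards[5] = 1.0, 0.0
labeled_mask = np.zeros(10, dtype=bool)
labeled_mask[[0, 5]] = True

predicted = propagate_rewards(features, rewards, labeled_mask, k=2)
print(np.round(predicted, 2))
```

In this toy setup each unlabeled point inherits the reward of the labeled point in its own cluster. A learned GNN, as described in the summary, would replace the fixed averaging step with trained message passing.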

RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)

Researchers apply singular value decomposition (SVD) to a large language model's weight matrix to reveal interpretable semantic subspaces. The technique requires minimal code and no model inference, and it can expose the composition and curation of a model's training data. An analysis of GPT-OSS-120B, Gemma-2-2B, and Qwen2.5-1.5B found systematic differences in their learned subspaces, with Qwen exhibiting ethically inappropriate vocabulary. The study proposes this SVD analysis as a standard pre-release safety auditing step and suggests using it for tokenizer optimization and more controllable LLM design.

IMPACT Offers a novel, low-overhead method for auditing LLM training data and identifying potential ethical risks before deployment.
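
A rough sketch of the kind of analysis summarized above, on toy data. The summary does not say which weight matrix the paper decomposes, so this example assumes a token embedding matrix with one row per vocabulary entry; the vocabulary, the matrix values, and the helper name `top_tokens_per_direction` are invented for illustration and are not the paper's code.

```python
import numpy as np

def top_tokens_per_direction(weights, vocab, n_directions=2, top_k=3):
    """List the tokens that load most strongly on each leading singular direction.

    Hypothetical helper; assumes rows of `weights` correspond to entries of `vocab`.
    """
    # Columns of u give every token's loading on one singular direction.
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    groups = []
    for d in range(n_directions):
        order = np.argsort(-np.abs(u[:, d]))[:top_k]
        groups.append([vocab[i] for i in order])
    return groups

# Toy "embedding matrix": animal tokens share one axis, arithmetic tokens another.
vocab = ["cat", "dog", "horse", "add", "sum", "plus"]
weights = np.array([
    [2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

groups = top_tokens_per_direction(weights, vocab)
for index, tokens in enumerate(groups):
    print(f"direction {index}: {tokens}")
```

On a real model, `weights` would be loaded from a checkpoint and `vocab` from its tokenizer; inspecting the token groups requires no forward pass, which matches the "no model inference" property described in the summary.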
