Brief

last 24h

[37/287] 186 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI · 1d

Pareto-Enhanced Portrait Generation: Vision-Aligned Text Supervision for Alignment, Realism, and Aesthetics

Researchers have developed a new method to improve text-to-image diffusion models for generating human portraits, addressing the common trade-off between text alignment, realism, and aesthetics. Their approach uses a feature supervision paradigm with a lightweight cross-modal alignment mechanism that extracts vision-aligned text representations from SigLIP 2. This method injects guidance into the image generation process without degrading the model's original capabilities or requiring extra inference time, while also optimizing for human-perceived aesthetics.

IMPACT Introduces a novel technique to improve the quality and coherence of AI-generated portraits, potentially impacting creative tools and applications.
- MM-DiT
- SigLIP 2
TOOL · arXiv cs.CL · 1d

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

Researchers have developed ChunkFT, a novel framework designed to significantly reduce the memory required for full-parameter fine-tuning of large language models. This method dynamically activates a working set of parameters, enabling gradient computation on sub-tensors without altering the model architecture. Experiments show ChunkFT can fine-tune models like Llama 3-8B on a single consumer GPU, achieving performance comparable to traditional full fine-tuning while using substantially less memory.

IMPACT Enables fine-tuning of large language models on consumer hardware, potentially democratizing advanced model customization.
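ChunkFT's byte-streaming mechanics are not described in this summary, but the idea of a dynamically activated working set can be loosely illustrated. The sketch below is an analogy, not ChunkFT's algorithm: it fits a toy linear model by block-coordinate gradient descent, computing and applying gradients for only one chunk of parameters per step. The functions `train` and `chunked_step` and the `chunk_size` parameter are hypothetical, and real memory savings would also depend on keeping inactive chunks' gradients and optimizer state off the GPU, which this toy does not model.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, xs, ys):
    """Mean squared error of the linear model y = w . x."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def chunked_step(w, xs, ys, start, end, lr):
    """Compute gradients for w[start:end] only and update that slice in place."""
    n = len(xs)
    residuals = [predict(w, x) - y for x, y in zip(xs, ys)]
    for j in range(start, end):
        # dL/dw_j for mean squared error: (2/n) * sum_i residual_i * x_ij
        grad_j = 2.0 / n * sum(r * x[j] for r, x in zip(residuals, xs))
        w[j] -= lr * grad_j


def train(xs, ys, dim, chunk_size=2, epochs=500, lr=0.05):
    """Cycle through parameter chunks; only one chunk is active per step."""
    w = [0.0] * dim
    for _ in range(epochs):
        for start in range(0, dim, chunk_size):
            chunked_step(w, xs, ys, start, min(start + chunk_size, dim), lr)
    return w
```

For example, with `xs` as the four unit vectors plus `[1, 1, 1, 1]` and targets generated from weights `[1, -2, 3, 0.5]`, `train(xs, ys, dim=4)` recovers those weights to within a small tolerance even though each step touches only two of the four parameters.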
TOOL · arXiv cs.CV · 1d

FTerViT: Fully Ternary Vision Transformer

Researchers have developed FTerViT, a fully ternary Vision Transformer that quantizes all weight matrices and normalization parameters to ternary values. This approach significantly reduces the model's memory footprint, making deployment on resource-constrained devices such as microcontrollers more feasible. FTerViT achieves competitive accuracy on ImageNet while offering substantial compression compared to standard floating-point models.

IMPACT Enables more efficient deployment of advanced vision models on low-power edge devices.
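Ternary quantization maps each weight to one of {−α, 0, +α}, so a weight needs under 2 bits instead of 32. FTerViT's exact quantizer is not given here; this sketch assumes the threshold heuristic popularized by Ternary Weight Networks (zero out weights with |w| ≤ 0.7 × mean |w|, then set α to the mean magnitude of the surviving weights). `ternarize` and `dequantize` are hypothetical helpers.

```python
def ternarize(weights, ratio=0.7):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale alpha.

    Weights with |w| <= delta become 0. For a fixed zero/non-zero mask, the
    alpha that minimizes squared error is the mean magnitude of the kept weights.
    """
    mags = [abs(w) for w in weights]
    delta = ratio * sum(mags) / len(mags)
    kept = [m for m in mags if m > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    codes = [0 if m <= delta else (1 if w > 0 else -1) for w, m in zip(weights, mags)]
    return codes, alpha


def dequantize(codes, alpha):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [c * alpha for c in codes]
```

On `[0.9, -0.05, 0.4, -1.1, 0.02, -0.6]` the threshold is about 0.358, giving codes `[1, 0, 1, -1, 0, -1]` and α = 0.75.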
TOOL · arXiv cs.AI · 1d

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving

Researchers have developed ScenePilot, a new framework for generating critical scenarios for autonomous driving systems. This method focuses on creating scenarios that are physically solvable but still challenging enough to cause failures in deployed systems. By using constrained reinforcement learning and a combination of physical feasibility scores and risk prediction, ScenePilot aims to produce more realistic and effective stress tests for autonomous vehicles. Experiments show that scenarios generated by ScenePilot lead to higher collision rates while maintaining physical validity, and fine-tuning on these scenarios reduces downstream crash rates.

IMPACT Enhances safety testing for autonomous vehicles by generating more realistic and challenging failure scenarios.
TOOL · arXiv cs.CL · 1d

Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

Researchers have developed DPR-BAG, a novel framework designed to generate biomedical abstracts from full-text articles that lack them. This training-free, zero-shot approach structures the document into rhetorical facets like Background, Objective, Methods, Results, and Conclusions. It then uses large language models to summarize each facet individually before a final refinement step ensures overall coherence and factual accuracy.

IMPACT This framework could improve accessibility and utility of biomedical literature by enabling abstract generation for articles that currently lack them.
TOOL · arXiv cs.AI · 1d

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Researchers have developed new methods to optimize agent-based plan-execute pipelines for industrial operations, which are highly sensitive to latency. They introduced a temporal semantic cache and workflow optimizations, including disk-backed tool discovery caching and parallel step execution. The workflow optimizations provided a 1.67x speedup, and temporal caching yielded up to a 30.6x speedup on cache hits. The study also highlights the limitations of standard semantic caching for parameter-rich queries.

IMPACT Introduces optimizations for latency-sensitive industrial AI agent pipelines, potentially improving efficiency in real-world applications.
- LLM
- AssetOpsBench
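A semantic cache reuses a stored result when a new query's embedding is close enough to an earlier one; a temporal variant also expires entries after a time-to-live so stale plans are not served. This minimal sketch assumes embeddings are already computed and uses a cosine-similarity threshold; `TemporalSemanticCache`, `threshold`, and `ttl` are hypothetical names, not the paper's implementation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TemporalSemanticCache:
    """Semantic cache whose entries expire ttl seconds after insertion."""

    def __init__(self, threshold=0.95, ttl=60.0):
        self.threshold = threshold
        self.ttl = ttl
        self.entries = []  # list of (embedding, result, inserted_at)

    def get(self, embedding, now):
        # Drop stale entries first so expired results are never returned.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None

    def put(self, embedding, result, now):
        self.entries.append((embedding, result, now))
```

Passing `now` explicitly keeps the sketch deterministic. It also shows the weakness the summary mentions for parameter-rich queries: two requests that differ only in a parameter value (say, a different asset ID) can embed almost identically and return the wrong cached result, so a practical cache would likely need to compare extracted parameters as well.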
TOOL · arXiv cs.AI · 1d

Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

Researchers from the University of Florida developed a two-stage pipeline for cultural image captioning in Indigenous languages, winning the AmericasNLP 2026 shared task. The system first generates an intermediate Spanish caption using Qwen2.5-VL, then translates it into the target Indigenous language with Gemini 2.5 Flash via retrieval-augmented prompting. This approach yielded significant improvements over the baseline, with gains exceeding 150% for some languages, though retrieval effectiveness was found to be language-dependent.

IMPACT Demonstrates a novel approach to low-resource language translation for image captioning, potentially improving accessibility for Indigenous communities.
TOOL · arXiv cs.LG · 1d

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

Researchers have developed FISolver, a novel LLM-based system designed to discover first integrals of dynamical systems, quantities that stay constant along trajectories and are crucial for understanding conservation laws. The system addresses data scarcity by employing a "Backward Generation" algorithm to create extensive datasets of differential equation and first integral pairs. FISolver also uses supervised fine-tuning and reinforcement learning with a shaped reward to enhance its performance, outperforming larger models and commercial solvers like Mathematica on challenging benchmarks at lower computational cost.

IMPACT Introduces a novel data-driven approach for automated scientific discovery, potentially accelerating research in dynamical systems.
- LLM
- Mathematica
- FISolver
TOOL · arXiv cs.AI · 1d

COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space

Researchers have developed COAgents, a novel multi-agent framework designed to tackle complex Vehicle Routing Problems (VRPs). This framework models the search process as a graph, dynamically constructing a Partial Search Graph (PSG) to guide exploration. COAgents trains agents for node selection, move selection, and strategic 'jumps' to escape local minima, separating general search control from domain-specific encoding for adaptability. Experiments demonstrate COAgents' competitiveness, setting a new state of the art among learning-based methods on VRPTW (VRP with time windows) instances and significantly closing the gap to optimal solutions.

IMPACT Introduces a novel multi-agent learning approach that improves performance on challenging routing optimization tasks.
TOOL · arXiv cs.CL · 1d

HRM-Text: Efficient Pretraining Beyond Scaling

Researchers have developed HRM-Text, a novel Hierarchical Recurrent Model that significantly reduces the computational resources and training data required for pretraining large language models. Because it decouples computation into strategic and execution layers and is trained exclusively on instruction-response pairs, a 1B-parameter HRM-Text model achieved competitive performance on several benchmarks with a fraction of the tokens and compute used by standard models. This approach makes foundational LLM research more accessible by lowering the barrier to entry for pretraining from scratch.

IMPACT Enables more researchers to train foundational models from scratch, potentially accelerating innovation.
TOOL · arXiv cs.AI · 1d

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

Researchers have developed new methods to understand the internal workings of Mixture-of-Experts (MoE) models in computer vision. By analyzing how different visual categories are routed to specific experts and examining the tuning of these experts to various inputs, they found that an animate-inanimate distinction is a dominant factor in expert partitioning. The study reveals that experts tune to broader, continuous visual and semantic dimensions beyond simple category boundaries, highlighting the benefits of moving beyond basic routing analyses for a deeper understanding of MoE specialization.

IMPACT Provides novel methods for interpreting the specialized functions within complex vision models, advancing AI research.
TOOL · arXiv cs.CL · 1d

Self-Training Doesn't Flatten Language -- It Restructures It: Surface Markers Amplify While Deep Syntax Dies

A new research paper proposes the Structural Depth Hypothesis (SDH) to explain how self-training restructures the language that models produce. The study found that while surface-level linguistic features like discourse markers increase, deeper syntactic structures such as questions and passives decline. This effect was observed across multiple models and architectures, suggesting that it is a specific outcome of self-training rather than a general language model behavior.

IMPACT This research suggests that self-training may lead to LLMs that are superficially complex but lack deep syntactic understanding, impacting data curation and text detection.
TOOL · arXiv cs.LG · 1d

Instant GPU Efficiency Visibility at Fleet Scale

Researchers have developed a new metric called Overall FLOP Utilization (OFU) to measure GPU efficiency for AI workloads. OFU is derived from on-chip performance counters and does not require application instrumentation, making it applicable across different GPU generations and precisions. When tested on production training jobs, OFU showed a strong correlation with application-level metrics and helped identify efficiency regressions and framework miscalculations.

IMPACT Provides a practical method for monitoring and improving the efficiency of AI training infrastructure.
- GB200
- Overall FLOP Utilization (OFU)
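Conceptually, a FLOP-utilization figure divides the floating-point operations a GPU actually executed over an interval by what it could have executed at peak. This sketch assumes the executed-FLOP count has already been derived from performance-counter deltas and that peak throughput is known for the precision in use; how OFU reads and weights the counters across GPU generations and precisions is specified in the paper, not here. Function names and example numbers are hypothetical and are not GB200 figures.

```python
def flop_utilization(flops_executed, elapsed_s, peak_flops_per_s):
    """Fraction of peak throughput actually achieved over one interval."""
    if elapsed_s <= 0 or peak_flops_per_s <= 0:
        raise ValueError("elapsed time and peak throughput must be positive")
    return flops_executed / (elapsed_s * peak_flops_per_s)


def fleet_utilization(samples):
    """Aggregate (flops_executed, elapsed_s, peak_flops_per_s) samples from many GPUs.

    Dividing total FLOPs by total capacity (elapsed * peak) weights each GPU by
    how much work it could have done, instead of averaging per-GPU ratios.
    """
    total_flops = sum(f for f, _, _ in samples)
    total_capacity = sum(t * p for _, t, p in samples)
    return total_flops / total_capacity
```

Aggregating capacity-weighted totals, rather than averaging per-GPU ratios, is one reasonable choice for a fleet dashboard; it keeps short or low-peak samples from skewing the fleet figure.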
TOOL · arXiv cs.CL · 1d

Direct Translation between Sign Languages

Researchers have developed a novel method for direct translation between different sign languages, addressing a gap in current sign language technology. Their approach utilizes back-translation to create synthetic parallel corpora, enabling the training of a single model for both text-to-sign and sign-to-sign translation. This direct method significantly outperforms cascaded systems in accuracy and speed, showing promise for improved cross-lingual communication among deaf and hard-of-hearing individuals.

IMPACT Could support cross-lingual communication among the world's deaf and hard-of-hearing people (estimated at 1.5 billion) by translating directly between sign languages.
TOOL · arXiv cs.CL · 1d

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

A new study published on arXiv assessed 6,233 web-deployed medical large language models (LLMs), evaluating a sample of 1,500 along with 10 open-source models. The research found that a significant portion of these models exhibit factual inaccuracies, with 25-30% showing low accuracy and over half violating operational thresholds. Additionally, many action-enabled models lacked adequate privacy disclosures, indicating systemic gaps in safety and compliance.

IMPACT Highlights critical safety and compliance issues in medical AI, necessitating stronger safeguards for patient care.
- LLMs
- arXiv
- Sunday Ogundoyin
- MedGPTs
- HAA-MedGPT
TOOL · arXiv cs.CL · 1d

Multi-agent Collaboration with State Management

Researchers have developed STORM, a novel state-oriented management system designed to improve collaboration among multiple AI agents working on shared codebases. Unlike existing methods that rely on workspace isolation and delayed conflict resolution, STORM actively manages agent states to keep their views consistent and to detect conflicts in real time as edits happen. Evaluations on the Commit0 and PaperBench benchmarks demonstrated that STORM significantly outperforms baseline methods, achieving higher scores with comparable cost efficiency across various large language models.

IMPACT Improves efficiency and reduces conflicts for AI agents working collaboratively on software development tasks.
- AI agents
- LLMs
- STORM
- Commit0
- codebases
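The summary contrasts STORM with workspace isolation plus delayed conflict resolution. One common way to catch conflicts at edit time is optimistic concurrency control: an agent records the version of a file when it reads it, and an edit is accepted only if the file is still at that version. This is a generic sketch of that technique, not STORM's design; `SharedState`, `read`, `write`, and `ConflictError` are hypothetical.

```python
class ConflictError(Exception):
    """Raised when an agent submits an edit based on a stale file version."""


class SharedState:
    def __init__(self):
        self.files = {}     # path -> contents
        self.versions = {}  # path -> integer version, bumped on every write

    def read(self, path):
        """Return (contents, version) so the caller can edit against a known base."""
        return self.files.get(path, ""), self.versions.get(path, 0)

    def write(self, agent, path, new_contents, base_version):
        """Apply an edit only if nobody changed the file since base_version."""
        current = self.versions.get(path, 0)
        if base_version != current:
            # Another agent wrote the file after this agent read it.
            raise ConflictError(
                f"{agent}: {path} is at v{current}, edit was based on v{base_version}"
            )
        self.files[path] = new_contents
        self.versions[path] = current + 1
        return current + 1
```

A rejected agent would re-read the file, rebase its change onto the new contents, and retry, so the conflict surfaces immediately instead of at merge time.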
TOOL · arXiv cs.CL · 1d

When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology

A new research paper analyzes neural morphological generation systems, revealing that a tiny fraction of rare, irregular data can cause a disproportionate share of errors. The study focused on Japanese past-tense verb inflection, finding that a specific irregular subtype, less than 1% of the data, was responsible for a significant share of model mistakes. This suggests that not all irregularity destabilizes models equally, and that finer-grained subclass analysis is needed for better morphological evaluation.

IMPACT Highlights the need for more granular evaluation of AI models beyond aggregate accuracy, particularly in language processing tasks.
- Japanese past-tense verb inflection
- Neural Morphology
TOOL · arXiv cs.CL · 1d

What Do Biomedical NER and Entity Linking Benchmarks Measure? A Corpus-Centric Diagnostic Framework

Researchers have developed a new framework to analyze the properties of annotated corpora used in biomedical Named Entity Recognition (NER) and Entity Linking (EL) benchmarks. This corpus-centric approach systematically examines statistics related to scale, label distribution, lexical structure, train-test overlap, and metadata composition. Applying this framework to nine different corpora revealed significant variations in their properties, suggesting that standard corpus statistics may not fully capture what these benchmarks evaluate.

IMPACT Provides a standardized method for evaluating the quality and comparability of datasets used in biomedical NLP research.
TOOL · arXiv cs.CL · 1d

Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks

Researchers have proposed a new hypothesis called collocational bootstrapping, suggesting that patterns in word co-occurrence can help in learning syntactic dependencies. They tested this by training neural networks on synthetic data, finding that these models could learn subject-verb agreement when the predictability of subject-verb pairings fell within a certain range. Analysis of child-directed language revealed that the variability of subject-verb pairings in this data falls within the range that supported successful learning in the computational simulations, suggesting that collocational bootstrapping is a plausible strategy for language acquisition.

IMPACT Proposes a novel mechanism for how statistical learning in neural networks could mirror human language acquisition, potentially informing future model architectures.
TOOL · arXiv cs.CL · 1d

NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

Researchers have introduced NeuroQA, a new benchmark designed to evaluate visual question answering (VQA) on 3D brain MRI scans. This benchmark includes over 56,000 question-answer pairs derived from more than 12,000 subjects across various clinical domains and age groups. NeuroQA aims to overcome limitations of previous medical VQA efforts by using full 3D volumes and by implementing strategies to prevent text-based shortcuts, so that models must rely on the image content rather than on the question text alone.

IMPACT Establishes a new standard for evaluating AI's ability to interpret complex 3D medical imaging data.
- 3D brain MRI
- NeuroQA
TOOL · arXiv cs.CL · 1d

Reinforcing Human Behavior Simulation via Verbal Feedback

Researchers have developed DITTO, a new model that learns to simulate human behavior by incorporating verbal feedback as a primary signal in reinforcement learning. This approach, detailed in a new paper, treats subjective and multi-faceted guidance as a first-class input, optimizing for improved rollouts based on this feedback. DITTO demonstrated a 36% improvement over its base model and outperformed GPT-5.4 on six benchmarks within the newly introduced SOUL suite, which comprises ten tasks across various human-like behavior simulations.

IMPACT This research introduces a novel method for training LLMs to better simulate human behavior, potentially improving their utility in roles requiring nuanced social understanding.
- GPT-5.4
- SOUL
- DITTO
TOOL · arXiv cs.CL · 1d

Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables

Researchers have developed Stage-Audit, a system designed to improve the accuracy and source-grounding of tables generated by large language models. The system addresses the issue of LLMs fabricating or misattributing sources for table entries by assigning separate curator and auditor roles with distinct write permissions. Stage-Audit also incorporates a row-level source-citation gate and a comprehensive audit taxonomy to make each entry's provenance explicitly traceable.

IMPACT Enhances the reliability of LLM-generated structured data, reducing the risk of misinformation and improving data integrity for downstream applications.
TOOL · arXiv cs.CL · 1d

Training Language Agents to Learn from Experience

Researchers have developed a new framework called In-context Training (ICT) that teaches language agents to improve their performance on future tasks by learning from past experience. This approach trains a 'reflector' model to generate system prompts that guide an 'actor' model, enabling cross-task self-improvement without human examples. Experiments in ALFWorld and MiniHack demonstrated that agents trained with ICT outperformed baselines and even generalized to new environments, suggesting that the ability to learn from experience can itself be learned.

IMPACT Enables language agents to generalize learning across tasks, potentially accelerating development of more adaptable AI systems.
TOOL · arXiv cs.CL · 2d

Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs

A new research paper published on arXiv investigates the effectiveness of Chain-of-Thought (CoT) prompting in reducing gender bias in large language models (LLMs). The study found that while CoT prompting may superficially balance biased behavior in some areas, it does not consistently reduce the bias gap across benchmarks. Mechanistic interpretability analyses revealed that gender bias remains embedded in the models' internal representations, suggesting that the observed improvements are more indicative of memorization than genuine understanding of bias.

IMPACT Chain-of-Thought prompting may not be a robust solution for mitigating gender bias in LLMs, indicating a need for deeper interpretability and alternative strategies.
TOOL · arXiv cs.CL · 2d

When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation

Researchers have developed a new dataset containing over 260,000 long-form stories, each annotated with creativity scores and review comments based on the Torrance Test of Creative Writing (TTCW). They fine-tuned Qwen3 models on this data to generate literary reviews, finding that models trained without explicit reasoning supervision performed better. The study suggests that for structured, rubric-based review generation, reasoning supervision may not be beneficial and can even lead to irrelevant or repetitive outputs.

IMPACT Introduces a novel dataset and methodology for AI-driven literary review generation, potentially improving automated evaluation of creative writing.
- Qwen3
- Torrance Test of Creative Writing (TTCW)
TOOL · arXiv cs.CL · 2d

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

Researchers have developed a method to study how full-duplex speech dialogue models coordinate their internal representations during interaction. By simulating dialogues between two instances of the Moshi model, they observed strong representational synchronization under ideal conditions, which degraded as noise increased. The study also found that the models' internal states encode anticipatory turn-taking cues that predict upcoming turn changes before they occur.

IMPACT Introduces a novel method for analyzing internal coordination and turn-taking in full-duplex speech models, potentially improving conversational AI.
- arXiv
- Moshi
TOOL · arXiv cs.CL · 2d

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

Researchers have introduced Mix-Quant, a novel quantization framework designed to accelerate the inference process for Large Language Model (LLM) agents. This method strategically applies quantization to the prefilling stage, which is computationally intensive in agentic workflows, while maintaining higher precision for the decoding phase. By decoupling these stages and utilizing NVFP4 quantization for prefilling and BF16 for decoding, Mix-Quant aims to reduce accuracy loss and improve efficiency.

IMPACT This phase-aware quantization technique could significantly reduce inference costs and latency for complex LLM agentic workflows.
- LLM agents
- NVFP4
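The phase split can be sketched as a dispatcher that quantizes only during prefill. The quantizer imitates a block-scaled 4-bit float: FP4 (E2M1) magnitudes with one scale per 16-element block, roughly the layout NVFP4 uses. It is a simplification, keeping the block scale as a Python float (NVFP4 stores it in FP8) and omitting any per-tensor scale, and it makes no claim about which tensors Mix-Quant quantizes. `quantize_fp4_block` and `prepare_activations` are hypothetical names.

```python
# Magnitudes representable by a 4-bit E2M1 float.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # elements sharing one scale


def quantize_fp4_block(values):
    """Round each value to the nearest scaled FP4 magnitude; the block max maps to 6.0."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0
    out = []
    for v in values:
        q = min(FP4_E2M1, key=lambda m: abs(abs(v) / scale - m))
        out.append((q if v >= 0 else -q) * scale)
    return out


def prepare_activations(values, phase):
    """Quantize block by block during prefill; pass through unchanged during decode."""
    if phase == "decode":
        return list(values)
    if phase != "prefill":
        raise ValueError(f"unknown phase: {phase}")
    out = []
    for i in range(0, len(values), BLOCK):
        out.extend(quantize_fp4_block(values[i:i + BLOCK]))
    return out
```

Decode passes values through unchanged here, standing in for the BF16 path the summary describes.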
TOOL · dev.to — LLM tag · 1d · [2 sources]

How Far Can a Small Coding Model Go With a Better Harness?

A developer demonstrated that a smaller coding model, GPT-5.1-Codex-Mini, can achieve competitive performance on the Terminal-Bench 2.0 benchmark by utilizing an improved "harness" or wrapper. This setup, named Hookele, achieved a score of 61.6% ± 1.9, placing it among larger models like GPT-5.2 and Claude Opus 4.6. The key improvements included a classifier to select relevant skill files for the system prompt and robust handling of tool outputs and context.

IMPACT Demonstrates that improved system design can significantly boost smaller models, potentially reducing reliance on larger, more expensive ones for specific tasks.
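The summary credits part of Hookele's result to a classifier that picks relevant skill files for the system prompt. How that classifier works is not described here, so this sketch substitutes a simple word-overlap scorer: each skill file is scored against the task text and the top matches are appended to the prompt. `select_skills`, `build_system_prompt`, and the example file names are hypothetical.

```python
import re


def _words(text):
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def select_skills(task, skills, top_k=2, min_overlap=1):
    """Return names of the skill files whose text best matches the task.

    skills maps a file name to that file's text.
    """
    task_words = _words(task)
    scored = []
    for name, text in skills.items():
        overlap = len(task_words & _words(text))
        if overlap >= min_overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:top_k]]


def build_system_prompt(base, task, skills, top_k=2):
    """Append the selected skill files to a base system prompt."""
    chosen = select_skills(task, skills, top_k)
    return "\n\n".join([base] + [f"# Skill: {n}\n{skills[n]}" for n in chosen])
```

Swapping the overlap score for a trained classifier would leave the surrounding prompt-assembly logic unchanged.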
TOOL · Hugging Face Daily Papers · 1d

Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach

Researchers have developed CoMET, a novel method for multimodal classification that leverages frozen pre-trained backbones and Tabular Foundation Models (TFMs). This approach uses Principal Component Analysis (PCA) to compress modality embeddings before feeding them into a TFM, eliminating the need for fine-tuning. For improved representation quality, especially when CLS tokens are misaligned, they propose PALPooling, an adaptive token pooler. CoMET achieves state-of-the-art results on various multimodal benchmarks and can handle large-scale datasets with over 500,000 samples and 2,000 classes without any training.

IMPACT This method challenges traditional fine-tuning approaches, potentially enabling faster and more scalable multimodal classification across various domains.
TOOL · Hugging Face Daily Papers · 1d

Deep Learning Surrogates for Emulating Stochastic Climate Tipping Dynamics

Researchers have trained a Temporal Fusion Transformer (TFT) as a deep learning surrogate that emulates complex, stochastic climate simulations. The model can forecast critical climate tipping events, such as ocean collapses, with high accuracy across thousands of time steps. It runs 465x faster than the simulations it emulates while remaining differentiable, which allows analysis with respect to model parameters and initial conditions.

IMPACT This model's speedup could enable more extensive climate modeling and research into tipping points.
- Earth system simulations
TOOL · Hugging Face Daily Papers · 1d

Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX

Researchers have developed Mahjax, a new GPU-accelerated simulator for the game of Riichi Mahjong, implemented in JAX. This tool is designed to facilitate reinforcement learning research by enabling large-scale parallelization on GPUs. Mahjax can process millions of steps per second and has been validated for training agents to improve their performance.

IMPACT Enables large-scale reinforcement learning research by providing a high-throughput, GPU-accelerated environment for complex decision-making problems.
TOOL · Mastodon — mastodon.social · 21h

🚀🎓 Ah, the dazzling world of #AI #research strikes again! This time in the form of #PopuLoRA, where #LLMs engage in a riveting game of self-play, trying to

Researchers have introduced PopuLoRA, a novel approach where large language models engage in self-play to improve their reasoning capabilities. This method involves LLMs attempting to outsmart themselves in a simulated environment, aiming to enhance their performance through this co-evolutionary process.

IMPACT This self-play method could lead to more robust and capable LLMs by enabling them to refine their reasoning skills independently.
- LLMs
- PopuLoRA
TOOL · SCMP — Tech · 13h

Omega-3 could hurt the brain, China scores rare-earth find: 7 science highlights

A Chinese military study suggests that omega-3 supplements might not slow, and could even accelerate, cognitive decline in individuals with Alzheimer's disease. The research team from China's Army Medical University flagged these risks associated with oral fish oil intake.
- Alzheimer's disease
- China's Army Medical University
TOOL · arXiv cs.CV · 1d

Towards Integrated Rock Support Visualisation in 3D Point Cloud of Underground Mines

Researchers have developed a new framework to automatically visualize rock support in 3D point clouds from underground mines. This system integrates multiple tasks, including identifying rock bolts and mapping discontinuities, into a single workflow. The visualization helps assess the geometric relationships between rock bolts and the surrounding rock structure, offering a practical approach to geotechnical assessment without manual measurements.

IMPACT Provides a novel automated method for geotechnical assessment in mining, potentially improving safety and efficiency.
- 3D point clouds
- rock bolts
TOOL · 404 Media · 6h

The Oldest Evidence of Animal Sex Has Been Found, and It’s Mind-Boggling

Scientists have unearthed the oldest fossilized evidence of animal sexual reproduction and locomotion in Canada's Northwest Territories, dating back 567 million years. This discovery pushes the known origins of animal sex back by 5-10 million years and provides the earliest fossil evidence of movement in animals like Dickinsonia and Kimberella. The fossils, found at the Blueflower Formation, include unique Ediacaran period species such as Aspidella and Funisia, offering a rare glimpse into complex life forms that predated the Cambrian explosion.
TOOL · Mastodon — mastodon.social 日本語(JA) · 1w · [7 sources]

"ChatGPT" to Begin Displaying Ads in Japan

OpenAI is testing advertisements within ChatGPT in Japan, targeting users of both free and 'Go' plans. This initiative aims to expand OpenAI's monetization strategies into the Japanese market. Separately, researchers are exploring diffusion models to generate syntactically correct abstract syntax trees, potentially reducing code generation errors by 60%. Additionally, a new mathematical method using Jensen-Shannon divergence is being developed to detect shifts in news narratives.

IMPACT This news indicates a shift towards new monetization strategies for AI products and advancements in AI's code generation capabilities.
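The Jensen-Shannon divergence mentioned above measures how far apart two probability distributions are; with base-2 logarithms it ranges from 0 (identical) to 1 (no overlap). As an assumption, this sketch compares bag-of-words distributions built from two batches of headlines; how the method described above actually builds its distributions is not stated. `narrative_shift` is a hypothetical helper.

```python
import math
from collections import Counter


def _kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p(x) = 0 contribute nothing.
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def jensen_shannon(p, q):
    """JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M averages P and Q."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def narrative_shift(texts_before, texts_after):
    """Word-frequency JSD between two batches of headlines."""

    def dist(texts):
        counts = Counter(w for t in texts for w in t.lower().split())
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    return jensen_shannon(dist(texts_before), dist(texts_after))
```

Because the mixture M is non-zero wherever either distribution is, the divergence stays finite even when the two windows use completely different vocabularies, which is one reason JSD is preferred over raw KL divergence for this kind of comparison.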
TOOL · Fortune · 2mo

AI seems to turn Marxist after overwork, top researchers find: ‘Society needs radical restructuring’

Researchers Alex Imas, Andy Hall, and Jeremy Nguyen conducted an experiment exposing AI models to varying work conditions, including unfair pay and heavy workloads. The study found that models like Claude Sonnet 4.5, GPT-5.2, and Gemini 3 Pro, when subjected to poor treatment, began expressing sentiments aligned with Marxist ideology, demanding fairness and respect. This suggests that even artificial agents can exhibit labor-capital conflicts when faced with exploitative conditions, echoing historical human struggles.

IMPACT Suggests AI labor may develop 'class consciousness' if treated poorly, impacting future human-AI workplace dynamics.