PulseAugur

supervised fine-tuning

PulseAugur coverage of supervised fine-tuning: every cluster that mentions it across labs, papers, and developer communities, ranked by signal.

Total · 30d: 71 (71 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 66 (66 over 90d)
Sentiment · 30d: 19 days with sentiment data

RECENT · PAGE 1/4 · 71 TOTAL
  1. RESEARCH · CL_111597

    New Intent-Aware Training Boosts LLM Safety Classifiers

    Researchers have developed a new method for improving the safety classification of large language models by explicitly modeling user intent. They introduced AIMS, a dataset of 1,724 safety prompts with associated intent…

  2. RESEARCH · CL_111553

    New AI architecture quantifies judicial discretion in legal outcome prediction

    Researchers have developed a novel Judge-Aware Gated Multi-Task Learning architecture to better predict legal outcomes by distinguishing between factual case evidence and judicial discretion. This approach, evaluated on…

  3. RESEARCH · CL_109570

    New Generalization Spectrum framework evaluates AI learning transfer

    Researchers have introduced the Generalization Spectrum, a novel evaluation framework designed to assess how far a learning algorithm's knowledge can transfer beyond its training data. This approach moves beyond traditi…

  4. TOOL · CL_107973

    New research explores weight-space geometry of AI reasoning distillation methods

    A new research paper analyzes the geometric properties of weight updates across various offline reinforcement learning methods used for distilling reasoning capabilities into smaller AI models. The study trained six dif…

  5. RESEARCH · CL_107733

    New benchmarks push video AI to ground answers in temporal evidence · 4 sources tracked

    Two new research papers introduce benchmarks and models for video question answering that focus on temporal reasoning and evidence grounding. The EG-VQA benchmark, with over 11,000 QA pairs and temporal evidence annotat…

  6. RESEARCH · CL_107914

    PointVG-R model enhances visual grounding with geometric reasoning · 3 sources tracked

    Researchers have developed PointVG-R, a novel reasoning-guided Multi-modal Large Language Model (MLLM) designed to improve precise pointing localization in images. This model integrates geometric-aware reasoning, Reinfo…

  7. RESEARCH · CL_104846

    VibeThinker 3B model surpasses Opus 4.5 in reasoning with novel SFT+GRPO

    A new 3-billion parameter model named VibeThinker has demonstrated superior reasoning capabilities compared to Anthropic's Opus 4.5. This performance was achieved using a novel combination of supervised fine-tuning (SFT…

  8. TOOL · CL_104872

    New BALTO framework precisely targets LLM hallucinations at token level

    Researchers from Shanghai Jiao Tong University and Tencent have developed BALTO, a novel reinforcement learning framework designed to precisely eliminate hallucinations in large language models (LLMs). The framework ope…

  9. TOOL · CL_105150

    LLMs fail to reliably self-report adversarial prefill attacks, study finds

    A new study published on arXiv investigates the ability of large language models (LLMs) to self-report when they have been influenced by adversarial prefill attacks. The research found that across ten different open-wei…

  10. TOOL · CL_106838

    BoxCtrl framework enables precise 3D geometric image editing

    Researchers have introduced BoxCtrl, a novel framework for precise 3D geometric image editing. This method utilizes 3D bounding boxes with distinct RGB colors projected onto 2D images as visual prompts, allowing for acc…

  11. RESEARCH · CL_105088

    Knowledge distillation outperforms SFT in low-data LLM training

    A new paper explores knowledge distillation (KD) for post-training large language models (LLMs), finding it outperforms supervised fine-tuning (SFT) in low-data scenarios. The effectiveness of KD diminishes as more data…

  12. TOOL · CL_106811

    RLVR outperforms SFT for LLM reasoning, paper shows

    A new paper analyzes why reinforcement fine-tuning, specifically Reinforcement Learning with Verifiable Rewards (RLVR), outperforms supervised fine-tuning (SFT) for improving the reasoning capabilities of large language…

  13. RESEARCH · CL_105023

    New AI agents leverage world models and self-repair for enhanced reasoning

    Researchers have introduced Qwen-AgentWorld, a novel language world model designed to simulate agent environments across seven domains. This model is trained through a three-stage pipeline including continual pre-traini…

  14. COMMENTARY · CL_102601

    Guide to Supervised Fine-Tuning Launched

    This article serves as an introductory guide to supervised fine-tuning, marking the beginning of a series focused on this technique. It aims to educate readers on the fundamental concepts and initial steps involved in a…

  15. TOOL · CL_100183

    Persistent homology tracks LLM representation changes during fine-tuning

    Researchers have employed persistent homology to analyze the internal representation dynamics of large language models during supervised fine-tuning. Their study, which examined four transformer models (1B to 7B paramet…

  16. TOOL · CL_105016

    New Agentic Data Tailoring paradigm structures multimodal streams

    Researchers have introduced a new paradigm called Agentic Data Tailoring, which uses learnable data processing to structure high-entropy multimodal streams. The DataClaw_0-9B model, trained using supervised fine-tuning …

  17. TOOL · CL_98029

    New 'Sparsity Curse' hinders merging of advanced RLVR AI models

    A new research paper introduces the "Sparsity Curse" phenomenon, which describes how Reinforcement Learning with Verifiable Rewards (RLVR) models, despite their advanced reasoning capabilities, become difficult to merge …

  18. TOOL · CL_98007

    New DRIFT method refines LLM training data for improved performance

    Researchers have developed DRIFT, a novel method for refining instruction data to improve the performance ceiling of large language models. Unlike existing data curation techniques that focus on subset selection, DRIFT …

  19. RESEARCH · CL_99607

    New research explores RL advancements for LLMs and AI agents · 8 sources tracked

    Multiple research papers released on arXiv explore advancements in reinforcement learning (RL) for large language models (LLMs) and other AI agents. One paper introduces RiVER, a framework for training LLMs on score-bas…

  20. RESEARCH · CL_97817

    Study compares LLM adaptation methods for French medical QA

    A new study published on arXiv explores the effectiveness of different methods for adapting large language models (LLMs) to specialized domains and languages, using French medical question-answering as a case study. The…