PulseAugur

User seeks LLM fine-tuning methods for open-ended math problems

A user on Reddit's r/MachineLearning is asking how to fine-tune a large language model (LLM) for open-ended mathematical problems, specifically proof-only tasks. The user notes that reinforcement learning with verifiable rewards (RLVR), which uses only the final answer as the reward signal, is insufficient for this kind of problem. They are considering the MathNet dataset as training data and are looking for alternatives to supervised fine-tuning (SFT), which they consider unhelpful here, and to standard RL algorithms such as GRPO and PPO, which lack a clear reward function for proofs.
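The limitation the user describes can be illustrated with a minimal sketch of a final-answer reward. This is not the user's code: the function names, the `\boxed{...}` answer convention, and the sample strings are all assumptions chosen for illustration. It shows that a reward computed from the final answer alone has nothing to score when the output is a proof.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, if any.

    Assumption: answers are marked with \\boxed{...}; nested braces are not handled.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def final_answer_reward(completion: str, reference: str) -> float:
    """Binary reward (hypothetical): 1.0 on an exact final-answer match, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0


# A computational problem has a checkable final answer.
numeric = "Adding the two terms gives 2 + 2, so the result is \\boxed{4}."

# A proof has no final answer to compare, so the reward carries no signal
# about whether the argument is correct.
proof = "Assume sqrt(2) = p/q in lowest terms. Then p^2 = 2q^2, so p is even."

numeric_reward = final_answer_reward(numeric, "4")
proof_reward = final_answer_reward(proof, "")
```

In this sketch every proof receives the same reward regardless of its validity, which is why a policy-gradient method such as GRPO or PPO would have nothing to optimize against without a different scoring scheme.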

IMPACT Discusses the difficulty of adapting LLMs to proof-style reasoning tasks, highlighting the need for fine-tuning approaches beyond SFT and final-answer RLVR.

RANK_REASON User-generated question seeking technical advice on LLM fine-tuning, not a formal release or research paper.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. r/MachineLearning · TIER_1 · English (EN) · /u/TechNerd10191

    How to fine-tune an LLM for open-ended problems? [P]

    I want to develop an LLM that can solve open-ended math problems (such as proof-only problems). This means that RLVR where we use the final answer alone as reward signal is not enough. Since SFT is useless here and GRPO/PPO methods will not have …