miniF2F
PulseAugur coverage of miniF2F — every cluster mentioning miniF2F across labs, papers, and developer communities, ranked by signal.
-
New benchmark MINIF2F-DAFNY tests LLMs for mathematical theorem proving
Researchers have developed MINIF2F-DAFNY, a new benchmark for evaluating Large Language Models (LLMs) in mathematical theorem proving. It translates the miniF2F benchmark to Dafny, an auto-active verifier, enab…
-
NVIDIA Nemotron 3 Nano: Open Model for Efficient AI Agents
NVIDIA has released Nemotron 3 Nano, a 30-billion-parameter open model designed for efficient reasoning and long-context applications. This model utilizes a hybrid Mixture-of-Experts architecture, activating only a frac…
-
NVIDIA unveils efficient Nemotron 3 LLM family with hybrid architecture
NVIDIA has released two new large language models, Nemotron 3 Nano and Nemotron 3 Ultra, focusing on efficiency and advanced capabilities. Nemotron 3 Nano is a 30B-class model designed for private inference and agentic …
-
Lean Proof Assistant Enhances Reinforcement Learning for Theorem Proving
Researchers have developed a novel method for theorem proving using reinforcement learning, integrating the Lean proof assistant to provide detailed, verified feedback. This approach, termed Process-Verified Reinforceme…
-
New study tests AI proof formalization models for robustness
A new study on arXiv evaluates the robustness of proof autoformalization models, which translate natural language mathematical proofs into formal languages like Lean 4. Researchers introduced global and local perturbati…
-
LLMs evaluated for formal math proofs in Lean 4
A new research paper evaluates the performance of various Large Language Models (LLMs) in generating formal mathematical proofs using the Lean 4 theorem prover. The study employed pass@k and refine@k metrics on subsets …
-
LLM autoformalization struggles with paraphrased inputs
Researchers have investigated the robustness of large language models (LLMs) in autoformalization tasks, specifically their ability to generate formal proofs from natural language statements. The study found that LLMs e…
-
New AI method achieves 100% formal validity in theorem autoformalization
Researchers have developed a novel reference-free iterative refinement process for autoformalizing entire mathematical theorems. This method utilizes feedback from theorem provers and LLM-based judges to enhance formal …
-
Lean 4 autoformalization sensitive to surface phrasing, not semantics
Researchers have investigated the impact of natural language variations on Lean 4 autoformalization, finding that semantically equivalent paraphrases can lead to different formal outputs. Their study, using GPT-family m…