Brief

last 24h

[2/2] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
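As a rough illustration of how a 0–100 score might combine the four factors named above, here is a minimal sketch. The weights, the 24-hour half-life, the multiplicative decay, and every name in it (`Cluster`, `WEIGHTS`, `score`) are assumptions made for illustration; the feed does not publish its actual formula.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """One deduplicated story cluster. All fields are hypothetical inputs."""
    authority: float         # 0..1, assumed rating of the best source
    cluster_strength: float  # 0..1, assumed measure of how many sources agree
    headline_signal: float   # 0..1, assumed headline relevance
    age_hours: float         # hours since first publication


# Hypothetical weights; they sum to 1 so the undecayed score tops out at 100.
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0  # assumed: a story loses half its score every 24 hours


def clamp01(x: float) -> float:
    """Keep a factor inside the 0..1 range."""
    return max(0.0, min(1.0, x))


def time_decay(age_hours: float, half_life: float = HALF_LIFE_HOURS) -> float:
    """Exponential decay: 1.0 at age 0, 0.5 after one half-life."""
    return 0.5 ** (max(age_hours, 0.0) / half_life)


def score(c: Cluster) -> float:
    """Weighted sum of the three static factors, scaled by time decay."""
    base = (
        WEIGHTS["authority"] * clamp01(c.authority)
        + WEIGHTS["cluster_strength"] * clamp01(c.cluster_strength)
        + WEIGHTS["headline_signal"] * clamp01(c.headline_signal)
    )
    return round(100.0 * base * time_decay(c.age_hours), 1)


if __name__ == "__main__":
    fresh = Cluster(authority=1.0, cluster_strength=1.0, headline_signal=1.0, age_hours=0)
    day_old = Cluster(authority=1.0, cluster_strength=1.0, headline_signal=1.0, age_hours=24)
    print(score(fresh), score(day_old))
```

Under these assumptions a perfect-scoring story drops from 100 to 50 after one day, which is one simple way a "last 24h" view could favor fresh items.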

TOOL · Mastodon — sigmoid.social English(EN) · 3h

DeepSeek V4 powers Goedel-Architect, a new framework that achieves a 75.6% pass rate on the PutnamBench mathematics benchmark for just $294

Goedel-Architect, a new framework powered by DeepSeek V4, has achieved a 75.6% pass rate on the PutnamBench mathematics benchmark at a cost of only $294, compared with $170,000 for similar systems. Researchers attribute the performance gains to architectural innovations rather than superior hardware.

IMPACT Demonstrates significant cost-performance improvements in AI for complex mathematical reasoning.

RESEARCH · arXiv cs.AI English(EN) · 5d · [6 sources]

Proof-Refactor: Refactoring Generated Formal Proofs into Modular Artifacts

Researchers have developed several new frameworks that use large language models to improve formal theorem proving. Goedel-Architect uses a blueprint generation and refinement strategy and achieves state-of-the-art performance on benchmarks such as MiniF2F-test and PutnamBench with the DeepSeek-V4-Flash model. Proof-Refactor focuses on the modularity, readability, and maintainability of LLM-generated proofs and outperforms existing baselines on the PutnamBench dataset. A third approach, Compile to Compress, uses compiler output to refine proof attempts efficiently and achieves top results on PutnamBench with smaller models.

IMPACT These advancements in AI-driven formal theorem proving could accelerate mathematical discovery and software verification.
