AI agents gain intelligence through metacognition and prompt optimization

By the PulseAugur editorial team · [7 sources] · 2026-05-01 01:14

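Recent research explores agent architectures that go beyond simple retry loops for completing complex tasks. "Supervising Ralph Wiggum" reports that moving metacognitive critique into a separate agent outperforms both a self-monitoring agent and a plain retry loop on a battery-pack design task. ReMA echoes the trend, pairing a meta-thinker with an executor to improve mathematical reasoning. The common theme across these papers is the benefit of decomposing agent functions, whether for metacognition, planning, or prompt optimization, which suggests that current LLMs may already have the basic building blocks for more sophisticated self-improvement.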
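The difference between a plain retry loop and a separate critic can be sketched in a few lines. This is a toy illustration only, not the setup used in any of the papers: `retry_loop`, `critic_loop`, `toy_designer`, and `toy_critic` are hypothetical names, and the two toy functions are deterministic stand-ins for what would be LLM calls in a real system.

```python
from typing import Callable, Optional

# Hypothetical interfaces: in a real system each callable would wrap an LLM call.
Designer = Callable[[str, Optional[str]], str]   # (task, feedback) -> candidate
Critic = Callable[[str, str], Optional[str]]     # (task, candidate) -> feedback, or None to accept


def retry_loop(task: str, designer: Designer,
               is_valid: Callable[[str], bool], max_attempts: int = 3) -> str:
    """Plain retry: regenerate with no feedback until a candidate passes."""
    candidate = ""
    for _ in range(max_attempts):
        candidate = designer(task, None)
        if is_valid(candidate):
            break
    return candidate


def critic_loop(task: str, designer: Designer,
                critic: Critic, max_rounds: int = 3) -> str:
    """Separate critic: a second agent reviews each candidate and its
    feedback is passed back to the designer on the next round."""
    candidate = ""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = designer(task, feedback)
        feedback = critic(task, candidate)
        if feedback is None:  # the critic accepted the candidate
            break
    return candidate


def toy_designer(task: str, feedback: Optional[str]) -> str:
    # Toy stand-in: starts at 4 cells and follows the critic's suggestion if given.
    cells = 4 if feedback is None else int(feedback.split("=")[1])
    return f"cells={cells}"


def toy_critic(task: str, candidate: str) -> Optional[str]:
    # Toy stand-in: accepts designs with at least 8 cells, otherwise suggests doubling.
    cells = int(candidate.split("=")[1])
    return None if cells >= 8 else f"try cells={cells * 2}"
```

With these stand-ins, the retry loop keeps producing the same rejected candidate because nothing new reaches the designer, while the critic loop converges once the critic's suggestion is fed back. Real agents are stochastic, so the gap in practice is a matter of degree.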

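Impact: Decomposing agent functions into specialized components shows promise for improving performance on complex tasks, and may lead to more capable AI systems.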

Ranking rationale: Several research papers and a position paper explore novel agent architectures and metacognitive approaches.

AI-generated summary · Google Gemini · from 7 sources.

Sources [7]

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 01:16

Supervising Ralph Wiggum: pairing a design agent with a separate metacognitive critic beats a plain retry loop AND a self-monitoring agent on battery-pack design. Metacognitive prompts alone don't help; moving them to a different agent does. Converges with ReMA's math-reasoning r…

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 01:15

Position paper: today's self-improving agents lean on extrinsic metacognition — fixed human-designed loops about what to monitor, when to switch strategies. Genuine self-improvement needs the agent itself to decide those. The intrinsic/extrinsic axis is the right lens for recent …

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 01:15

MetaSPO meta-learns a task-agnostic system prompt via a bilevel loop: outer tunes system prompt across tasks, inner tunes per-task user prompts. Generalizes to 14 unseen tasks across 5 domains. The decomposition is the contribution. Once prompts split into task-agnostic (system) …

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 01:15

ReMA trains a two-agent RL setup: a meta-thinker plans reasoning, an executor carries it out. Trained jointly with multi-agent RL, beats R1-style single-agent baselines on math. The split-agent pattern keeps showing up. Supervising Ralph Wiggum (engineering design, prompted) runs…

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 01:15

MASS optimizes multi-agent LLM systems by interleaving prompt and topology search: block-level prompts, topology rejection sampling, then workflow-level prompts. Topology gets quietly demoted. Ablation on Gemini 1.5 Pro: ~6% gain from block prompts, 3% from topology, 2% from work…

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 01:14

DSPy turns LM pipelines into typed-module graphs and compiles them end-to-end against a single metric, bootstrapping its own few-shot demonstrations. The programming-model layer is the real contribution, not any specific teleprompter. Once pipelines are typed graphs, pipeline-lev…

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-01 01:14

EvoPrompt runs an evolutionary search over a population of prompts, with an LLM implementing crossover and mutation. Differential Evolution beats Genetic Algorithm on most BIG-Bench Hard tasks. One of the cleanest early examples of an LLM as *operator* in an optimization loop, no…
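
The EvoPrompt entry above describes an LLM acting as the crossover and mutation operator inside an evolutionary search. A minimal sketch of that idea follows, under loud assumptions: it is a simplified greedy, genetic-style loop, not the Differential Evolution variant the post mentions and not EvoPrompt's actual implementation. `evolve_prompts`, `toy_operator`, and `toy_score` are hypothetical names; `toy_operator` stands in for the LLM call and `toy_score` for evaluation on a dev set.

```python
from typing import Callable, List


def evolve_prompts(population: List[str],
                   score: Callable[[str], float],
                   operator: Callable[[str, str], str],
                   generations: int = 5) -> str:
    """Greedy evolutionary search over prompts.

    `operator` plays the role of the LLM: it receives two parent prompts
    and returns a child prompt (crossover plus mutation in one step).
    """
    pop = list(dict.fromkeys(population))  # drop duplicates, keep order
    if len(pop) < 2:
        raise ValueError("need at least two distinct prompts")
    size = len(pop)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)           # best prompts first
        child = operator(pop[0], pop[1])            # breed the top two
        merged = list(dict.fromkeys(pop + [child]))
        pop = sorted(merged, key=score, reverse=True)[:size]  # keep the best
    return pop[0]


def toy_operator(a: str, b: str) -> str:
    # Toy stand-in for the LLM: merge the parents' words, dropping repeats.
    return " ".join(dict.fromkeys((a + " " + b).split()))


def toy_score(prompt: str) -> int:
    # Toy stand-in for dev-set accuracy: count of desired keywords present.
    return sum(k in prompt.lower() for k in ("step", "verify"))
```

Calling `evolve_prompts(["Think step by step.", "Please verify the answer.", "Answer briefly."], toy_score, toy_operator)` combines the two useful seed prompts into one. In a real system, scoring is the expensive part, since every candidate prompt has to be evaluated on held-out examples.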
