PulseAugur

New RASER system cuts QA token costs by optimizing retrieval

Researchers have developed RASER (Recoverability-Aware Selective Escalation Router), a system that lowers the cost of multi-hop question answering by avoiding expensive retrieval on every question. RASER first runs a one-shot RAG pass, derives six features from it, and escalates to more complex retrieval only when those features indicate it is necessary. According to the paper, this uses 41-49% fewer tokens than always-pruning methods while maintaining competitive accuracy across various LLMs and benchmarks.
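The routing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not name the six features or say how they are combined, so the logistic score, the weights, the threshold, and the names `OneShotResult`, `escalation_score`, and `route` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class OneShotResult:
    """Output of the cheap first pass (hypothetical structure)."""
    answer: str
    features: Sequence[float]  # six signals; their meaning is assumed here
    tokens_used: int


def escalation_score(features: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Logistic score in (0, 1); higher means escalation looks more useful.

    A logistic model is an assumption made for this sketch.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def route(question: str,
          one_shot: Callable[[str], OneShotResult],
          escalate: Callable[[str], Tuple[str, int]],
          weights: Sequence[float],
          bias: float = 0.0,
          threshold: float = 0.5) -> Tuple[str, int]:
    """Answer with one-shot RAG, escalating only when the score is high.

    Returns the final answer and the total number of tokens spent.
    """
    first = one_shot(question)
    if len(first.features) != len(weights):
        raise ValueError("feature and weight vectors must have equal length")

    if escalation_score(first.features, weights, bias) < threshold:
        # Cheap path: keep the one-shot answer, pay no further tokens.
        return first.answer, first.tokens_used

    # Expensive path: run the costlier retrieval strategy as well.
    answer, extra_tokens = escalate(question)
    return answer, first.tokens_used + extra_tokens
```

In this sketch the token saving comes entirely from the cheap path: every question that stays below the threshold pays for the one-shot pass only, while escalated questions pay for both passes.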

IMPACT Reduces computational costs for complex question-answering tasks, making LLM applications more efficient.

RANK_REASON The cluster contains a research paper detailing a new method for question answering.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Yuyang Li, Zihe Yan, Tobias Käfer ·

    RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

    arXiv:2606.02488v1. Abstract: Multi-hop question-answering systems often use expensive retrieval on every question. They may decompose the question, run several retrieval rounds, or search through bridge entities before answering. All of these strategies rely on…

  2. arXiv cs.AI TIER_1 English(EN) · Tobias Käfer ·

    RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

    Multi-hop question-answering systems often use expensive retrieval on every question. They may decompose the question, run several retrieval rounds, or search through bridge entities before answering. All of these strategies rely on repeated LLM calls to rewrite or decompose the …