New STAGE pipeline boosts text-to-JSON accuracy with LLM-generated data

By PulseAugur Editorial · [1 source] · 2026-06-18 10:47

Researchers have developed STAGE, a pipeline for generating training data for text-to-JSON conversion. The method uses large language models to synthesize reports and JSON schemas, with ground-truth values validated against the underlying spreadsheets. On STAGE-Eval, a new benchmark dataset, STAGE significantly improves the Qwen3-4B model's exact-match and value accuracy.
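To make the two reported metrics concrete, here is a minimal sketch of how exact match and value accuracy could be computed for a predicted JSON string against a gold one. The summary does not give the paper's definitions, so the function names (`flatten`, `exact_match`, `value_accuracy`), the leaf-level scoring rule, and the sample record are all assumptions made for illustration, not STAGE-Eval's actual scoring code.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON into {path: leaf_value} pairs (hypothetical helper)."""
    if isinstance(obj, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat: Dict[str, Any] = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def exact_match(pred: str, gold: str) -> bool:
    """True only if the prediction parses and equals the gold object."""
    try:
        return json.loads(pred) == json.loads(gold)
    except json.JSONDecodeError:
        return False


def value_accuracy(pred: str, gold: str) -> float:
    """Fraction of gold leaf values reproduced at the same path (assumed definition)."""
    try:
        pred_flat = flatten(json.loads(pred))
    except json.JSONDecodeError:
        return 0.0  # unparseable output scores zero
    gold_flat = flatten(json.loads(gold))
    if not gold_flat:
        return 1.0 if not pred_flat else 0.0
    correct = sum(
        1 for path, value in gold_flat.items()
        if path in pred_flat and pred_flat[path] == value
    )
    return correct / len(gold_flat)


# Made-up example record: one of three leaf values is wrong.
gold = '{"company": "Acme", "revenue": {"q1": 10, "q2": 12}}'
pred = '{"company": "Acme", "revenue": {"q1": 10, "q2": 15}}'
print(exact_match(pred, gold))
print(value_accuracy(pred, gold))
```

Under these assumptions, exact match is all-or-nothing per document, while value accuracy gives partial credit for each correctly extracted field, which is why the two numbers can diverge on long reports.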

IMPACT Enhances structured data extraction capabilities, potentially improving efficiency in industries reliant on document analysis.

RANK_REASON The cluster contains a research paper detailing a new method and benchmark for text-to-JSON learning.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Youngjae Yu · 2026-06-18 10:47

Source-Grounded Data Generation for Text-to-JSON Learning

From financial filings to clinical records, legacy industries rely heavily on long, unstructured documents to store high-value information. Reliably extracting this information into structured, machine-readable representations is a key prerequisite to making the contents accessib…
