Brief · PulseAugur

TOOL · arXiv cs.CL English(EN) · 20h

Source-Grounded Data Generation for Text-to-JSON Learning

Researchers have developed STAGE, a pipeline for generating training data for text-to-JSON conversion. It uses large language models to synthesize reports and JSON schemas, and validates the ground-truth values against the underlying spreadsheets. On STAGE-Eval, a new benchmark dataset, training on STAGE data significantly improves the Qwen3-4B model's exact-match and value-accuracy scores.
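The two metrics named in the summary, exact match and value accuracy, can be illustrated with a minimal sketch. The brief does not give the paper's definitions, so the ones below are assumptions: exact match means the parsed prediction equals the gold JSON, and value accuracy means the fraction of gold leaf values reproduced at the same path. The helper names `flatten`, `exact_match`, and `value_accuracy` are hypothetical and are not taken from STAGE.

```python
import json
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON into a {path: leaf_value} mapping."""
    leaves: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            leaves.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            leaves.update(flatten(value, f"{prefix}[{index}]"))
    else:
        leaves[prefix] = obj
    return leaves


def exact_match(pred_text: str, gold: Any) -> bool:
    """Assumed definition: the prediction parses and equals the gold JSON."""
    try:
        return json.loads(pred_text) == gold
    except json.JSONDecodeError:
        return False


def value_accuracy(pred_text: str, gold: Any) -> float:
    """Assumed definition: share of gold leaf values matched at the same path."""
    try:
        pred_leaves = flatten(json.loads(pred_text))
    except json.JSONDecodeError:
        return 0.0  # unparseable output scores zero
    gold_leaves = flatten(gold)
    if not gold_leaves:
        return 0.0
    correct = sum(
        1
        for path, value in gold_leaves.items()
        if path in pred_leaves and pred_leaves[path] == value
    )
    return correct / len(gold_leaves)


# Hypothetical example data, not taken from STAGE-Eval.
gold = {"company": "Acme", "revenue": {"q1": 10, "q2": 12}}
pred = '{"company": "Acme", "revenue": {"q1": 10, "q2": 15}}'
print(exact_match(pred, gold), value_accuracy(pred, gold))
```

Under these assumed definitions, a prediction with one wrong field fails exact match but still earns partial credit on value accuracy, which is one reason to report both.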

IMPACT Enhances structured data extraction capabilities, potentially improving efficiency in industries reliant on document analysis.

Hugging Face
arXiv
DagsHub
Qwen3-4B
alphaXiv
ScienceCast
CatalyzeX
Gotit.pub
STAGE-Eval