PulseAugur
research · [1 source] ·

ORiGAMi model synthesizes semi-structured JSON data without flattening

Researchers have developed ORiGAMi, an autoregressive transformer architecture that synthesizes sparse and semi-structured JSON data without flattening it. The approach preserves the native structure of JSON records, whereas traditional methods convert them into wide, sparse tables. ORiGAMi serializes each JSON record into key, value, and structural tokens, encodes each token's position within the document tree, and enforces grammar and schema constraints. Across six datasets, ORiGAMi outperformed existing baselines in 17 of 18 comparisons on fidelity, detection, and utility metrics while maintaining high privacy scores.
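To make the serialization idea concrete, here is a minimal Python sketch of turning a nested JSON record into key, value, and structural tokens, each tagged with its path in the document tree. It is an illustration only, not ORiGAMi's actual tokenizer: the token names (`KEY`, `VALUE`, `STRUCT`, `OBJ_START`, and so on), the tuple-based path encoding, and the sample record are all assumptions, and the sketch does not attempt the grammar and schema constraints mentioned in the summary.

```python
from typing import Any, Iterator, List, Tuple

# Hypothetical token layout (an assumption, not the paper's vocabulary):
# (kind, payload, path of the enclosing node in the document tree)
Token = Tuple[str, Any, Tuple[Any, ...]]


def tokenize(node: Any, path: Tuple[Any, ...] = ()) -> Iterator[Token]:
    """Walk a parsed JSON value depth-first and emit typed tokens."""
    if isinstance(node, dict):
        yield ("STRUCT", "OBJ_START", path)
        for key, value in node.items():
            yield ("KEY", key, path)
            yield from tokenize(value, path + (key,))
        yield ("STRUCT", "OBJ_END", path)
    elif isinstance(node, list):
        yield ("STRUCT", "ARR_START", path)
        for index, item in enumerate(node):
            yield from tokenize(item, path + (index,))
        yield ("STRUCT", "ARR_END", path)
    else:
        # Scalars (str, int, float, bool, None) become value tokens.
        yield ("VALUE", node, path)


def detokenize(tokens: List[Token]) -> Any:
    """Rebuild the original JSON value from the token sequence."""
    pos = 0

    def is_struct(marker: str) -> bool:
        return tokens[pos][0] == "STRUCT" and tokens[pos][1] == marker

    def parse() -> Any:
        nonlocal pos
        kind, payload, _ = tokens[pos]
        pos += 1
        if kind == "VALUE":
            return payload
        if payload == "OBJ_START":
            obj = {}
            while not is_struct("OBJ_END"):
                key = tokens[pos][1]  # a KEY token
                pos += 1
                obj[key] = parse()
            pos += 1  # consume OBJ_END
            return obj
        # Otherwise this is ARR_START.
        arr = []
        while not is_struct("ARR_END"):
            arr.append(parse())
        pos += 1  # consume ARR_END
        return arr

    return parse()


# Hypothetical sample record with a nested object and an array.
record = {"name": "Ada", "tags": ["x", "y"], "address": {"city": "Zurich"}}
tokens = list(tokenize(record))
restored = detokenize(tokens)
```

Because the nesting is carried by the structural tokens and paths, the record round-trips without ever being spread across the columns of a wide table; an absent optional field simply produces no tokens.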

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Introduces a new method for generating realistic synthetic data from complex JSON structures, potentially improving privacy and testing for AI systems.

RANK_REASON This is a research paper introducing a new model architecture for synthetic data generation.

Read on arXiv cs.LG →

COVERAGE [1]

  1. arXiv cs.LG TIER_1 · Thomas Rückstieß, Robin Vujanic ·

    Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data

    arXiv:2603.01444v2 Abstract: Synthetic data generation is an important capability for privacy-preserving data sharing, system benchmarking and test data provisioning. For mixed-type data, existing synthesizers largely target dense, fixed-schema tables, but …