PulseAugur
Live 11:08:59

ORiGAMi model synthesizes semi-structured JSON data without flattening

Researchers have developed ORiGAMi, an autoregressive transformer architecture that synthesizes sparse, semi-structured JSON data without flattening it. Traditional methods convert JSON records into wide, sparse tables; ORiGAMi instead preserves the records' inherent nested structure. It serializes each JSON document into key, value, and structural tokens, encodes each token's position within the document tree, and enforces grammar and schema constraints. Across six datasets, ORiGAMi outperformed existing baselines in 17 of 18 comparisons on fidelity, detection, and utility metrics while maintaining high privacy scores.
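To make the contrast between flattening and tokenizing concrete, here is a minimal Python sketch under our own assumptions. The token kinds (`START_OBJ`, `KEY`, `VALUE`, and so on), the `Token` dataclass, and the helpers `flatten`, `tokenize` and `detokenize` are hypothetical illustrations, not ORiGAMi's actual vocabulary or code; the sketch also omits the transformer itself and the grammar and schema constraints.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union

PathElem = Union[str, int]


@dataclass(frozen=True)
class Token:
    kind: str                   # hypothetical kinds: KEY, VALUE, START_OBJ, END_OBJ, START_ARR, END_ARR
    value: Any                  # key name for KEY, leaf value for VALUE, None for structural tokens
    path: Tuple[PathElem, ...]  # position in the document tree: keys / array indices from the root


def flatten(node: Any, path: Tuple[PathElem, ...] = ()) -> dict:
    """The approach being avoided: one wide row per record, one column per leaf path."""
    if isinstance(node, dict):
        row: dict = {}
        for key, child in node.items():
            row.update(flatten(child, path + (key,)))
        return row
    if isinstance(node, list):
        row = {}
        for i, child in enumerate(node):
            row.update(flatten(child, path + (i,)))
        return row
    return {".".join(map(str, path)): node}


def tokenize(node: Any, path: Tuple[PathElem, ...] = ()) -> List[Token]:
    """Serialize a JSON value depth-first into key, value and structural tokens."""
    tokens: List[Token] = []
    if isinstance(node, dict):
        tokens.append(Token("START_OBJ", None, path))
        for key, child in node.items():
            child_path = path + (key,)
            tokens.append(Token("KEY", key, child_path))
            tokens.extend(tokenize(child, child_path))
        tokens.append(Token("END_OBJ", None, path))
    elif isinstance(node, list):
        tokens.append(Token("START_ARR", None, path))
        for i, child in enumerate(node):
            tokens.extend(tokenize(child, path + (i,)))
        tokens.append(Token("END_ARR", None, path))
    else:
        tokens.append(Token("VALUE", node, path))
    return tokens


def detokenize(tokens: List[Token]) -> Any:
    """Rebuild the JSON value from its token stream, showing no structure was lost."""
    pos = 0

    def parse() -> Any:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok.kind == "VALUE":
            return tok.value
        if tok.kind == "START_OBJ":
            obj = {}
            while tokens[pos].kind != "END_OBJ":
                if tokens[pos].kind != "KEY":
                    raise ValueError(f"expected KEY, got {tokens[pos].kind}")
                key = tokens[pos].value
                pos += 1
                obj[key] = parse()
            pos += 1  # consume END_OBJ
            return obj
        if tok.kind == "START_ARR":
            arr = []
            while tokens[pos].kind != "END_ARR":
                arr.append(parse())
            pos += 1  # consume END_ARR
            return arr
        raise ValueError(f"unexpected token {tok.kind}")

    return parse()


if __name__ == "__main__":
    records = [
        {"id": 1, "tags": ["a", "b"]},
        {"id": 2, "address": {"city": "Oslo"}},
    ]
    # Flattening: the union of leaf paths becomes the column set; each row fills only part of it.
    columns = sorted({col for r in records for col in flatten(r)})
    print(columns)  # ['address.city', 'id', 'tags.0', 'tags.1']
    # Tokenizing: each record keeps its own shape and round-trips exactly.
    for rec in records:
        tokens = tokenize(rec)
        assert detokenize(tokens) == rec
        print([(t.kind, t.value) for t in tokens])
```

With just these two records, flattening already produces four columns that each row only partly fills, and the gap grows with every optional field or array position. The token stream instead carries each record's own shape and tree positions, so no placeholder columns are needed.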

Impact: Introduces a new method for generating realistic synthetic data from complex JSON structures, potentially improving privacy-preserving data sharing and system testing.

Ranking rationale: This is a research paper introducing a new model architecture for synthetic data generation.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.LG · TIER_1 · English (EN) · Thomas Rückstieß, Robin Vujanic

    Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data

    arXiv:2603.01444v2 Announce Type: replace Abstract: Synthetic data generation is an important capability for privacy-preserving data sharing, system benchmarking and test data provisioning. For mixed-type data, existing synthesizers largely target dense, fixed-schema tables, but …