PulseAugur
research · [7 sources] ·

Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

Several new arXiv papers address tabular data processing. "Improving Robustness of Tabular Retrieval via Representational Stability" tackles serialization sensitivity in transformer-based table retrieval: because retrieval results can change with the serialization format even when the table's semantics do not, the authors average the embeddings from different formats into a canonical representation. SAGE (Sparse Adaptive Guidance) is an LLM-based framework for generating synthetic tabular data that enforces sparse, dynamic dependency guidance to improve data fidelity and downstream utility. TabSCM is a mixed-type generator that preserves causal dependencies, which most tabular-data generators ignore. Finally, the TEmBed benchmark systematically evaluates tabular embeddings across tasks and representation levels and offers practical guidance for selecting models.
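
The embedding-averaging idea from the retrieval paper can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_embed` is a hypothetical stand-in for a transformer encoder (it hashes character trigrams), and the three formats, the function names, and the example table are all assumptions made for illustration. The sketch serializes one table in several ways, embeds each serialization, and averages the unit vectors into a single canonical vector.

```python
import hashlib
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def serialize(header, rows, fmt):
    """Flatten a table into one string using the separator for `fmt`."""
    separators = {"csv": ",", "tsv": "\t", "pipe": " | "}
    if fmt not in separators:
        raise ValueError(f"unknown format: {fmt}")
    sep = separators[fmt]
    lines = [sep.join(header)] + [sep.join(str(c) for c in row) for row in rows]
    return "\n".join(lines)


def toy_embed(text, dim=64):
    """Hypothetical encoder: hashed character trigrams, L2-normalized.

    Trigrams include separator characters, so the output depends on the
    serialization format, mimicking the sensitivity described above.
    """
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        digest = hashlib.md5(text[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return l2_normalize(vec)


def canonical_embedding(header, rows, formats=("csv", "tsv", "pipe"), embed=toy_embed):
    """Average the per-format embeddings, then renormalize."""
    embs = [embed(serialize(header, rows, f)) for f in formats]
    dim = len(embs[0])
    mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return l2_normalize(mean)


def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative table (made-up values).
header = ["item", "qty"]
rows = [["apple", 3], ["pear", 5]]

canon = canonical_embedding(header, rows)
for fmt in ("csv", "tsv", "pipe"):
    sim = cosine(canon, toy_embed(serialize(header, rows, fmt)))
    print(fmt, round(sim, 3))
```

In a real retrieval system the `embed` argument would be replaced by the actual encoder. The averaging step itself is unchanged, and the canonical vector is what gets indexed, so a table has one representation regardless of how it happened to be serialized.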

Summary written from 7 sources.

IMPACT New methods for tabular data retrieval and generation offer improved fidelity and utility for downstream tasks.

RANK_REASON Multiple academic papers released on arXiv detailing new methods and benchmarks for tabular data processing.

COVERAGE [7]

  1. arXiv cs.LG TIER_1 · Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci ·

    SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation

    arXiv:2604.24368v1. Generating high-fidelity synthetic tabular data remains a critical challenge for enhancing data availability in privacy-sensitive and low-resource domains. Recent approaches leverage LLMs by representing table rows as sequences, yet…

  2. arXiv cs.CL TIER_1 · Kushal Raj Bhandari, Adarsh Singh, Jianxi Gao, Soham Dan, Vivek Gupta ·

    Improving Robustness of Tabular Retrieval via Representational Stability

    arXiv:2604.24040v1. Transformer-based table retrieval systems flatten structured tables into token sequences, making retrieval sensitive to the choice of serialization even when table semantics remain unchanged. We show that semantically equivalent ser…

  3. arXiv cs.LG TIER_1 · Gjergji Kasneci ·

    SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation

    Generating high-fidelity synthetic tabular data remains a critical challenge for enhancing data availability in privacy-sensitive and low-resource domains. Recent approaches leverage LLMs by representing table rows as sequences, yet suffer from two fundamental limitations: (1) th…

  4. arXiv cs.CL TIER_1 · Vivek Gupta ·

    Improving Robustness of Tabular Retrieval via Representational Stability

    Transformer-based table retrieval systems flatten structured tables into token sequences, making retrieval sensitive to the choice of serialization even when table semantics remain unchanged. We show that semantically equivalent serializations, such as $\texttt{csv}$, $\texttt{ts…

  5. arXiv cs.LG TIER_1 · Sven Jacob, Bardh Prenkaj, Weijia Shao, Gjergji Kasneci ·

    TabSCM: A practical Framework for Generating Realistic Tabular Data

    arXiv:2604.22337v1. Most tabular-data generators match marginal statistics yet ignore causal structure, leading downstream models to learn spurious or unfair patterns. We present TabSCM, a mixed-type generator that preserves those causal dependencies. …

  6. arXiv cs.LG TIER_1 · Gjergji Kasneci ·

    TabSCM: A practical Framework for Generating Realistic Tabular Data

    Most tabular-data generators match marginal statistics yet ignore causal structure, leading downstream models to learn spurious or unfair patterns. We present TabSCM, a mixed-type generator that preserves those causal dependencies. Starting from a Completed Partially Directed Acy…

  7. arXiv cs.LG TIER_1 · Horst Samulowitz ·

    Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

    Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search and table-based prediction. Despite the growing number of such models, it remains unclear which …