Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

By PulseAugur Editorial · [7 sources] · 2026-04-23 14:05

Researchers have introduced two new frameworks for tabular data processing, along with a new benchmark. The first, presented in the paper "Improving Robustness of Tabular Retrieval via Representational Stability," addresses serialization sensitivity in transformer-based table retrieval systems by averaging the embeddings of the same table across different serialization formats to form a canonical representation. The second, SAGE (Sparse Adaptive Guidance), is an LLM-based framework for generating synthetic tabular data that enforces sparse, dynamic dependency guidance, which is reported to improve data fidelity and downstream utility. The benchmark, TEmBed, systematically evaluates tabular embeddings across tasks and representation levels and offers practical guidance for selecting appropriate models.
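
The embedding-averaging idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_embed` is a hypothetical hashed bag-of-tokens stand-in for a real table encoder, the format names other than `csv` are illustrative choices, and the published method may combine or weight serializations differently.

```python
import hashlib
import math
import re


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def toy_embed(text, dim=16):
    """Hypothetical encoder (assumption): hashed bag of tokens, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return l2_normalize(vec)


def serialize(header, rows, fmt):
    """Flatten a table into text. The formats are illustrative assumptions."""
    if fmt == "csv":
        lines = [",".join(header)] + [",".join(r) for r in rows]
    elif fmt == "markdown":
        lines = ["| " + " | ".join(header) + " |"]
        lines += ["| " + " | ".join(r) + " |" for r in rows]
    elif fmt == "keyvalue":
        lines = ["; ".join(f"{h}: {c}" for h, c in zip(header, r)) for r in rows]
    else:
        raise ValueError(f"unknown format: {fmt}")
    return "\n".join(lines)


def canonical_embedding(header, rows,
                        formats=("csv", "markdown", "keyvalue"),
                        embed=toy_embed):
    """Average the embeddings of several serializations of the same table."""
    embs = [embed(serialize(header, rows, f)) for f in formats]
    dim = len(embs[0])
    mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return l2_normalize(mean)


header = ["name", "age"]
rows = [["Ada", "36"], ["Alan", "41"]]
vector = canonical_embedding(header, rows)
print(len(vector))
```

In a retrieval setting, the averaged vector would be indexed in place of any single-format embedding, so that the stored representation depends less on which serialization happened to be chosen.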

IMPACT New methods for tabular data retrieval and generation offer improved fidelity and utility for downstream tasks.

RANK_REASON Multiple academic papers released on arXiv detailing new methods and benchmarks for tabular data processing.

AI-generated summary · Google Gemini · from 7 sources.

COVERAGE [7]

arXiv cs.LG TIER_1 English(EN) · Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci · 2026-04-28 04:00

SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation

arXiv:2604.24368v1. Abstract: Generating high-fidelity synthetic tabular data remains a critical challenge for enhancing data availability in privacy-sensitive and low-resource domains. Recent approaches leverage LLMs by representing table rows as sequences, yet…

arXiv cs.CL TIER_1 English(EN) · Kushal Raj Bhandari, Adarsh Singh, Jianxi Gao, Soham Dan, Vivek Gupta · 2026-04-28 04:00

Improving Robustness of Tabular Retrieval via Representational Stability

arXiv:2604.24040v1. Abstract: Transformer-based table retrieval systems flatten structured tables into token sequences, making retrieval sensitive to the choice of serialization even when table semantics remain unchanged. We show that semantically equivalent ser…

arXiv cs.LG TIER_1 English(EN) · Gjergji Kasneci · 2026-04-27 12:03

SAGE: Sparse Adaptive Guidance for Dependency-Aware Tabular Data Generation

Generating high-fidelity synthetic tabular data remains a critical challenge for enhancing data availability in privacy-sensitive and low-resource domains. Recent approaches leverage LLMs by representing table rows as sequences, yet suffer from two fundamental limitations: (1) th…

arXiv cs.CL TIER_1 English(EN) · Vivek Gupta · 2026-04-27 04:52

Improving Robustness of Tabular Retrieval via Representational Stability

Transformer-based table retrieval systems flatten structured tables into token sequences, making retrieval sensitive to the choice of serialization even when table semantics remain unchanged. We show that semantically equivalent serializations, such as $\texttt{csv}$, $\texttt{ts…

arXiv cs.LG TIER_1 English(EN) · Sven Jacob, Bardh Prenkaj, Weijia Shao, Gjergji Kasneci · 2026-04-27 04:00

TabSCM: A practical Framework for Generating Realistic Tabular Data

arXiv:2604.22337v1. Abstract: Most tabular-data generators match marginal statistics yet ignore causal structure, leading downstream models to learn spurious or unfair patterns. We present TabSCM, a mixed-type generator that preserves those causal dependencies. …

arXiv cs.LG TIER_1 English(EN) · Gjergji Kasneci · 2026-04-24 08:10

TabSCM: A practical Framework for Generating Realistic Tabular Data

Most tabular-data generators match marginal statistics yet ignore causal structure, leading downstream models to learn spurious or unfair patterns. We present TabSCM, a mixed-type generator that preserves those causal dependencies. Starting from a Completed Partially Directed Acy…

arXiv cs.LG TIER_1 English(EN) · Horst Samulowitz · 2026-04-23 14:05

Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks

Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search and table-based prediction. Despite the growing number of such models, it remains unclear which …