New Benchmarks Advance Tabular Machine Learning for Imbalanced, String, and Multimodal Data

By the PulseAugur Editorial Team · [4 sources] · 2026-05-11 14:12

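Researchers have introduced three new benchmarks for tabular machine learning. TILBench addresses imbalanced learning across diverse data characteristics and finds that no single method is universally superior. STRABLE targets the understudied setting of tables that contain string entries, and finds that simple string embeddings paired with advanced tabular learners perform well on tables dominated by categorical columns. MulTaBench focuses on multimodal tabular learning, evaluating text and image data alongside tabular information, and highlights the benefit of adapting embeddings to the specific task.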

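Impact: Establishes new evaluation frameworks for tabular data and advances research on imbalanced learning, string handling, and multimodal integration.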

Ranking rationale: Multiple research papers introduce new benchmarks for tabular machine learning tasks.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

arXiv cs.LG · Jiaqi Luo · 2026-05-14 14:50

TILBench: A Systematic Benchmark for Tabular Imbalanced Learning Across Data Regimes

Imbalanced learning remains a fundamental challenge in tabular data applications. Despite decades of research and numerous proposed algorithms, a systematic empirical understanding of how different imbalanced learning methods behave across diverse data characteristics is still la…

arXiv cs.LG · Gaël Varoquaux · 2026-05-12 15:47

STRABLE: Benchmarking Tabular Machine Learning with Strings

Benchmarking tabular learning has revealed the benefit of dedicated architectures, pushing the state of the art. But real-world tables often contain string entries, beyond numbers, and these settings have been understudied due to a lack of a solid benchmarking suite. They lead to…

Hugging Face Daily Papers · 2026-05-11 14:12

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

Tabular Foundation Models have recently established the state of the art in supervised tabular learning, by leveraging pretraining to learn generalizable representations of numerical and categorical structured data. However, they lack native support for unstructured modalities su…

arXiv cs.CV · Roi Reichart · 2026-05-11 14:12

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
