Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining

Study: Mixing Synthetic Data Improves Pretraining of Time Series Models

By the PulseAugur editorial team · [1 source] · 2026-06-10 04:00

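A new research paper examines the central role that the composition of synthetic data plays in pretraining time series foundation models. The study finds that, under identical training budgets, the choice of synthetic data generator can produce up to a 2× difference in forecasting error, and that the ranking of generators is not consistent across model architectures. The researchers propose that mixing several generators together with real data yields the strongest pretraining corpus, reframing the task as a question of corpus composition rather than generator selection.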
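The "mix, don't pick" idea can be illustrated with a minimal sketch. This is not the authors' method: the two generators (a noisy sinusoid and a Gaussian random walk), the placeholder "real" series, the function names, and the equal mixture weights are all hypothetical choices made for illustration. The sketch only shows the general pattern of drawing each pretraining series from one of several sources in proportion to a mixture weight, so that the corpus combines several generators with real data instead of relying on a single generator.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Series = List[float]
# A source takes an RNG and a length and returns one series of that length.
Source = Callable[[random.Random, int], Series]


def sine_generator(rng: random.Random, length: int) -> Series:
    """Hypothetical generator: a sinusoid with random period and phase plus Gaussian noise."""
    period = rng.uniform(8.0, 64.0)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(2.0 * math.pi * t / period + phase) + rng.gauss(0.0, 0.1)
            for t in range(length)]


def random_walk_generator(rng: random.Random, length: int) -> Series:
    """Hypothetical generator: a Gaussian random walk starting at 0."""
    value, out = 0.0, []
    for _ in range(length):
        value += rng.gauss(0.0, 1.0)
        out.append(value)
    return out


def make_real_source(real_series: Series) -> Source:
    """Wrap a real series as a source that returns random contiguous windows of it."""
    def sample_window(rng: random.Random, length: int) -> Series:
        if length > len(real_series):
            raise ValueError("window longer than the real series")
        start = rng.randrange(len(real_series) - length + 1)
        return real_series[start:start + length]
    return sample_window


def build_mixed_corpus(sources: Dict[str, Source], weights: Dict[str, float],
                       n_series: int, length: int, seed: int = 0) -> List[Tuple[str, Series]]:
    """Draw each series from a source chosen at random in proportion to its weight."""
    rng = random.Random(seed)
    names = list(sources)
    source_weights = [weights[name] for name in names]
    corpus = []
    for _ in range(n_series):
        name = rng.choices(names, weights=source_weights, k=1)[0]
        corpus.append((name, sources[name](rng, length)))
    return corpus


if __name__ == "__main__":
    # Placeholder standing in for a real dataset (hypothetical, not from the paper).
    real = [float(t % 24) for t in range(500)]
    sources = {
        "sine": sine_generator,
        "random_walk": random_walk_generator,
        "real": make_real_source(real),
    }
    weights = {"sine": 1.0, "random_walk": 1.0, "real": 1.0}  # hypothetical equal mix
    corpus = build_mixed_corpus(sources, weights, n_series=1000, length=64)
    print(Counter(name for name, _ in corpus))
```

In this sketch, the `weights` dictionary is the knob that sets the corpus composition; setting every weight except one to zero reduces it to the single-generator ("pick") setting that the paper's title contrasts with mixing.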

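Impact: The work highlights how much the composition of synthetic data matters for time series models, which could improve forecasting accuracy and model development.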

Ranking rationale: This cluster contains an academic paper detailing new research findings.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Aaryan Nagpal, Debdeep Sanyal, Murari Mandal, Dhruv Kumar, Saurabh Deshpande · 2026-06-10 04:00

Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining

arXiv:2606.09912v1 · Announce Type: cross

Abstract: Choosing the wrong synthetic generator for time-series foundation model pretraining is costly: under identical training budgets, the best and worst generators produce up to a 2× gap in forecasting error, yet the field has n…
