Do Time Series Foundation Model Benchmarks Hide Regime-Dependent Failures? Evidence from Traffic Speed Forecasting
A new arXiv paper argues that current benchmarks for time series foundation models (TSFMs) can hide regime-dependent failures. Using traffic speed forecasting as its test case, the study finds that the aggregate metrics reported in standard evaluations obscure large performance drops during the transitions between free-flow and congested traffic states. In those transition periods, TSFMs show sharply reduced accuracy and prediction interval coverage, but the failure is masked because free-flow data dominates the overall averages. The authors propose a regime-aware evaluation approach, along with a Bimodal Mixture Augmentation (BMA) method, to improve model performance and make these failures visible.
IMPACT Highlights the need for evaluation metrics that report performance per regime rather than only in aggregate, which could influence how future time series models are developed and deployed in critical infrastructure.
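The masking effect described above can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function name `regime_metrics`, the regime labels `free_flow` and `transition`, and the toy numbers are all assumptions chosen for illustration, and how the paper actually assigns regimes is not stated in this summary. The sketch only shows how an aggregate MAE and interval coverage can look acceptable while a minority regime fails badly.

```python
import numpy as np


def regime_metrics(y_true, y_pred, lower, upper, regimes):
    """Return MAE and prediction-interval coverage overall and per regime.

    `regimes` holds one label per observation. The labels are hypothetical;
    the labelling rule is an assumption, not the paper's definition.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    regimes = np.asarray(regimes)

    abs_err = np.abs(y_true - y_pred)
    covered = (y_true >= lower) & (y_true <= upper)

    def summarize(mask):
        return {
            "n": int(mask.sum()),
            "mae": float(abs_err[mask].mean()),
            "coverage": float(covered[mask].mean()),
        }

    report = {"overall": summarize(np.ones(y_true.shape, dtype=bool))}
    for label in np.unique(regimes):
        report[str(label)] = summarize(regimes == label)
    return report


# Toy, made-up speeds (not results from the paper): 8 free-flow points
# forecast well, 2 transition points forecast poorly.
y_true = np.array([60.0] * 8 + [30.0, 25.0])
y_pred = np.array([61.0] * 8 + [40.0, 35.0])
lower, upper = y_pred - 5.0, y_pred + 5.0
regimes = np.array(["free_flow"] * 8 + ["transition"] * 2)

report = regime_metrics(y_true, y_pred, lower, upper, regimes)
for name, stats in report.items():
    print(name, stats)
```

In this toy setup the overall numbers are dominated by the eight free-flow points, so the two transition points, where every interval misses, barely move the aggregate. Reporting the per-regime rows next to the overall row is one simple way to surface that gap.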