Pre-Training for Simulation-Based Science: A Study on Jet Foundation Model Training Objectives
A new arXiv paper explores pre-training objectives for foundation models in simulation-based sciences, focusing on high-energy physics. The study compares supervised classification, flow-matching generation, and self-supervised masked particle modeling using the OmniLearned High Energy Physics FM framework. Results indicate that pure classifier pre-training performs best when labels are abundant, while combining it with masked particle modeling is highly effective in low-label scenarios. For generative downstream tasks, flow matching must be part of pre-training to yield a significant advantage.
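To make the three objectives concrete, here is a minimal sketch of how each loss could be computed and combined. It is an illustration under stated assumptions, not the paper's or OmniLearned's implementation: every function name and the loss weights are hypothetical, each particle is reduced to a single scalar feature, the flow-matching path is assumed to be a straight line from noise to data, and masked particle modeling is written as a regression on hidden particles (real setups may predict discrete tokens instead).

```python
import math
import random

def classification_loss(logits, labels):
    """Mean cross-entropy; logits is a list of per-jet score lists."""
    total = 0.0
    for scores, label in zip(logits, labels):
        peak = max(scores)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[label]
    return total / len(labels)

def flow_matching_pair(x1, rng):
    """Sample a point on a straight noise-to-data path and its target velocity."""
    t = rng.random()                                  # time in [0, 1)
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]            # noise sample
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]        # d x_t / d t on this path
    return x_t, t, velocity

def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

def masked_particle_loss(pred, target, mask):
    """MSE restricted to particles hidden from the model (mask[i] is True).

    Assumes at least one particle is masked.
    """
    hidden = [i for i, m in enumerate(mask) if m]
    return mse([pred[i] for i in hidden], [target[i] for i in hidden])

def combined_loss(losses, weights):
    """Weighted sum of whichever objectives are active in pre-training."""
    return sum(weights[name] * value for name, value in losses.items())
```

In this sketch, a classifier-only run would pass a single entry to `combined_loss`, while the low-label setting described above would add the masked-particle term, and a run aimed at generative tasks would add the flow-matching term (the `mse` between a model's predicted velocity and the target from `flow_matching_pair`).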