Researchers distill large foundation models into CPU-ready gradient-boosted trees

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Researchers have developed a method to distill large tabular foundation models (TFMs) into smaller, faster gradient-boosted tree models that run natively on CPUs. The distillation happens offline and targets the gap between the 151-1,275 ms that the best TFMs need per inference on a GPU and the roughly 2 ms budget of real-time applications such as fraud scoring. The distilled models reportedly perform close to their TFM counterparts and outperform other CPU-based baselines on many datasets.


IMPACT Lets distilled tabular models run in real time on CPU-only hardware, cutting inference latency for time-critical applications such as fraud scoring.

RANK_REASON Academic paper detailing a new method for model distillation.


COVERAGE [1]

arXiv cs.AI TIER_1 · Pratinav Seth · 2026-05-18 17:00

Pocket Foundation Models: Distilling TFMs into CPU-Ready Gradient-Boosted Trees

A fraud scorer needs to answer in under 2 ms. The best tabular foundation models (TFMs) take 151-1,275 ms on GPU. We close this gap by distilling the TFM offline into an XGBoost or CatBoost student that runs natively on CPU. The central obstacle is specific to in-context learning…
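The offline teacher-to-student step described in the abstract can be illustrated with a minimal sketch. This is generic soft-label distillation, not the paper's procedure, and it does not address the in-context-learning obstacle the abstract mentions. Everything named here is an assumption made for illustration: `teacher_score` is a hypothetical stand-in for a slow TFM, the two features are synthetic, and a tiny pure-Python booster over decision stumps takes the place of the XGBoost or CatBoost student so the example runs with the standard library alone.

```python
import math
import random

# Candidate split points on a fixed grid; features are assumed to lie in [0, 1].
THRESHOLDS = [i / 16 for i in range(1, 16)]


def teacher_score(x):
    """Hypothetical stand-in for the slow TFM teacher: returns a probability."""
    amount, velocity = x
    return 1.0 / (1.0 + math.exp(-(3.0 * amount + 2.0 * velocity - 2.5)))


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in THRESHOLDS:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def distill(X, soft_targets, rounds=60, lr=0.3):
    """Fit boosted stumps to the teacher's soft scores (squared-error boosting)."""
    base = sum(soft_targets) / len(soft_targets)
    preds = [base] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(soft_targets, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lr * lm, lr * rm))
        preds = [p + (lr * lm if row[j] <= t else lr * rm) for row, p in zip(X, preds)]
    return base, stumps


def student_score(model, x):
    """CPU-only inference: one comparison and one addition per stump."""
    base, stumps = model
    return base + sum(left if x[j] <= t else right for j, t, left, right in stumps)


random.seed(0)
# Offline phase: label synthetic training rows with the teacher's soft scores.
X_train = [(random.random(), random.random()) for _ in range(300)]
soft_targets = [teacher_score(x) for x in X_train]
student = distill(X_train, soft_targets)

# Compare the student with a constant predictor on held-out rows.
X_test = [(random.random(), random.random()) for _ in range(100)]
mae = sum(abs(student_score(student, x) - teacher_score(x)) for x in X_test) / len(X_test)
baseline_mae = sum(abs(student[0] - teacher_score(x)) for x in X_test) / len(X_test)
print(f"student MAE vs teacher: {mae:.4f} (constant baseline: {baseline_mae:.4f})")
```

At serving time only `student_score` would be called, so the teacher never sits on the request path. In practice the stump booster would be replaced by a real gradient-boosting library trained on the same soft targets.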

