Researchers have developed a new training method, per-sample clipped SGD (PS-Clip-SGD), that improves robustness and speed on non-convex optimization problems. The method comes with theoretical convergence guarantees even under heavy-tailed gradient noise. In empirical tests, PS-Clip-SGD outperformed standard techniques when training AlexNet on CIFAR-100, and it also showed benefits when combined with gradient accumulation.
Summary written by gemini-2.5-flash-lite from 3 sources.
IMPACT: Introduces a novel training technique that could lead to more efficient and stable model development.
RANK_REASON: Academic paper detailing a new optimization method for machine learning.
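The sources don't spell out the algorithm's mechanics, but the general per-sample clipping idea is standard: clip each sample's gradient to a norm threshold before averaging, so no single heavy-tailed outlier can dominate the update. Below is a minimal PyTorch sketch of that general idea under stated assumptions; the names (ps_clip_sgd_step, clip_threshold) are illustrative, not the paper's notation, and the paper's exact variant may differ.

```python
# Minimal sketch of per-sample clipped SGD (assumed variant; names are
# illustrative, not taken from the paper).
import torch
import torch.nn as nn

def ps_clip_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_threshold=1.0):
    """One SGD step: clip each sample's gradient to L2 norm <= clip_threshold,
    then average the clipped gradients and take a plain SGD step."""
    params = [p for p in model.parameters() if p.requires_grad]
    avg_grad = [torch.zeros_like(p) for p in params]
    n = xs.shape[0]
    for i in range(n):
        # Per-sample loss and gradient (one sample at a time for clarity;
        # torch.func.vmap/grad can vectorize this in practice).
        loss = loss_fn(model(xs[i:i + 1]), ys[i:i + 1])
        grads = torch.autograd.grad(loss, params)
        # Global L2 norm across all parameter tensors for this sample.
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        # Shrink this sample's gradient if its norm exceeds the threshold.
        scale = torch.clamp(clip_threshold / (norm + 1e-12), max=1.0)
        for acc, g in zip(avg_grad, grads):
            acc.add_(scale * g / n)
    # Plain SGD update with the averaged clipped gradient.
    with torch.no_grad():
        for p, g in zip(params, avg_grad):
            p.sub_(lr * g)

# Toy usage on random data.
model = nn.Linear(4, 1)
xs, ys = torch.randn(32, 4), torch.randn(32, 1)
ps_clip_sgd_step(model, nn.MSELoss(), xs, ys)
```

Because clipping happens per sample rather than on the averaged mini-batch gradient, each sample's influence on the update is individually bounded, which is what makes this style of method robust to heavy-tailed noise; it also composes naturally with gradient accumulation, since clipped per-sample gradients can be accumulated across micro-batches before the step.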