Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · dev.to — LLM tag English(EN) · 2d · [2 sources]

Eval Set Drift: How to Know When Your Golden Set Went Stale

The author discusses two common challenges in managing LLM applications: eval set drift and per-customer cost reporting. For eval set drift, they propose using Maximum Mean Discrepancy (MMD) on embeddings to detect when evaluation datasets no longer represent production data. For cost reporting, they suggest leveraging OpenTelemetry baggage to propagate customer IDs across services, avoiding costly pipeline rearchitectures.

IMPACT Provides practical techniques for developers to improve LLM evaluation accuracy and cost management, crucial for operationalizing AI applications.
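
The MMD-on-embeddings check described in the summary can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes embeddings are plain lists of floats, uses an RBF kernel with a hand-picked `gamma`, and adds a permutation test to turn the statistic into a p-value. The function names (`rbf_kernel`, `mmd2_biased`, `permutation_p_value`) and all parameter defaults are hypothetical choices made for this example.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) between two embedding vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2_biased(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_k(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def permutation_p_value(xs, ys, gamma=1.0, n_perm=200, seed=0):
    """Share of random relabelings whose MMD^2 is at least the observed one.

    A small p-value suggests the two samples come from different
    distributions, i.e. the golden set may have drifted from production.
    """
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, gamma)
    pooled = list(xs) + list(ys)
    n = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2_biased(pooled[:n], pooled[n:], gamma) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Usage with synthetic stand-ins for real embeddings (assumed data):
# golden = [[...], ...]       # embeddings of the eval ("golden") set
# production = [[...], ...]   # embeddings of recent production inputs
# p = permutation_p_value(golden, production, gamma=0.5)
# if p < 0.05: flag the golden set for review
```

The pairwise kernel sums are quadratic in sample size, so in practice one would likely subsample both sets or use a vectorized library; the 0.05 threshold in the usage comment is an illustrative assumption, not a recommendation from the article.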

RESEARCH · arXiv stat.ML English(EN) · 5d · [2 sources]

MMD-Balls as Credal Sets: A PAC-Bayesian Framework for Epistemic Uncertainty in Test-Time Adaptation

Researchers have developed a PAC-Bayesian framework to quantify epistemic uncertainty in test-time adaptation (TTA) methods. This framework uses maximum mean discrepancy (MMD) between source and target distributions to derive generalization bounds. By interpreting MMD-balls as credal sets, the approach separates epistemic from aleatoric uncertainty, offering a principled way to decide when adaptation is beneficial.

IMPACT Provides a theoretical foundation for understanding and quantifying uncertainty in models adapting to new data distributions.
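
For readers unfamiliar with the terms, the standard definitions behind "MMD" and "MMD-ball" can be written as below. The notation is assumed for illustration (kernel $k$ with reproducing kernel Hilbert space $\mathcal{H}$, source distribution $P_S$, radius $\varepsilon$); the paper's own symbols and the exact form of its bound may differ.

```latex
% Maximum mean discrepancy between distributions P and Q
\mathrm{MMD}_k(P, Q)
  = \sup_{\|f\|_{\mathcal{H}} \le 1}
    \Big( \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)] \Big)

% Equivalent squared form in terms of the kernel
% (x, x' drawn independently from P; y, y' independently from Q)
\mathrm{MMD}_k^2(P, Q)
  = \mathbb{E}[k(x, x')] + \mathbb{E}[k(y, y')] - 2\,\mathbb{E}[k(x, y)]

% MMD-ball of radius \varepsilon around the source distribution:
% a set of candidate distributions, which is what a credal set is
\mathcal{B}_\varepsilon(P_S)
  = \{\, Q : \mathrm{MMD}_k(P_S, Q) \le \varepsilon \,\}
```

Read this way, the radius $\varepsilon$ controls how far a target distribution may sit from the source while still being counted as plausible.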