ENTITY
PRBench
PulseAugur coverage of PRBench — every cluster mentioning PRBench across labs, papers, and developer communities, ranked by signal.
Total · 30d: 2 (2 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 2 (2 over 90d)
RECENT · PAGE 1/1 · 2 TOTAL
- New benchmarks and frameworks advance AI model robustness evaluation
  Researchers have introduced PRBench, a new benchmark designed to standardize the evaluation of probabilistic robustness in deep learning models. This benchmark compares various adversarial training (AT) and probabilisti…
- LLMs struggle to reproduce physics experiment results, failing numerical simulations
  A new preprint from Peking University evaluated the ability of large language models to reproduce numerical results from experimental physics papers. Researchers found that all tested LLMs, including OpenAI Codex powere…