Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers English(EN) · 12mo · [414 sources]

Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation

Researchers are developing new methods to improve the evaluation, training, and inference of large language models (LLMs). One approach, SCOPE, calibrates LLM judges so that pairwise evaluations stay reliable with controlled error rates. Another technique, D3, uses dynamic influence graphs to optimize data scheduling during LLM training by accounting for interactions between samples. Additionally, OBCache offers a principled framework for pruning key-value caches, reducing memory overhead during long-context inference while improving accuracy.

IMPACT: New research introduces methods for more reliable LLM evaluation, efficient training data scheduling, and optimized inference, potentially improving LLM performance and resource utilization.

LLMs
PagedAttention
FlashAttention
Nested WAIT
A100 GPU
Llama-2-7B
LLM
Asteria
FasterTransformer
Orca
vLLM
SCICONVBENCH
Sarathi-Serve
A100
KVDrive
LLMEval-Logic
LLaDA2.0-mini
DeepSeek-R1-Distill-7B
POPE benchmark
LLaDA2.0-flash
V* benchmark
TIDE
FT-Dojo
Frontier
arXiv
llama.cpp
WebGPU
PALS
Charon
LlamaWeb
FT-Agent
rePIRL
LLaMA
LoRA
Qwen
GPT-5
OBCache
FEM-Bench
Gemini 3 Pro
AxBench
Item Response Theory
SCOPE
Hermes
Lean