Elo

PulseAugur coverage of Elo: every cluster mentioning Elo across labs, papers, and developer communities, ranked by signal.

Total · 30d: 7 (7 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 7 (7 over 90d)

SENTIMENT · 30D: 5 days with sentiment data

RECENT · 7 TOTAL

TOOL · CL_111648 · Jun 26 · 04:00

New chess rating system uses cognitive model to track skill changes

Researchers have developed a new skill assessment framework for chess called the Drift-Diffusion-Enhanced Elo Rating System (DD-Elo). This system draws inspiration from cognitive neuroscience's drift diffusion model to …
TOOL · CL_104712 · Jun 21 · 07:17

New methods assess physical consistency in AI-generated videos

Researchers have developed new methods to evaluate the physical consistency of videos generated by world models, addressing a gap in current simulation tools. These reference-free measures combine relative and absolute …
TOOL · CL_93296 · Jun 16 · 04:00

New framework uses AI to guide human comparisons for efficient ranking

Researchers have developed a novel human-in-the-loop ranking framework called Surprise-Guided MergeSort (SGS). This system uses a Vision-Language Model (VLM) to identify comparisons that genuinely require human judgment…
RESEARCH · CL_79519 · Jun 8 · 12:26

New research validates pairwise comparisons for AI model accuracy

A new research paper proposes that pairwise comparisons, commonly used to evaluate generative models, align well with accuracy-based rankings. The study converted five benchmarks into generative evaluations and found th…
RESEARCH · CL_91476 · Jun 3 · 00:00

New methods improve LLM evaluation accuracy with AI and human insights

Researchers have developed new methods to improve the accuracy and calibration of Large Language Model (LLM) evaluations. One approach, Conformal Elo Estimation, uses LLM judgments to estimate Elo ratings, achieving res…
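The card above mentions estimating Elo ratings from LLM judgments. For readers new to the rating itself, here is a minimal sketch of the classic Elo update for a single pairwise outcome. It is not Conformal Elo Estimation, DD-Elo, or any method from the papers listed on this page. The function names `expected_score` and `elo_update` are hypothetical, and the K-factor of 32 and the 400-point logistic scale are conventional assumptions, not values taken from these sources.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo logistic curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A wins, 0.5 for a draw or tie, 0.0 if A loses.
    k (the K-factor) is an assumed default; real rating systems tune it.
    """
    if not 0.0 <= score_a <= 1.0:
        raise ValueError("score_a must be in [0, 1]")
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical sequence of judgments between two models, both starting at 1500.
    ra, rb = 1500.0, 1500.0
    for outcome in [1.0, 1.0, 0.5, 0.0]:
        ra, rb = elo_update(ra, rb, outcome)
    print(round(ra, 1), round(rb, 1))
```

Each update is zero-sum, and the result depends on the order in which comparisons arrive. Approaches that fit ratings from a whole batch of comparisons at once, such as a Bradley-Terry model, avoid that order dependence.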
RESEARCH · CL_22018 · May 7 · 17:57

Study finds global LLM leaderboards misleading, proposes portfolio rankings

A new research paper argues that current leaderboards for large language models (LLMs) are misleading due to significant heterogeneity in user preferences across languages and tasks. The study analyzed approximately 89,…
TOOL · CL_17792 · Mar 25 · 14:22

Chess-GPT model learns world model, can be manipulated to change skill

Researchers have explored interventions on a language model trained to play chess, dubbed Chess-GPT. By manipulating the model's internal representations of the board state and player skill, they demonstrated a causal l…