ENTITY Cohen's kappa

Cohen's kappa

PulseAugur coverage of Cohen's kappa — every cluster mentioning Cohen's kappa across labs, papers, and developer communities, ranked by signal.

Total · 30d

7 over 90d

Releases · 30d

0 over 90d

Papers · 30d

6 over 90d

SENTIMENT · 30D

5 day(s) with sentiment data

RECENT · PAGE 1/1 · 7 TOTAL

RESEARCH · CL_106950 · Jun 23 · 17:41

LLM-as-judge tools fail to prioritize human validation, study finds

A recent evaluation of six LLM-as-judge tools revealed that most prioritize generating scores over ensuring the trustworthiness of those scores. The author argues that a judge's validation against human labels, measured…

RESEARCH · CL_105153 · Jun 22 · 14:14

LLMs analyzed for self-stigma support in drug use communities · 2 sources tracked

Researchers have developed methods to analyze self-stigma expressed by individuals who use drugs in online communities, specifically on Reddit. One study created a codebook to categorize self-stigma into cognitive, affe…

TOOL · CL_100060 · Jun 19 · 04:00

New framework measures university CS curriculum alignment with global standards

A new framework has been developed to measure how well university computer science programs align with international curricular guidelines, specifically CS2013 and CS2023. This human-in-the-loop pipeline represents prog…

RESEARCH · CL_99671 · Jun 17 · 19:37

LLM-as-a-Judge models show significant reliability and bias issues, study finds

A new study evaluating LLM-as-a-Judge models reveals significant issues with their reliability and validity. The research, which analyzed 21 judges across multiple benchmarks and over 541,000 judgments, found that commo…

TOOL · CL_93144 · Jun 16 · 04:00

LLMs show promise in identifying discourse units for aphasia assessment

A new research paper explores the use of instruction-tuned large language models (LLMs) for classifying Correct Information Units (CIUs) in aphasic discourse. The study found that while zero-shot prompting was insuffici…

TOOL · CL_52901 · May 26 · 17:49

LLM judge evaluations require hundreds of labels for reliable results

A recent article highlights the critical need for larger evaluation datasets when using LLMs as judges in AI model assessments. The author explains that the common practice of using small, ad-hoc datasets is insufficient fo…

TOOL · CL_18536 · May 6 · 04:00

LLM system aids explainable defect analysis in laser powder bed fusion

Researchers have developed a new decision-support system that combines structured knowledge about defects with large language models (LLMs) to analyze and guide mitigation strategies in laser powder bed fusion (LPBF) ma…
