Cohen's kappa
PulseAugur coverage of Cohen's kappa: every cluster mentioning Cohen's kappa across labs, papers, and developer communities, ranked by signal.
-
LLM-as-judge tools fail to prioritize human validation, study finds
A recent evaluation of six LLM-as-judge tools revealed that most prioritize generating scores over ensuring the trustworthiness of those scores. The author argues that a judge's validation against human labels, measured…
-
LLMs analyzed for self-stigma support in drug use communities · 2 sources tracked
Researchers have developed methods to analyze self-stigma expressed by individuals who use drugs in online communities, specifically on Reddit. One study created a codebook to categorize self-stigma into cognitive, affe…
-
New framework measures university CS curriculum alignment with global standards
A new framework has been developed to measure how well university computer science programs align with international curricular guidelines, specifically CS2013 and CS2023. This human-in-the-loop pipeline represents prog…
-
LLM-as-a-Judge models show significant reliability and bias issues, study finds
A new study evaluating LLM-as-a-Judge models reveals significant issues with their reliability and validity. The research, which analyzed 21 judges across multiple benchmarks and over 541,000 judgments, found that commo…
-
LLMs show promise in identifying discourse units for aphasia assessment
A new research paper explores the use of instruction-tuned large language models (LLMs) for classifying Correct Information Units (CIUs) in aphasic discourse. The study found that while zero-shot prompting was insuffici…
-
LLM judge evaluations require hundreds of labels for reliable results
A recent article highlights the critical need for larger evaluation datasets when using LLMs as judges in AI model assessments. The author explains that the common practice of using small, ad-hoc datasets is insufficient fo…
-
LLM system aids explainable defect analysis in laser powder bed fusion
Researchers have developed a new decision-support system that combines structured knowledge about defects with large language models (LLMs) to analyze and guide mitigation strategies in laser powder bed fusion (LPBF) ma…