Estonian subjectivity dataset created, LLM scoring tested

By PulseAugur Editorial · [1 source] · 2026-06-08 04:00

Researchers have developed a new Estonian-language dataset for document-level subjectivity analysis, comprising 1,000 texts rated on a scale from 0 to 100. Human raters showed moderate inter-annotator agreement, which prompted re-annotation of texts with divergent scores. An experiment using GPT-5 for automatic subjectivity scoring indicated that the approach is feasible, but the model's scores differed from the human annotations, suggesting that LLM-based scoring is not a direct substitute for human judgment.

IMPACT Provides a new resource for evaluating LLM understanding of subjective content in Estonian.

RANK_REASON The cluster contains an academic paper detailing the creation of a new dataset and an initial experiment.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Karl Gustav Gailit, Kadri Muischnek, Kairit Sirts · 2026-06-08 04:00

Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale

arXiv:2512.09634v2 · Abstract: This article presents the creation of an Estonian-language dataset for document-level subjectivity, analyzes the resulting annotations, and reports an initial experiment of automatic subjectivity analysis using a large language …

