Brief

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

Multi-Stage Training for Abusive Comment Detection in Indic Languages

Researchers have developed a multi-stage training pipeline for detecting abusive comments in Indic languages. The proposed system uses language-based preprocessing and an ensemble of models to identify harmful content on social media. A key focus of the research is minimizing false positives, so that online safety improves without compromising freedom of expression.

IMPACT Introduces a novel approach to content moderation for underrepresented languages, potentially improving online safety and inclusivity.

RESEARCH · arXiv cs.AI English(EN) · 6d · [2 sources]

SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR

Researchers have introduced SCRIBE, a new diagnostic framework designed to improve automatic speech recognition (ASR) for Indic languages. Unlike traditional metrics such as Word Error Rate (WER), which reduce all mistakes to a single number, SCRIBE categorizes errors into lexical, punctuation, numeral, and domain-entity types, offering a more nuanced evaluation. The framework also incorporates sandhi-tolerant alignment and domain vocabulary injection to better handle agglutinative languages. Alongside SCRIBE, the team has released LLM curation pipelines, benchmarks, and open-weight rich transcription models for Hindi, Malayalam, and Kannada.

IMPACT Enhances ASR accuracy for under-resourced Indic languages, potentially improving accessibility and usability.
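
For context on the baseline metric mentioned above, here is a minimal sketch of standard Word Error Rate: word-level edit distance divided by the number of reference words. It is not SCRIBE's method; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Assumption: words are split on whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# A numeral-format mismatch and a wrong word score identically under WER.
numeral_mismatch = word_error_rate("the cost is 5 rupees", "the cost is five rupees")
lexical_error = word_error_rate("the cost is 5 rupees", "the coast is 5 rupees")
```

Both calls return the same score, since each differs from the reference by one substituted word out of five. That is the kind of conflation a per-category breakdown is meant to expose.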
