ENTITY English

PulseAugur coverage of English — every cluster mentioning English across labs, papers, and developer communities, ranked by signal.

Total · 30d: 103 (103 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 89 (89 over 90d)

TIER MIX · 90D

significant 1
research 49
tool 47
commentary 5
meme 1

SENTIMENT · 30D

18 days with sentiment data

RECENT · PAGE 1/6 · 103 TOTAL

TOOL · CL_111729 · Jun 26 · 04:00

New neural diarization model excels on low-resource Nepali-Hindi speech

Researchers have developed a new approach to speaker diarization, the task of identifying who spoke when in an audio recording, targeting low-resource Nepali-Hindi speech. They trained two neural netwo…
RESEARCH · CL_111601 · Jun 25 · 12:37

New framework induces hierarchies from diverse text sources

Researchers have developed a new term-centric framework for creating interpretable hierarchical taxonomies from diverse text sources. This method uses automatic term extraction to map documents into a shared representat…
TOOL · CL_109892 · Jun 25 · 04:00

LLMs match and exceed human examiner agreement on UK GCSE exams

A new dataset of 32,534 double-marked real student responses to UK GCSE mock exams has been introduced, covering 328 questions across five subjects, including handwritten work. Researchers found that current large langu…
RESEARCH · CL_111610 · Jun 25 · 00:01

New SOLAR method enhances cross-lingual reasoning in LLMs

Researchers have developed SOLAR, a new method to improve cross-lingual reasoning in large language models. This technique aligns soft-token representations across languages, using English as a pivot to create more lang…
RESEARCH · CL_109559 · Jun 24 · 17:15

Readers still prefer human translations over AI-generated literary texts

A new study published on arXiv reveals that while AI-generated translations of literary texts are considered "fine" by readers, human translations are still preferred for their immersive quality and clarity. The researc…
RESEARCH · CL_109518 · Jun 24 · 15:09

HIPE-2026 evaluates person-place relation extraction from historical texts · 3 sources tracked

The HIPE-2026 evaluation campaign focused on extracting person-place relationships from multilingual historical texts, building upon previous HIPE editions that concentrated on named entity recognition. This year's chal…
RESEARCH · CL_109547 · Jun 24 · 07:00

New red-teaming framework exposes LLM faithfulness vulnerabilities

Researchers have developed a novel red teaming framework to systematically uncover vulnerabilities in large language models (LLMs). This framework utilizes a multi-role architecture with target, attacker, and jury model…
RESEARCH · CL_109568 · Jun 24 · 06:42

New neural architecture advances phoneme alignment beyond traditional methods

Researchers have developed a novel, fully differentiable neural architecture for phoneme alignment, aiming to advance the field beyond traditional HMM-GMM frameworks. This end-to-end system features an encoder for signa…
TOOL · CL_108067 · Jun 24 · 04:00

Study finds function vectors in LLMs are largely language-agnostic for translation

Researchers have investigated whether function vectors (FVs), which represent tasks extracted from model activations during in-context learning, are language-agnostic. Using machine translation as a case study across th…
RESEARCH · CL_109575 · Jun 24 · 03:57

New Japanese TTS system tackles kanji polyphony with massive data scaling

Researchers have developed Sarashina2.2-TTS, a novel text-to-speech system specifically designed for Japanese, addressing the challenge of kanji polyphony. The system utilizes a massive dataset of approximately 361,000 …
RESEARCH · CL_109576 · Jun 24 · 03:54

New AI models tackle low-resource Tangkhul-English translation

Researchers have developed two neural machine translation systems for the low-resource Tangkhul-English language pair. The primary system, utilizing ByT5-large fine-tuned on over 38,000 parallel sentences, achieved a BL…
TOOL · CL_107534 · Jun 24 · 03:17

AssemblyAI launches Medical Mode with native code-switching transcription

AssemblyAI has introduced a new Medical Mode for its transcription models, focusing on accurate handling of code-switching within clinical conversations. Unlike systems that require language toggles, AssemblyAI's Univer…
RESEARCH · CL_107116 · Jun 23 · 20:50

Data scale, not latency, dictates cross-lingual speech recognition transfer

A new study indicates that the scale of training data, rather than latency, is the primary factor influencing the effectiveness of cross-lingual transfer in streaming speech recognition models. Researchers found that wh…
RESEARCH · CL_107785 · Jun 23 · 17:10

New Marathi POS tagging dataset and BERT models released

Researchers have introduced L3Cube-MahaPOS, a new dataset for Marathi Part-of-Speech (POS) tagging, addressing the scarcity of annotated resources for the language. The dataset contains over 32,000 manually annotated se…
RESEARCH · CL_107768 · Jun 23 · 11:47

African languages face significant tokenization penalty in frontier LLMs

A new research paper reveals a significant "African Language Tax" in frontier large language models, where tokenizers assign substantially more subword tokens to African languages compared to English. This results in hi…
TOOL · CL_105129 · Jun 22 · 14:08

New benchmark measures LLM over-alignment in criminal law

A new benchmark, TF-RefusalBench, has been developed to measure and mitigate over-alignment in large language models (LLMs) used within multilingual criminal law contexts. The benchmark, comprising 5,200 prompts across …
RESEARCH · CL_105005 · Jun 22 · 09:10

LLMs rely on third-party sites like Wikipedia for brand info, study finds · 4 sources tracked

A new study reveals that large language models (LLMs) primarily rely on third-party sources, such as Wikipedia and YouTube, to generate information about brands. Research indicates that Wikipedia is the most cited domai…
TOOL · CL_105165 · Jun 22 · 08:17

Study compares DeepL, eTranslation, Systran MT systems for specialized French translation

A new study evaluates the performance of three machine translation (MT) systems (DeepL, eTranslation, and Systran) in translating specialized English content into French. The research also compares the post-editing effort…
MEME · CL_102473 · Jun 21 · 01:49

Reddit discusses training LLMs to think in optimized AI languages

A Reddit discussion explores the idea of training large language models (LLMs) to think in an optimized, non-human language instead of English. The user posits that such an approach could allow AIs to …
TOOL · CL_104724 · Jun 20 · 23:23

LLMs struggle with Hausa and Fongbe translation, metrics unreliable

A new study evaluated the machine translation capabilities of four large language models (LLMs) for Hausa and Fongbe, two West African languages. The research found that while Hausa achieved acceptable translation quali…