New models boost AI understanding of Malaysian English

By PulseAugur Editorial · [1 source] · 2026-06-02 04:00

Researchers have developed two pre-trained language models, MENmBERT and MENBERT, designed to improve Named Entity Recognition (NER) and Relation Extraction (RE) for Malaysian English. Malaysian English is a low-resource creole that blends Standard English with Malay, Chinese, and Tamil elements, and its distinct grammar and frequent code-switching cause existing models to underperform. After fine-tuning on the manually annotated Malaysian English News Article (MEN) dataset, the new models showed significant improvements, most clearly in RE and for specific entity labels in NER. The results point to the value of language-specific pre-training in low-resource settings.

IMPACT Enhances NLP capabilities for low-resource creole languages, potentially improving information access and analysis for diverse linguistic communities.

RANK_REASON The cluster contains an academic paper detailing new models and datasets for a specific language variant.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam · 2026-06-02 04:00

Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

arXiv:2407.01374v2. Abstract: Malaysian English is a low resource creole language, where it carries the elements of Malay, Chinese, and Tamil languages, in addition to Standard English. Named Entity Recognition (NER) models underperform when capturing entiti…
