NTNU system integrates W2V and Phi-4 for spoken language assessment

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

Researchers from NTNU have developed a spoken language assessment (SLA) system that integrates the wav2vec 2.0 (W2V) model with the Phi-4 multimodal large language model (MLLM). The approach targets two limitations of existing methods: BERT-based systems miss prosodic cues, and W2V-based systems lack semantic interpretability. The combined system achieved a root mean square error (RMSE) of 0.375 on the Speak & Improve Challenge 2025 test set, securing second place.
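
For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between predicted and reference scores, so lower is better. The following is a minimal sketch of that calculation. The function name and the score values are hypothetical and chosen only for illustration; they are not challenge data and do not reproduce the NTNU system.

```python
import math


def rmse(predicted, reference):
    """Return the root mean square error between two equal-length score lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("score lists must be non-empty and of equal length")
    # Squared difference for each (prediction, reference) pair.
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical proficiency scores, for illustration only.
predicted_scores = [3.0, 4.5, 2.5, 5.0]
reference_scores = [3.5, 4.0, 2.5, 4.5]

example_rmse = rmse(predicted_scores, reference_scores)
print(f"RMSE: {example_rmse:.3f}")
```

Because the errors are squared before averaging, a few large misjudgments raise RMSE more than many small ones.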

IMPACT This research demonstrates a novel approach to integrating acoustic and semantic models for language assessment, potentially improving automated evaluation systems.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Hong-Yun Lin, Tien-Hong Lo, Yu-Hsuan Fang, Jhen-Ke Lin, Chung-Chun Wang, Hao-Chien Lu, Berlin Chen · 2026-06-30 04:00

The NTNU System at the S&I Challenge 2025 SLA Open Track

arXiv:2506.05121v3 · Abstract: A recent line of research on spoken language assessment (SLA) employs neural models such as BERT and wav2vec 2.0 (W2V) to evaluate speaking proficiency across linguistic and acoustic modalities. Although both models effectively …
