Mandarin word duration and pitch predicted by contextualized embeddings

By PulseAugur Editorial · [1 source] · 2026-07-03 04:00

Researchers have developed a method that uses contextualized embeddings (CEs) to predict the spoken duration and pitch of Mandarin monosyllabic words. The study reports that CEs predict word duration accurately, even at the level of individual tokens, outperforming both chance and permutation baselines. The predicted durations were precise enough to reconstruct pitch contours, which approximated the empirical contours and also surpassed a permutation baseline.
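To make the idea of a permutation baseline concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the summary does not say which model maps embeddings to duration, so the ridge regression, the log-duration target, the synthetic data, and every function name below are assumptions chosen for illustration only.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept column is left unpenalized."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict(X, w):
    """Apply fitted weights (last entry is the intercept)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def heldout_correlation(X_train, y_train, X_test, y_test):
    """Correlation between predicted and observed values on held-out tokens."""
    w = fit_ridge(X_train, y_train)
    return float(np.corrcoef(predict(X_test, w), y_test)[0, 1])

def permutation_baseline(X_train, y_train, X_test, y_test, n_perm=200, seed=0):
    """Refit after shuffling training targets, breaking the embedding-duration link."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_perm):
        y_shuffled = rng.permutation(y_train)
        scores.append(heldout_correlation(X_train, y_shuffled, X_test, y_test))
    return np.array(scores)

# Synthetic stand-ins (assumption): random vectors play the role of CEs,
# and a noisy linear function of them plays the role of log word duration.
rng = np.random.default_rng(42)
n_tokens, dim = 600, 16
embeddings = rng.normal(size=(n_tokens, dim))
true_weights = rng.normal(size=dim)
log_duration = 0.1 * (embeddings @ true_weights) + rng.normal(scale=0.2, size=n_tokens)

# Hold out the last 120 tokens for evaluation.
split = 480
X_tr, X_te = embeddings[:split], embeddings[split:]
y_tr, y_te = log_duration[:split], log_duration[split:]

observed = heldout_correlation(X_tr, y_tr, X_te, y_te)
null_scores = permutation_baseline(X_tr, y_tr, X_te, y_te)

# Share of permuted fits that do at least as well as the real fit.
p_value = (np.sum(null_scores >= observed) + 1) / (len(null_scores) + 1)
print(f"observed r = {observed:.3f}, permutation p = {p_value:.4f}")
```

A model that beats the permutation baseline is capturing a real association between embeddings and duration rather than an artifact of the fitting procedure. Token-level evaluation, as in the held-out split here, is a stricter check than comparing averages per word type.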

IMPACT This research could improve speech synthesis and recognition systems by enabling more accurate prediction of prosodic features.

RANK_REASON Academic paper detailing a new method for predicting linguistic features using embeddings.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Xiaoyun Jin, Mirjam Ernestus, R. Harald Baayen · 2026-07-03 04:00

Using embeddings to predict spoken word duration and pitch in Mandarin monosyllabic words

arXiv:2607.02002v1. Abstract: Time-normalized f0 contours of Mandarin words in conversational speech have been shown to be predictable in part from their contextualized embeddings (CEs). The present study investigates whether CEs also predict spoken word duratio…

