Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study
A new study published on arXiv benchmarks seven foundation models on Ukrainian legal text and finds significant variation in both tokenizer fertility and zero-shot performance. According to the research, models such as Qwen 3 use tokens less efficiently than Llama-family models, and NVIDIA's Nemotron Super 3 outperforms Mistral Large at a lower cost despite having fewer parameters. The study also reports that few-shot prompting can degrade performance in Ukrainian, and that models handle legal language from the full-scale invasion era worse than pre-war texts.
IMPACT: Highlights the need for domain-specific evaluation and tokenizer efficiency for cost-effective LLM deployment in specialized legal contexts.
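Tokenizer fertility is commonly measured as the average number of tokens a tokenizer produces per word. The sketch below illustrates that ratio only; it is not the study's code. The word-splitting regex, the `fertility` helper, and the toy `char_chunk_tokenizer` are all assumptions made for illustration, and in practice the `tokenize` callable would wrap a real model tokenizer.

```python
import re
from typing import Callable, Iterable, List

# Assumption: a "word" is a maximal run of Unicode word characters.
# Real studies may split words differently (e.g. on whitespace only).
WORD_RE = re.compile(r"\w+", re.UNICODE)


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Return average tokens per word over a corpus (hypothetical helper)."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(WORD_RE.findall(text))
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def char_chunk_tokenizer(text: str, size: int = 3) -> List[str]:
    """Toy stand-in for a subword tokenizer: cut each word into
    fixed-size character chunks. Not any real model's tokenizer."""
    tokens: List[str] = []
    for word in WORD_RE.findall(text):
        tokens.extend(word[i:i + size] for i in range(0, len(word), size))
    return tokens


if __name__ == "__main__":
    sample = ["Суд ухвалив рішення"]  # "The court issued a decision"
    print(f"fertility: {fertility(sample, char_chunk_tokenizer):.2f}")
```

Under per-token pricing, a tokenizer with higher fertility on Ukrainian text spends more tokens on the same document, which is why fertility matters for deployment cost as well as for context-window usage.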