Training Azerbaijani language models on Amazon SageMaker AI
Azercell Telecom, in collaboration with the AWS Generative AI Innovation Center, has developed a framework for training Azerbaijani large language models on Amazon SageMaker AI. The work targeted the challenges of a morphologically rich language with limited training data. Through kernel optimizations, the team achieved a 23% increase in training throughput and a 58% reduction in GPU memory usage. The project also introduced a custom tokenizer that doubles the amount of Azerbaijani text that fits within the model's context window.
IMPACT: Establishes a replicable framework for training LLMs on morphologically complex, low-resource languages, potentially accelerating AI development in underserved linguistic communities.
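
To make the tokenizer claim concrete, the sketch below works through the arithmetic linking tokens per word (often called fertility) to the amount of text a fixed context window can hold. The context size and both fertility values are hypothetical placeholders chosen for illustration; none of them are reported figures from the project. The point is only that halving the average tokens per word doubles the number of words that fit.

```python
def words_in_context(context_tokens: int, tokens_per_word: float) -> int:
    """Approximate how many words fit in a context window of the given size."""
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    return int(context_tokens / tokens_per_word)


# Hypothetical values for illustration only (not from the source).
CONTEXT_TOKENS = 8192       # assumed context window size, in tokens
BASELINE_FERTILITY = 4.0    # assumed tokens per word with a generic tokenizer
CUSTOM_FERTILITY = 2.0      # assumed tokens per word with a language-specific tokenizer

baseline_words = words_in_context(CONTEXT_TOKENS, BASELINE_FERTILITY)
custom_words = words_in_context(CONTEXT_TOKENS, CUSTOM_FERTILITY)
gain = custom_words / baseline_words

print(f"baseline: {baseline_words} words, custom: {custom_words} words, gain: {gain:.1f}x")
```

With these assumed numbers, the same window holds twice as many words, which is the kind of improvement the summary describes. Measured fertility depends on the corpus and on how the tokenizer was trained.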