StarCoder2 and The Stack v2
Hugging Face has released StarCoder2, a new family of large language models for code generation, trained on a dataset called The Stack v2. The dataset covers more than 600 programming languages and includes a significant amount of permissively licensed code. The StarCoder2 models are available in three sizes (3 billion, 7 billion, and 15 billion parameters) and are intended to advance research and development in AI-powered coding tools.