We Asked 10 LLMs to Write Efficient Code. Only 4 Got Better.
A recent evaluation of ten large language models found that only GPT-5.4 consistently improved its code efficiency when explicitly prompted to do so. Most models showed little change, or even a decline, under efficiency-focused prompts. GPT-5.4, by contrast, made significant gains on tasks such as configuration generation and HTML creation. Gemma 4 31B emerged as a cost-effective alternative that produced efficient code even without prompting. Cohere Command A became less efficient when prompted.
AI IMPACT: The results confirm that explicitly prompting for efficiency does not universally improve LLM-generated code. They highlight model-specific behaviors and possible misalignments in how these models were trained.