OpenAI researchers explored the potential risks of open-weight large language models by introducing a method called malicious fine-tuning (MFT). The technique involved fine-tuning an open-weight model, gpt-oss, to excel in the biology and cybersecurity domains, with the aim of uncovering worst-case capabilities. The study found that while the MFT-trained gpt-oss showed marginal improvements in biological capabilities over other open-weight models, it did not significantly advance the frontier and underperformed closed-weight models on specific risk evaluations. These findings informed OpenAI's decision to release the model and are intended to guide risk assessments for future open releases.
Summary written by gemini-2.5-flash-lite from 1 source.
The item describes a research paper published by OpenAI detailing a study on the risks of open-weight LLMs, including a novel methodology for risk estimation.