SciText2Eq: Assessing LLMs for Explainable Equation Generation for Scientific Creativity
Researchers have developed SciText2Eq, a framework and dataset for evaluating how well large language models (LLMs) generate mathematical equations from scientific text. The study found that LLMs perform moderately on lexical and syntactic similarity but struggle with semantic accuracy in equation generation. LLM-based evaluations of equation quality also showed limited alignment with human judgments, which points to the difficulty of using AI to assess scientific creativity.
AI IMPACT: Highlights the limits of LLMs' semantic understanding in scientific tasks and suggests a need for improved evaluation methods.