Do LLMs Reliably Identify Correct Information Units in Aphasic Discourse?
A new research paper explores the use of instruction-tuned large language models (LLMs) for classifying Correct Information Units (CIUs) in aphasic discourse. The study found that zero-shot prompting was insufficient, but few-shot prompting significantly improved performance for Llama 3.1:8b, qwen2.5:7b, and mistral:7b, bringing their results close to those of human annotators. However, the LLMs showed high recall but lower precision, indicating a tendency to over-classify tokens as CIUs, and performance varied with aphasia severity.
AI IMPACT: LLM prompting shows potential for automated CIU identification in aphasia assessment as part of a human-in-the-loop workflow.
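To make the "high recall, lower precision" pattern concrete, the following is a minimal sketch of how token-level precision and recall could be computed against a human annotation. The function name `precision_recall` and the label sequences are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
def precision_recall(gold, pred):
    """Return (precision, recall) for binary token labels (1 = CIU, 0 = not a CIU)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same tokens")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)        # CIUs the model found
    fp = sum(1 for g, p in pairs if not g and p)    # non-CIUs wrongly marked as CIUs
    fn = sum(1 for g, p in pairs if g and not p)    # CIUs the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: the model marks almost every token as a CIU.
gold = [1, 0, 1, 0, 0, 1, 0, 1]  # human annotation (made up)
pred = [1, 1, 1, 0, 1, 1, 1, 1]  # model output (made up)

precision, recall = precision_recall(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case the model misses no true CIUs, so recall is perfect, but three of its seven positive labels are wrong, which pulls precision down. That is the over-classification pattern the summary describes, and it is one reason a human reviewer would still need to check the model's positive labels.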