ImmigrationQA: A Source-Grounded Dataset and Small-Model Adaptation for U.S. Immigration Law
Researchers have developed ImmigrationQA, a new dataset of over 17,000 question-answer pairs on U.S. immigration law, sourced from official documents and community forums. Using this dataset, they fine-tuned the Llama 3.2 3B Instruct model with parameter-efficient LoRA, achieving a 27% improvement in mean score over the base model. The fine-tuned model shows gains in procedural areas but still struggles with complex legal reasoning. The project's artifacts are publicly released.
IMPACT: Provides a specialized dataset and fine-tuned model to improve AI's understanding of complex legal domains.
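For readers unfamiliar with LoRA, the sketch below illustrates why it counts as "parameter-efficient". The frozen weight matrix W is left unchanged, and only two small low-rank matrices, A and B, are trained. The layer output becomes W x + (alpha / r) · B (A x). This is a generic illustration of the technique, not the ImmigrationQA authors' code. The function names, the rank of 16, and the 3072×3072 matrix size are hypothetical choices for illustration and are not taken from the project.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one d_out x d_in weight matrix."""
    full = d_out * d_in
    # LoRA trains B (d_out x rank) and A (rank x d_in); the original W stays frozen.
    lora = rank * (d_out + d_in)
    return full, lora


def lora_forward(W, A, B, x, alpha: float, rank: int):
    """Compute y = W x + (alpha / rank) * B (A x) using plain Python lists."""

    def matvec(M, v):
        # Multiply a matrix (list of rows) by a vector.
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)                  # frozen pretrained path
    delta = matvec(B, matvec(A, x))      # trainable low-rank update
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical example: a 3072 x 3072 projection adapted with rank 16.
full, lora = lora_param_counts(3072, 3072, 16)
print(full, lora)  # 9437184 98304
```

In this hypothetical case, the adapter trains about 1% of the parameters that full fine-tuning of that matrix would update. Because LoRA typically initializes B to zeros, the adapted layer starts out identical to the base model.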