ImmigrationQA: A Source-Grounded Dataset and Small-Model Adaptation for U.S. Immigration Law

arXiv:2605.30589v1 Announce Type: cross

Abstract: U.S. immigration law spans thousands of pages of official policy, federal regulations, and procedural guidance that change frequently and carry high stakes for petitioners who lack legal representation. We describe the construction of ImmigrationQA, a source-grounded question-answering dataset of 17,058 pairs across 13 immigration subdomains, and the fine-tuning of a Llama 3.2 3B Instruct model on that dataset using parameter-efficient LoRA. The corpus was assembled from 11 primary and secondary sources -- including the USCIS Policy Manual, 8 C […]
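The abstract names parameter-efficient LoRA as the adaptation method. As a rough illustration of why LoRA keeps fine-tuning a 3B-parameter model cheap, the sketch below applies the standard LoRA update, W_eff = W + (alpha / r) · B · A, to a single frozen weight matrix and counts trainable parameters. This is the generic LoRA formulation, not the authors' training code. The matrix sizes, rank, and alpha are hypothetical values chosen for illustration; the paper's actual hyperparameters are not stated here.

```python
# Minimal LoRA sketch using the standard formulation: W_eff = W + (alpha / r) * B @ A.
# All dimensions, ranks, and alpha values below are hypothetical illustration values,
# not the hyperparameters used for ImmigrationQA.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def lora_merged_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, the effective weight after applying LoRA."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_counts(d_out, d_in, r):
    """Trainable parameters for full fine-tuning vs. LoRA on one d_out x d_in matrix."""
    full = d_out * d_in
    lora = r * (d_out + d_in)  # B is d_out x r, A is r x d_in
    return full, lora

if __name__ == "__main__":
    # Toy 2x3 frozen weight W with a rank-1 adapter (A: 1x3, B: 2x1).
    W = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
    A = [[1.0, 2.0, 3.0]]
    B = [[0.5], [0.0]]
    print(lora_merged_weight(W, A, B, alpha=2, r=1))  # [[2.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
    # Hypothetical 3072 x 3072 projection matrix with rank 16.
    print(lora_param_counts(3072, 3072, 16))  # (9437184, 98304)
```

With rank 16, the adapter for that one hypothetical matrix trains 98,304 parameters instead of 9,437,184 (roughly 1%). Only A and B receive gradients while W stays frozen, which is why LoRA is commonly used to adapt models of this size on modest hardware.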
The proliferation of open-source large language models such as Llama, combined with the pressing need to make complex legal information accessible, is driving efforts to build specialized AI applications.
This work shows how AI could bring clarity and access to critical information in high-stakes domains, potentially broadening access to legal guidance.
Adapting small models to a narrow, high-stakes domain such as immigration law could make AI-powered legal assistance more widely available and affordable.
Likely beneficiaries:
- Immigrants lacking legal representation
- Legal aid organizations
- Open-source AI developers
- Companies specializing in legal tech
Potentially disadvantaged:
- Traditional legal services with high fees
- Information gatekeepers in legal domains
Individuals could find U.S. immigration law guidance easier to access.
Better access to information could reduce the burden on legal aid services and potentially lower the number of immigration-related legal errors.
The approach could be adapted to other complex legal systems, supporting broader use of AI for legal literacy and access to justice.
Read at arXiv cs.AI