
arXiv:2503.08600v3 (Announce Type: replace)
Abstract: We introduce NSF-SciFy, a comprehensive dataset of scientific claims and investigation proposals extracted from National Science Foundation award abstracts. While previous scientific claim verification datasets have been limited in size and scope, NSF-SciFy represents a significant advance with 2.8 million claims from 400,000 abstracts spanning all science and mathematics disciplines. We present two focused subsets: NSF-SciFy-MatSci with 114,000 claims from materials science awards, and NSF-SciFy-20K with 135,000 claims across five NSF directorates…
The proliferation of AI models necessitates larger and more diverse datasets for training and verification, pushing researchers to create comprehensive resources like NSF-SciFy.
This dataset significantly advances scientific claim verification, enabling more robust AI applications in research and development, and providing a scalable resource for knowledge extraction.
The ability to automatically extract and verify scientific claims from a vast database of research proposals provides a new foundation for scientific knowledge management and discovery.
- AI researchers
- Science funding bodies
- Data scientists
- Scientific research institutions
- Manual data annotation services
- Less data-driven research methodologies
Researchers gain access to an unparalleled dataset for developing and testing AI models for scientific claim verification and knowledge discovery.
The improved ability to verify scientific claims could accelerate research progress in various fields by identifying promising avenues and debunking unreliable assertions.
Automated scientific claim verification could eventually lead to AI systems that can propose and evaluate hypotheses, fundamentally changing the scientific process.
Read at arXiv cs.CL