A machine-learning-assisted progressive digit-randomness screening framework for detecting non-random patterns in raw numerical research data

arXiv:2606.07128v1. Abstract: Raw numerical datasets remain less systematically examined in integrity screening than images, plagiarism, or summary-statistic inconsistencies. We developed the Fabrication-risk Digit Randomness Screening model (FDRS), a statistical and machine-learning framework for detecting non-random digit-pattern irregularities in numerical research data. FDRS integrates single- and joint-decimal-digit tests, Cramér's V, entropy metrics, Kullback-Leibler divergence, digit-preference indices, progressive subsampling, and semi-supervised risk scoring. It was
The increasing volume and complexity of scientific data, particularly in AI research, call for integrity checks that go beyond traditional methods.
By flagging possible fabrication and errors, this screening approach could strengthen the trustworthiness of numerical research data, which is critical for scientific progress and reproducible AI.
The systematic detection of non-random patterns in raw numerical data can lead to more reliable scientific publications and AI model training datasets.
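To make the digit-level statistics named in the abstract more concrete, here is a minimal Python sketch of a single-digit check on the last decimal digit: a chi-square statistic against a uniform distribution, Shannon entropy, and Kullback-Leibler divergence from uniform. This is not the FDRS implementation. The function names `terminal_digits` and `digit_stats` are hypothetical, the values are assumed to arrive as decimal strings, and a uniform last digit is an assumed baseline that does not hold for every kind of measurement.

```python
import math
from collections import Counter


def terminal_digits(values):
    """Return the last decimal digit of each value.

    Assumption: values are strings such as "12.37", so trailing
    zeros recorded in the raw data are preserved.
    """
    digits = []
    for v in values:
        frac = v.partition(".")[2]  # text after the decimal point
        if frac and frac.isdigit():
            digits.append(int(frac[-1]))
    return digits


def digit_stats(digits):
    """Compare the observed digit distribution with a uniform one."""
    n = len(digits)
    if n == 0:
        raise ValueError("no digits to analyse")
    counts = Counter(digits)
    probs = [counts.get(d, 0) / n for d in range(10)]
    expected = n / 10  # expected count per digit under uniformity

    # Pearson chi-square statistic (9 degrees of freedom).
    chi2 = sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))
    # Shannon entropy in bits; the maximum is log2(10) for uniform digits.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    # KL divergence from the uniform distribution, in bits.
    kl = sum(p * math.log2(p / 0.1) for p in probs if p > 0)
    return {"n": n, "chi2": chi2, "entropy_bits": entropy, "kl_bits": kl}


if __name__ == "__main__":
    sample = ["4.15", "3.25", "7.05", "2.35", "9.45", "1.50", "6.55", "8.15"]
    print(digit_stats(terminal_digits(sample)))
```

With 9 degrees of freedom, a chi-square statistic above roughly 16.92 would be significant at the 0.05 level, but a large value only signals digit preference (for example, rounding habits or instrument resolution) and is not by itself evidence of fabrication.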
- Scientific research community
- AI ethics and integrity platforms
- Data scientists and statisticians
- Researchers engaging in data fabrication
- Journals with weak data integrity protocols
Increased scrutiny and integrity in published numerical research data, making fabricated results harder to pass.
Higher-quality datasets lead to more robust and reliable AI models, reducing "garbage in, garbage out" problems.
Improved public trust in scientific research and AI-driven insights, potentially accelerating responsible innovation.
Read at arXiv cs.LG