Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture

arXiv:2511.06701v3. Abstract: AI-Scientist systems risk manufacturing spurious discoveries through uncontrolled multiple testing. We present a functional architecture that enforces statistical rigor at two levels: a Haskell embedded domain-specific language (the Research monad) that makes it impossible to test a hypothesis without updating the error budget, and a declarative scaffold, backed by an OS-level sandbox, that makes validation data physically absent from the environment in which LLM-generated code runs. We ground the design in a machine-checked Lean 4 form
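The first mechanism in the abstract, a monad in which testing a hypothesis necessarily updates an error budget, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `Budget`, `Verdict`, `testHypothesis`, and `runResearch` are hypothetical, and the simple additive alpha-spending rule (a union bound on the per-test levels) is a stand-in, since the excerpt does not say which error-control procedure the paper uses. Only the name "Research monad" comes from the source.

```haskell
-- Hypothetical sketch of a budget-tracking monad; not the paper's actual API.
-- In a real library the Research constructor would be hidden behind an export
-- list, so testHypothesis would be the only way to obtain a Verdict.

-- Type-I error budget that has not been spent yet.
newtype Budget = Budget Double

-- Refused means the test was not performed because the budget could not cover it.
data Verdict = Rejected | NotRejected | Refused
  deriving (Eq, Show)

-- A hand-written state monad over the remaining budget (base only, no mtl).
newtype Research a = Research (Budget -> (a, Budget))

instance Functor Research where
  fmap f (Research g) = Research $ \b ->
    let (a, b') = g b in (f a, b')

instance Applicative Research where
  pure a = Research $ \b -> (a, b)
  Research f <*> Research g = Research $ \b ->
    let (h, b')  = f b
        (a, b'') = g b'
    in (h a, b'')

instance Monad Research where
  Research g >>= k = Research $ \b ->
    let (a, b')    = g b
        Research h = k a
    in h b'

-- Test one hypothesis at level alpha given its p-value.
-- Assumed rule: each performed test spends alpha from the budget, so the
-- levels of all performed tests sum to at most the initial total.
testHypothesis :: Double -> Double -> Research Verdict
testHypothesis alpha pValue = Research $ \(Budget remaining) ->
  if alpha <= 0 || alpha > remaining
    then (Refused, Budget remaining)
    else ( if pValue <= alpha then Rejected else NotRejected
         , Budget (remaining - alpha) )

-- Run a research program with a total budget; returns the result and what is left.
runResearch :: Double -> Research a -> (a, Double)
runResearch total (Research g) =
  let (a, Budget left) = g (Budget total) in (a, left)
```

Under these assumptions, evaluating `fst (runResearch 0.05 (mapM (testHypothesis 0.02) [0.01, 0.5, 0.001]))` in GHCi yields `[Rejected,NotRejected,Refused]`: the third test is refused because the first two have already spent 0.04 of the 0.05 budget. The design point is that the accounting lives in the type rather than in convention, so code written against this interface cannot run a test and forget to record it.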
Read at arXiv cs.AI