
arXiv:2605.24079v1 (Announce Type: cross)
Abstract: Data contamination is a known threat to the reliability of model evaluation. However, it remains underexplored in code large language models (LLMs), where contamination often goes beyond exact duplication. We present TRACER, a semantic-aware framework for fine-grained code contamination detection. TRACER models contamination using three levels of semantic overlap - Functionally Identical, Nearly Identical, and Shared Logic - and detects them through a coarse-to-fine pipeline. We also introduce the first benchmark for fine-grained code contamina
As Code LLMs proliferate, robust evaluation becomes urgent: when benchmark problems leak into training data, reported scores overstate real-world reliability.
Reliable evaluation matters because Code LLMs are becoming integral to software development, where the quality of evaluation directly affects the security, performance, and trustworthiness of AI-generated code.
A fine-grained, semantic-aware contamination detection framework could change how Code LLMs are evaluated and improved by moving beyond checks for exact duplication.
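As a rough illustration of what a coarse-to-fine check over the three overlap levels named in the abstract might look like, the Python sketch below first applies a cheap lexical filter and then a finer structural comparison. It is not TRACER's method: the function names, the token and AST similarity measures, and every threshold are assumptions chosen for illustration, and syntactic similarity is only a crude stand-in for the semantic overlap the paper targets.

```python
import ast
import difflib
import io
import tokenize

# Token types that carry no program content (comments and layout).
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def lexical_tokens(src: str) -> set:
    """Set of token strings in `src`, ignoring comments and layout."""
    reader = io.StringIO(src).readline
    return {tok.string for tok in tokenize.generate_tokens(reader)
            if tok.type not in _SKIP}


def coarse_score(a: str, b: str) -> float:
    """Coarse stage: Jaccard similarity of the two token sets."""
    ta, tb = lexical_tokens(a), lexical_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def node_types(src: str) -> list:
    """AST node type names, which ignore identifier names and constants."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


def fine_score(a: str, b: str) -> float:
    """Fine stage: similarity of the two AST node-type sequences."""
    return difflib.SequenceMatcher(None, node_types(a), node_types(b)).ratio()


def classify(candidate: str, benchmark: str, coarse_min: float = 0.3,
             near_min: float = 0.9, shared_min: float = 0.6) -> str:
    """Map a pair of snippets to one of the abstract's three labels.

    All thresholds are hypothetical; the abstract does not define them.
    """
    # Cheap filter first, so the costlier comparison runs on few pairs.
    if coarse_score(candidate, benchmark) < coarse_min:
        return "No Overlap Detected"
    # Same AST (only comments or formatting differ) implies same behavior.
    if ast.dump(ast.parse(candidate)) == ast.dump(ast.parse(benchmark)):
        return "Functionally Identical"
    score = fine_score(candidate, benchmark)
    if score >= near_min:
        return "Nearly Identical"
    if score >= shared_min:
        return "Shared Logic"
    return "No Overlap Detected"


benchmark = "def add(a, b):\n    return a + b\n"
same = "def add(a, b):  # sum two values\n    return a + b\n"
renamed = "def plus(x, y):\n    return x + y\n"
unrelated = "print('hello')\n"

print(classify(same, benchmark))       # Functionally Identical
print(classify(renamed, benchmark))    # Nearly Identical
print(classify(unrelated, benchmark))  # No Overlap Detected
```

The two-stage shape is the point of the sketch: the token-set filter is cheap enough to run against a whole training corpus, while the AST comparison is reserved for the pairs that survive it. A real detector would need a genuinely semantic signal, such as behavioral testing or learned code embeddings, to separate the three levels reliably.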
- Code LLM developers
- AI safety researchers
- DevOps tooling providers
- Software engineering firms
- Undisclosed data providers
- Low-quality LLM providers
TRACER could enable more accurate benchmarking and development of Code LLMs by identifying hidden contamination so that it can be mitigated.
Greater confidence in Code LLM evaluations could accelerate adoption in critical software infrastructure and lead to broader automation of coding tasks.
The methodology could inspire similar fine-grained contamination detection frameworks for other domain-specific LLMs, improving the reliability of their evaluations as well.
Read at arXiv cs.CL