TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning

arXiv:2601.21692v2. Abstract: Fine-Tuning-as-a-Service (FTaaS) facilitates the customization of Multimodal Large Language Models (MLLMs) but introduces critical backdoor risks via poisoned data. Existing defenses either rely on supervised signals or fail to generalize across diverse trigger types and modalities. In this work, we uncover a universal backdoor fingerprint, attention allocation divergence, where poisoned samples disrupt the balanced attention distribution across three functional components: system instructions, vision inputs, and user textual queries, regardles
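As a rough illustration of what "attention allocation" across the three components could look like in practice, the sketch below sums per-token attention weights over the system, vision, and query spans and scores each sample by how far its profile sits from the median profile of the batch. This is not the TCAP method, whose details are not given in the excerpt above; the function names, the token spans, the toy attention values, and the median-based L1 score are all assumptions made for illustration.

```python
from statistics import median


def component_shares(attn, spans):
    """Fraction of total attention mass falling in each component.

    attn:  list of non-negative attention weights, one per input token
           (hypothetical; e.g. averaged over heads and layers).
    spans: dict mapping component name -> (start, end) half-open token range.
    """
    total = sum(attn)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    return {name: sum(attn[s:e]) / total for name, (s, e) in spans.items()}


def divergence_scores(profiles):
    """L1 distance of each profile from the per-component median profile.

    A simple stand-in for an unsupervised outlier score (an assumption,
    not the scoring rule from the paper).
    """
    names = list(profiles[0])
    center = {n: median(p[n] for p in profiles) for n in names}
    return [sum(abs(p[n] - center[n]) for n in names) for p in profiles]


# Toy example: 8 tokens split into system / vision / query spans (assumed layout).
spans = {"system": (0, 2), "vision": (2, 6), "query": (6, 8)}
samples = [
    [1, 1, 1, 1, 1, 1, 1, 1],              # attention spread evenly
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0.1, 0.1, 5, 5, 5, 5, 0.1, 0.1],      # attention collapsed onto vision tokens
]
profiles = [component_shares(a, spans) for a in samples]
scores = divergence_scores(profiles)
suspect = max(range(len(scores)), key=scores.__getitem__)
print(suspect, [round(s, 3) for s in scores])
```

In this toy batch the last sample receives the largest score because nearly all of its attention mass falls on the vision span. A real pipeline would need attention weights extracted from the model and a principled threshold, neither of which is specified here.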
The spread of Fine-Tuning-as-a-Service (FTaaS) for Multimodal Large Language Models (MLLMs), combined with increasingly sophisticated adversarial attacks, calls for backdoor detection methods that are robust and generalize across trigger types.
This research provides a novel, unsupervised method for detecting backdoor risks in fine-tuned MLLMs, addressing a critical security vulnerability that could undermine trust and integrity in AI systems.
Being able to audit fine-tuned MLLMs for malicious backdoors independently, without relying on supervised signals, would give enterprises and cloud providers a practical security check they currently lack.
- AI platform providers
- Enterprises adopting MLLMs
- AI security researchers
- MLOps platforms
- Malicious actors
- Undetected poisoned models
Increased integrity and trustworthiness of MLLMs in various applications.
Reduced risk of supply chain attacks targeting AI models through fine-tuning.
Potential for new regulatory frameworks and industry standards around AI model provenance and security.
Read at arXiv cs.AI