MemoVAD: Resource-Efficient Video Anomaly Detection via Dynamic Semantic Memory in Edge Computing Scenarios

arXiv:2606.07669v1 (announce type: cross). Abstract: Deploying Video Anomaly Detection (VAD) in real-world surveillance faces a fundamental tension between the demand for high-level semantics to ensure effectiveness and the limited computational resources of edge devices. Vision-Language Models (VLMs) provide rich open-vocabulary semantics, but their latency and computational cost preclude on-device deployment. To address the challenge, we propose MemoVAD, an edge-cloud collaborative framework that selectively incorporates VLM semantics into streaming VAD. MemoVAD runs most inference on the edge
The growing volume of surveillance footage and the maturation of AI models for video analysis are converging, creating demand for efficient deployment at the edge.
This development addresses a critical tension in real-world AI deployment: leveraging powerful VLM semantics while respecting the computational constraints of edge devices in surveillance applications.
The framework aims to make sophisticated video anomaly detection deployable more widely and efficiently at the edge by offloading VLM processing to the cloud only when needed.
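The selective offloading idea can be illustrated with a minimal sketch. This is not MemoVAD's actual algorithm, which the truncated abstract does not specify. It assumes a hypothetical edge-side cache of (feature, score) pairs and a hypothetical `cloud_vlm_score` callable standing in for a remote VLM; the class name, threshold, and capacity are all illustrative choices.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EdgeMemoryGate:
    """Hypothetical edge-side gate: reuse cached scores, call the cloud on novelty."""

    def __init__(
        self,
        cloud_vlm_score: Callable[[Vector], float],  # assumed stand-in for a cloud VLM
        sim_threshold: float = 0.9,                  # assumed similarity cutoff
        capacity: int = 128,                         # assumed memory size
    ) -> None:
        self.cloud_vlm_score = cloud_vlm_score
        self.sim_threshold = sim_threshold
        self.capacity = capacity
        self.memory: List[Tuple[List[float], float]] = []
        self.cloud_calls = 0

    def score(self, feature: Vector) -> float:
        # Find the most similar cached feature on the edge.
        best_sim, best_score = -1.0, 0.0
        for key, cached in self.memory:
            sim = cosine(feature, key)
            if sim > best_sim:
                best_sim, best_score = sim, cached
        if best_sim >= self.sim_threshold:
            return best_score  # cheap path: no cloud round trip

        # Novel content: pay for one cloud query and remember the answer.
        self.cloud_calls += 1
        result = self.cloud_vlm_score(feature)
        self.memory.append((list(feature), result))
        if len(self.memory) > self.capacity:
            self.memory.pop(0)  # evict the oldest entry (FIFO)
        return result


def fake_cloud_vlm(feature: Vector) -> float:
    """Toy stand-in: flags features whose first component is negative."""
    return 1.0 if feature[0] < 0 else 0.0


gate = EdgeMemoryGate(fake_cloud_vlm, sim_threshold=0.95)
stream = [[1.0, 0.0], [0.99, 0.05], [-1.0, 0.2], [-0.98, 0.21]]
scores = [gate.score(f) for f in stream]
print(scores, gate.cloud_calls)  # [0.0, 0.0, 1.0, 1.0] 2
```

In this toy stream, four frames trigger only two cloud queries because the second and fourth frames closely match cached entries. A real system would also need a policy for stale or incorrect cached scores, which this sketch omits.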
- Surveillance technology providers
- Smart city infrastructure developers
- Cloud computing services
- Edge AI hardware manufacturers
- Companies reliant solely on brute-force, centralized VAD
- Less efficient edge inference solutions
Wider adoption and improved effectiveness of video anomaly detection systems in various sectors.
Increased demand for robust edge-cloud hybrid architectures and specialized hardware for distributed AI processing.
This could enable new applications in real-time safety, security, and operational efficiency where on-device processing was previously a bottleneck.
Read at arXiv cs.AI