daVinci-kernel: Co-Evolving Skill Selection, Summarization, and Utilization via RL for GPU Kernel Optimization

arXiv:2606.16497v1 (cross-listed). Abstract: GPU kernel optimization represents a paradigm where functional correctness is assumed and execution efficiency is the objective. We present daVinci-kernel, a reinforcement learning framework that couples skill discovery with skill exploitation through a dynamically evolving skill library. daVinci-kernel jointly trains three agents sharing one LLM backbone: a Skill Selection Agent that retrieves relevant techniques via BM25 and LLM reranking, a Policy Agent that generates multi-turn CUDA/Triton kernels conditioned on selected skills, and a Skill
The growing complexity and energy demands of AI models are driving intense research into more efficient hardware utilization, and GPU kernel optimization is a critical bottleneck that advanced AI techniques may help address.
This research introduces an agentic AI approach that automatically optimizes GPU kernels, moving beyond manual or heuristic-based methods, which could significantly improve the efficiency and performance of AI workloads.
GPU optimization currently often requires specialized human knowledge; this framework demonstrates that an AI system can autonomously generate and optimize code, potentially democratizing access to high-performance computing.
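To make the abstract's retrieval step concrete, here is a minimal sketch of BM25 ranking over a small skill library. The skill names, their descriptions, the `select_skills` helper, and the parameter values `k1=1.5` and `b=0.75` are all assumptions for illustration; none of them come from the paper. The LLM reranking stage that the abstract mentions is not modeled here.

```python
import math
from collections import Counter

# Hypothetical skill library: names and descriptions are illustrative only,
# not taken from daVinci-kernel.
SKILLS = {
    "shared_memory_tiling": "tile matrix multiply through shared memory to cut global memory traffic",
    "memory_coalescing": "arrange global memory loads so threads in a warp read contiguous addresses",
    "warp_shuffle_reduction": "reduce partial sums inside a warp with shuffle instructions",
    "kernel_fusion": "fuse elementwise operations into one kernel to avoid extra memory round trips",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumed, deliberately simple choice)."""
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (name, score) pairs sorted by descending Okapi BM25 score."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs

    # Document frequency: number of skills whose description contains each term.
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_skills(query, docs, top_k=2):
    """Keep the top_k skills that share at least one term with the query."""
    ranked = bm25_rank(query, docs)
    return [name for name, score in ranked[:top_k] if score > 0]
```

For example, `select_skills("reduce sums inside a warp", SKILLS)` ranks the warp-level reduction skill first, because it matches the most query terms. In a pipeline like the one the abstract describes, the shortlist would then presumably be passed to an LLM for reranking before conditioning kernel generation.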
- AI developers
- GPU manufacturers (indirectly through demand)
- Cloud computing providers
- Academic researchers
- Manual GPU optimization consultants
- Less efficient AI hardware architectures
Increased performance and reduced energy consumption for AI training and inference on GPUs.
Accelerated development of even larger and more complex AI models due to available computational efficiencies.
A potential shift in how computational hardware is designed and interacted with, leaning more towards AI-driven optimization loops.
Read at arXiv cs.CL