Terastal: Layer-Variant-based Scheduling for Real-Time Multi-DNN Workloads on Heterogeneous Accelerators

arXiv:2606.06818v1 Announce Type: cross Abstract: Heterogeneous DNN accelerators improve soft real-time multi-DNN execution by mapping each layer to its preferred accelerator to reduce latency. However, under skewed workloads, large layer-latency differences across accelerators limit scheduling flexibility and increase deadline misses. To address this challenge, we introduce layer variants, customized layer implementations that reduce latency gaps on non-preferred accelerators. We then present Terastal, a soft real-time framework for layer-variant design and scheduling on heterogeneous DNN acc
The growing complexity and heterogeneity of AI accelerators, combined with the demand for real-time multi-DNN execution, necessitate more flexible scheduling to improve performance and efficiency.
This research targets a specific efficiency bottleneck: under skewed workloads, large layer-latency differences across accelerators limit scheduling flexibility and increase deadline misses. Using diverse hardware more effectively is crucial for scaling AI applications and reducing operational costs.
The introduction of 'layer variants' and the 'Terastal' framework changes how multi-DNN workloads are managed on heterogeneous hardware, potentially leading to significant improvements in latency and resource utilization.
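The intuition behind layer variants can be illustrated with a toy sketch. It is not Terastal's algorithm: the greedy earliest-finish rule, the function name `earliest_finish_schedule`, and all latency numbers are assumptions made for illustration, and the layers are treated as independent with no deadlines or inter-layer dependencies. It only shows why narrowing the latency gap on a non-preferred accelerator gives a scheduler more room under a skewed workload.

```python
def earliest_finish_schedule(layers, num_accels):
    """Greedily place each layer on the accelerator where it would finish first.

    `layers` is a list of per-accelerator latency tuples (hypothetical numbers).
    Returns the makespan: the time at which the last busy accelerator goes idle.
    """
    free_at = [0.0] * num_accels  # time at which each accelerator becomes free
    for latencies in layers:
        # Finish time on accelerator a = when it frees up + the layer's latency there.
        best = min(range(num_accels), key=lambda a: free_at[a] + latencies[a])
        free_at[best] += latencies[best]
    return max(free_at)


# Skewed workload: four independent layers that all prefer accelerator 0.
baseline = [(2.0, 10.0)] * 4       # large latency gap on the non-preferred accelerator
with_variants = [(2.0, 3.0)] * 4   # assumed variant narrows the gap to 3.0

baseline_makespan = earliest_finish_schedule(baseline, 2)
variant_makespan = earliest_finish_schedule(with_variants, 2)
print(baseline_makespan, variant_makespan)
```

With the large gap, every layer queues on accelerator 0 and the second accelerator sits idle. With the assumed variant latencies, the same greedy rule offloads one layer to accelerator 1 and finishes sooner.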
- AI hardware manufacturers
- Cloud AI providers
- Real-time AI application developers
- Semiconductor industry
- Inefficient legacy AI scheduling systems
Improved performance and reduced latency for complex AI workloads on existing and next-generation heterogeneous accelerators.
Accelerated development and deployment of sophisticated AI services that rely on real-time multi-DNN execution, such as advanced robotics or autonomous systems.
Enhanced competition among AI infrastructure providers based on efficiency and performance metrics, potentially lowering the cost of AI compute.
Read at arXiv cs.LG