Presentation: Realtime and Batch Processing of GPU Workloads

Updated 26 May 2026

Joseph Stein discusses engineering an enterprise AI-as-a-Service platform within a private cloud data center. He explains how to make fuller use of underutilized GPU pools via multi-namespace scheduling, use Valkey and Lua for atomic priority queuing and backpressure management, mitigate the OWASP Top 10 risks for LLM applications via central proxy gateways, and scale batch pipelines using a custom S3-to-Kafka proxy.

Source: InfoQ. Read the full presentation at the original publisher.

This is a curated wire item. The Continuum Brief does not republish full third-party articles; this entry links to the original source.

Tags: Case Study, GPU, Scalability, Cloud, QCon San Francisco 2025, Transcripts, DevOps, Presentation