AWS Parallel Computing Service supports in-place Slurm major version upgrades
AWS Parallel Computing Service (PCS) now supports managed in-place Slurm version upgrades for existing clusters. You can move your clusters up to three Slurm major versions ahead with no disruption to running jobs. To upgrade, update your Cluster configuration with your target Slurm version using the AWS Management Console, AWS CLI, or UpdateCluster API. PCS handles the upgrade of all managed Slurm components — the controller, accounting database, and REST API. Running jobs continue uninterrupted during the upgrade, queued jobs resume once the operation completes, and any accounting data is pr
The continuous evolution and scaling of high-performance computing (HPC) and AI workloads demand more agile and less disruptive infrastructure management, driving cloud providers to enhance their orchestration capabilities.
This development reduces the operational overhead of running critical HPC and AI compute clusters, cutting upgrade-related downtime and speeding up the research and development work that depends on these large-scale systems.
Cloud-based Slurm clusters can now undergo major version upgrades with zero downtime for running jobs, improving reliability, developer productivity, and cost-efficiency for compute-intensive tasks.
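The announcement caps an in-place upgrade at three Slurm major versions ahead of the current one. As a rough illustration of that rule, the sketch below checks whether a target version falls inside the window. The release list, the function name `can_upgrade_in_place`, and the constant names are assumptions made for this example. They are not part of the PCS API, and the list is not a statement of which versions PCS actually offers.

```python
# Assumed, illustrative ordering of Slurm major releases (YY.MM).
# This is NOT a list of versions that PCS supports; check the PCS docs.
SLURM_MAJOR_RELEASES = ("23.02", "23.11", "24.05", "24.11", "25.05")

# Limit stated in the announcement: up to three major versions ahead.
MAX_MAJOR_VERSION_JUMP = 3


def can_upgrade_in_place(current, target, releases=SLURM_MAJOR_RELEASES):
    """Return True if target is 1 to 3 major releases ahead of current.

    Hypothetical helper for pre-flight checks; it only compares positions
    in the assumed release list and does not call any AWS service.
    """
    if current not in releases or target not in releases:
        raise ValueError("unknown Slurm version: %r or %r" % (current, target))
    jump = releases.index(target) - releases.index(current)
    # Same-version "upgrades" and downgrades are rejected.
    return 1 <= jump <= MAX_MAJOR_VERSION_JUMP


if __name__ == "__main__":
    for cur, tgt in [("23.11", "25.05"), ("23.02", "25.05"), ("24.11", "24.05")]:
        print(cur, "->", tgt, can_upgrade_in_place(cur, tgt))
```

A check like this could run before submitting the cluster update, so that a request outside the window fails fast on the client side. The authoritative validation still happens in the service.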
- AWS
- Cloud-based HPC users (research institutions, enterprises)
- AI/ML developers
- DevOps engineers
- On-premise HPC solutions (relatively)
- HPC administrators needing manual upgrades
Increased adoption of AWS for complex HPC and AI workloads due to improved operational ease and reliability.
Faster iteration and deployment of large-scale models and simulations, accelerating scientific discovery and product development across various industries.
Potential for new business models and services built on top of highly resilient, auto-upgrading cloud HPC infrastructure, further expanding the 'compute supply chain' effect.
Read at AWS What's New