Fleet management
Fleet management is the process of provisioning, monitoring, scaling, and maintaining a pool of game servers across regions to meet player demand. It covers everything from initial capacity planning to real-time scaling decisions during live operations.
What fleet management involves
- Capacity planning — Deciding how many servers to pre-provision per region based on expected player counts.
- Scaling — Adding or removing server capacity as demand changes. This can be manual (an operator watches dashboards and adds instances) or automatic (autoscaling rules triggered by utilisation thresholds).
- Health monitoring — Tracking whether servers are responsive, handling sessions correctly, and not approaching resource limits.
- Deployment — Rolling out new server builds across the fleet without downtime.
- Failover — Redirecting traffic when servers or entire providers fail.
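The scaling item above can be made concrete with a small sketch of a utilisation-based rule. It is illustrative only: the names `RegionStats` and `desired_servers`, the 70% target, and the min/max bounds are assumptions chosen for the example, not values from any particular platform.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Hypothetical snapshot of one region's fleet."""
    active_sessions: int      # sessions currently hosted in the region
    sessions_per_server: int  # how many sessions one server can host


def desired_servers(stats, target_utilisation_pct=70, min_servers=1, max_servers=100):
    """Return the server count that would bring utilisation to the target.

    Integer arithmetic is used so the ceiling division is exact.
    """
    capacity_at_target = stats.sessions_per_server * target_utilisation_pct
    # Ceiling division: servers needed so that sessions / capacity <= target.
    needed = -(-(stats.active_sessions * 100) // capacity_at_target)
    # Clamp to the configured floor and ceiling for the region.
    return max(min_servers, min(max_servers, needed))
```

An autoscaler would typically evaluate a rule like this on a timer and add or remove instances to reach the returned count. The floor keeps warm capacity available when a region is quiet, and the ceiling bounds cost during spikes.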
DIY vs managed fleet management
With DIY fleet management (e.g. running Agones on Kubernetes or managing EC2 instances directly), the studio handles all of the above itself. This typically requires one or two dedicated infrastructure engineers and an on-call rotation.
Managed fleet management platforms handle these tasks through their orchestration layer. The studio calls an API to start sessions, and the platform handles placement, scaling, health checks, and failover automatically. The trade-off is less granular control in exchange for significantly less operational burden.
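As a rough illustration of what placement with failover might look like inside an orchestration layer, the sketch below picks a healthy server with free capacity, trying regions in order of preference. The `Server` type and `place_session` function are hypothetical and do not correspond to any real platform's API.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical view of one game server as seen by the orchestrator."""
    name: str
    region: str
    healthy: bool    # result of the most recent health check
    free_slots: int  # sessions the server can still accept


def place_session(servers, preferred_regions):
    """Return the server chosen for a new session, or None if none is eligible.

    Regions are tried in order, so a later region acts as failover when
    earlier ones have no healthy capacity.
    """
    for region in preferred_regions:
        candidates = [
            s for s in servers
            if s.region == region and s.healthy and s.free_slots > 0
        ]
        if candidates:
            # Pack onto the fullest eligible server so idle ones can be scaled down.
            chosen = min(candidates, key=lambda s: s.free_slots)
            chosen.free_slots -= 1
            return chosen
    return None
```

Packing sessions onto the fullest server is one possible design choice. A platform could equally spread load evenly or choose by measured latency, which is part of the control a studio gives up when it uses a managed service.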