Scalability
Scalability is the ability of multiplayer game infrastructure to handle increases in player demand (more concurrent sessions, more regions, more traffic) without degrading performance or requiring manual intervention. For games that run on dedicated servers, scalability has two distinct dimensions: how fast the system can respond to demand spikes, and how far it can scale before hitting hard capacity limits.
Vertical vs. horizontal scaling
Vertical scaling means making individual servers more powerful by adding CPU or RAM. It has a ceiling set by the largest machine available, and resizing typically requires restarting the server. For game servers it is rarely the right approach.
Horizontal scaling means adding more server instances to handle more concurrent sessions. This is how multiplayer game infrastructure scales in practice: when demand rises, the orchestrator starts more containers; when it drops, it reclaims them. The limiting factors are available capacity, the speed at which new instances can start, and cost per session.
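The scale-up and scale-down decision described above can be sketched as a small calculation. This is a minimal illustration, not any orchestrator's actual logic: the function names, the `sessions_per_instance` parameter, and the 10% headroom default are all assumptions made for the example.

```python
def desired_instances(active_sessions: int,
                      sessions_per_instance: int,
                      headroom_percent: int = 10) -> int:
    """Instances needed to host the active sessions plus a spare buffer.

    All parameters are hypothetical; integer math avoids float rounding.
    """
    if sessions_per_instance <= 0:
        raise ValueError("sessions_per_instance must be positive")
    if active_sessions < 0 or headroom_percent < 0:
        raise ValueError("active_sessions and headroom_percent must be >= 0")
    numerator = active_sessions * (100 + headroom_percent)
    denominator = sessions_per_instance * 100
    # Ceiling division: round up so that demand is never under-provisioned.
    return -(-numerator // denominator)


def scaling_action(running: int, desired: int) -> tuple[str, int]:
    """Return what to do and how many instances it affects."""
    if desired > running:
        return ("start", desired - running)
    if desired < running:
        return ("reclaim", running - desired)
    return ("hold", 0)


# Example: 1,000 sessions, 10 sessions per instance, 10% headroom.
target = desired_instances(1000, 10)
print(target, scaling_action(running=100, desired=target))
```

The headroom buffer is one simple way to keep a few instances warm, so that a small spike is absorbed before new instances have to start.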
Why scaling speed is a player-facing problem
For most web applications, scaling over 60–120 seconds is acceptable — users experience a brief slowdown at worst. For multiplayer games, that window is too long. A matchmaker that requests a server and waits two minutes for provisioning introduces a two-minute queue delay for every player in that match. During a launch — when thousands of players are trying to connect simultaneously — slow scaling compounds into a launch failure.
Container-based orchestration addresses this directly. Because server images are pre-pulled and containers start with minimal overhead, new game server sessions can be ready in under a second. Gameye averages 0.5 seconds from session request to running container. VM-based scaling, which requires booting a full virtual machine, typically takes one to several minutes regardless of the underlying hardware.
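A rough way to see why start time matters: if match requests arrive at a steady rate and each one waits for its server to start, the number of matches waiting at any moment is approximately the arrival rate multiplied by the start time (Little's law). The sketch below compares the 0.5-second container start quoted above with a two-minute VM boot. The arrival rate of 50 requests per second and the 10 players per match are hypothetical values chosen only for illustration.

```python
def matches_waiting(requests_per_second: float, start_seconds: float) -> float:
    """Average number of matches waiting for a server (Little's law: L = rate * wait)."""
    return requests_per_second * start_seconds


def players_waiting(requests_per_second: float,
                    start_seconds: float,
                    players_per_match: int) -> float:
    """Average number of players held in the queue while servers start."""
    return matches_waiting(requests_per_second, start_seconds) * players_per_match


RATE = 50              # hypothetical match requests per second during a launch
PLAYERS_PER_MATCH = 10  # hypothetical match size

for label, start in [("container (0.5 s)", 0.5), ("VM (120 s)", 120)]:
    print(label,
          matches_waiting(RATE, start),
          players_waiting(RATE, start, PLAYERS_PER_MATCH))
```

Under these assumptions the same request rate leaves a few dozen matches in flight with sub-second starts and several thousand with two-minute starts, which is the compounding effect described above.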
Scaling limits: bare metal vs. cloud
Bare metal infrastructure offers the best price-per-session for predictable baseline load, but physical capacity is finite. Cloud infrastructure is effectively elastic but more expensive per session and introduces variable network performance.
The most scalable architectures use both: bare metal for baseline load, cloud for burst capacity during spikes. When a launch or free weekend pushes demand beyond bare metal limits, the orchestrator overflows into cloud instances automatically — giving effectively unlimited scale without permanently provisioning expensive cloud capacity.
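The overflow rule in the previous paragraph can be sketched in a few lines: fill bare metal first, then send whatever does not fit to cloud. The function name and the capacity figure are hypothetical, and a real orchestrator would also weigh region, latency, and per-machine limits.

```python
def place_sessions(demand: int, bare_metal_capacity: int) -> dict[str, int]:
    """Split session demand between bare metal (filled first) and cloud overflow."""
    if demand < 0 or bare_metal_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    on_bare_metal = min(demand, bare_metal_capacity)
    on_cloud = demand - on_bare_metal  # zero unless demand exceeds bare metal
    return {"bare_metal": on_bare_metal, "cloud": on_cloud}


# Baseline load fits entirely on bare metal (capacity of 1,000 is hypothetical).
print(place_sessions(demand=800, bare_metal_capacity=1000))

# A launch spike overflows into cloud.
print(place_sessions(demand=1500, bare_metal_capacity=1000))
```

Because the cloud share is zero whenever demand stays within bare metal capacity, cloud cost is only incurred during the spike.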
The hidden cost of scalability
Platforms that claim unlimited scalability often charge for it through data transfer fees. AWS GameLift, for example, charges ~$0.09/GB for egress, a cost that scales directly with the number of active sessions. A game with 100,000 concurrent players each generating 10 GB of traffic per hour would transfer 1,000,000 GB per hour, which at that rate comes to roughly $90,000 per hour in egress on top of compute costs.
Capacity-based pricing with no egress fees means that scaling up adds only compute cost, not compounding bandwidth charges. For studios planning for viral growth or free weekend spikes, the pricing model is as important as the technical scaling capability.
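The arithmetic behind the egress example above can be parameterized so a studio can plug in its own numbers. This is a minimal sketch: the function name is hypothetical, the $0.09/GB rate and the 10 GB per player per hour are the illustrative figures from the text, and real per-player traffic depends heavily on the game's netcode and tick rate.

```python
def hourly_egress_cost(players: int,
                       gb_per_player_hour: float,
                       usd_per_gb: float) -> float:
    """Egress cost in USD for one hour at a flat per-GB rate (no pricing tiers)."""
    if players < 0 or gb_per_player_hour < 0 or usd_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    return round(players * gb_per_player_hour * usd_per_gb, 2)


# Figures from the example above: roughly 90,000 USD per hour.
print(hourly_egress_cost(100_000, 10, 0.09))

# The same load under pricing with no egress fee adds no bandwidth cost.
print(hourly_egress_cost(100_000, 10, 0.0))
```

Since the cost is linear in the player count, doubling concurrent players doubles the egress bill, which is why the pricing model matters as much as raw scaling capability during a spike.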
See also: Auto scaling · Game server orchestration · How Gameye handles scaling across bare metal and cloud · Chivalry 2 — scaling to 250,000 concurrent players at launch · Gameye vs. AWS GameLift