Failover
Failover is what happens when a primary system fails and traffic automatically switches to a backup. For game servers, failover is the difference between a provider outage causing extended downtime and players noticing nothing at all.
Why failover matters at launch
Launch day is the worst time to discover your infrastructure has a single point of failure. Player counts spike unpredictably, and the load itself can expose weaknesses in hardware, networking, or third-party providers. A studio that depends on a single cloud provider has no fallback if that provider has a regional outage — which all of them do, periodically.
Manual vs. automatic failover
Manual failover requires an engineer to detect the problem, decide on a course of action, and execute it — typically taking 15 to 60 minutes even with a practiced runbook. Automatic failover is handled by the infrastructure layer itself: the system detects the failure and reroutes within seconds, before most players have noticed a problem.
For live multiplayer games, automatic failover is the only viable approach. Human response time is too slow to prevent significant player impact.
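The automatic approach can be sketched as a simple selection rule: prefer the primary endpoint while it is healthy, otherwise route to the first healthy backup. This is a minimal illustration, not Gameye's implementation; the `HEALTH` map and endpoint names are hypothetical stand-ins for whatever a real monitoring layer reports.

```python
# Hypothetical health states; in practice these come from periodic
# heartbeat probes rather than a static dictionary.
HEALTH = {"primary": False, "backup-a": True, "backup-b": True}

def pick_healthy(endpoints, is_healthy, preferred):
    """Return the preferred endpoint if healthy, else the first healthy backup."""
    if is_healthy(preferred):
        return preferred
    for ep in endpoints:
        if ep != preferred and is_healthy(ep):
            return ep
    raise RuntimeError("no healthy endpoint available")

# With the primary marked unhealthy, traffic shifts to a backup
# with no human in the loop.
target = pick_healthy(list(HEALTH), HEALTH.get, preferred="primary")
```

Because the decision is pure lookup and comparison, it completes in microseconds; the seconds-scale latency of real automatic failover comes from detecting the failure, not from choosing the reroute.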
How multi-provider infrastructure enables failover
The most robust approach to failover is running infrastructure across multiple providers simultaneously. If one provider has issues in a region, the orchestrator can immediately place new sessions on an alternative provider in the same or a nearby region. Existing sessions continue unaffected on their original host; only new session requests are rerouted.
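The placement logic described above can be sketched as a function that tries providers in priority order for new sessions only, leaving existing sessions on their original hosts. The `capacity` map and provider names here are illustrative assumptions, not a real orchestrator API.

```python
def place_session(region, provider_priority, capacity):
    """Place a NEW session in the given region, preferring providers in order.

    Existing sessions are untouched; only new requests are rerouted
    when the preferred provider has no capacity (or is unhealthy,
    which would show up here as zero available capacity).
    """
    for provider in provider_priority:
        if capacity.get(provider, {}).get(region, 0) > 0:
            capacity[provider][region] -= 1
            return provider
    return None  # no capacity in this region; caller may try a nearby one

# Hypothetical snapshot: the first provider is exhausted in eu-west,
# so the orchestrator falls through to the next provider.
capacity = {"provider-a": {"eu-west": 0}, "provider-b": {"eu-west": 2}}
chosen = place_session("eu-west", ["provider-a", "provider-b"], capacity)
```

Returning `None` rather than raising lets the caller widen the search to a nearby region, matching the "same or a nearby region" behavior described above.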
Gameye runs multi-provider infrastructure in every region — bare metal from Gcore and OVHcloud, with cloud and edge capacity available for burst. If one provider reports problems, session placement shifts to backup providers automatically. This is the architecture behind Gameye’s 99.99% uptime SLA.
See also: Downtime · Auto-scaling · Bare metal servers