Google services like Gmail, Calendar, Google+, and pretty much everything else that required a login were down for a while today, and it turns out the cause was a bum configuration file that got accidentally, devastatingly pushed live. Ben Treynor, VP Engineering at Google:
At 10:55 a.m. PST this morning, an internal system that generates configurations—essentially, information that tells other systems how to behave—encountered a software bug and generated an incorrect configuration. The incorrect configuration was sent to live services over the next 15 minutes, caused users' requests for their data to be ignored, and those services, in turn, generated errors. Users began seeing these errors on affected services at 11:02 a.m., and at that time our internal monitoring alerted Google's Site Reliability Team. Engineers were still debugging 12 minutes later when the same system, having automatically cleared the original error, generated a new correct configuration at 11:14 a.m. and began sending it; errors subsided rapidly starting at this time. By 11:30 a.m. the correct configuration was live everywhere and almost all users' service was restored.
Google apologized and said additional steps would be taken to try to prevent this kind of thing from happening again. (And I apologize for making up words in the title, even if they're funny.)
Just another reminder that every service, and I mean every service, goes down from time to time, and what matters is how they handle it when it happens. Also a reminder of how dependent most of us are on online services these days. Seriously, what's worse: A power outage when you still have internet, or an internet outage when you still have power?