Why bad bugs hit good people

Software is buggy. Humans write and test software and humans are imperfect; as a result, so is software. This is the reality of software and should come as a surprise to nobody. What can be surprising is the kind of bugs we actually see make their way out into the wild. We've seen two very prominent examples this week. The first was the release of iOS 8.0.1 on Wednesday, which broke cellular service and Touch ID for iPhone 6 and iPhone 6 Plus users. The very same day we saw a huge bug in bash publicly disclosed: a vulnerability leaving millions and millions of personal computers, servers, embedded systems, and who knows how many other types of Internet-connected devices open to attack. And for most people, it's baffling how bugs like this could ever find their way into the world. Aren't developers supposed to be smart? The bash bug may be obscure enough that many end users don't understand it, but what about iOS 8.0.1? How could such a big piece of software ship with such a glaring bug that broke such critical pieces of functionality?

I lead the quality assurance department at a mobile development company. It's QA's job to help ensure that we're shipping the best apps we can by finding bugs. Whether it be an incorrectly scaled image, functionality that doesn't meet requirements, a weird edge case that causes undesired behavior, or an ungraceful failure under unexpected circumstances, every aspect of the software is fair game for QA to scrutinize. A big part of the reason I work where I do is that I'm surrounded by a lot of really smart people who are very good at what they do, and I learn a lot from them. These are very talented engineers who frequently solve very difficult problems and create well-respected, widely used, polished apps. But I'll let you in on a little secret... they write bugs. I'll let you in on another secret... QA doesn't catch them all. In fact, this post will probably go live with a typo in it that I didn't catch.

Apple, as well as many third-party app developers, has done a great job concealing the complexity of software. The simple designs on the surface can make it easy to forget that underneath are millions of lines of code, written by humans. Some of the code is old, some of it new. Some of it is easy to read, some of it obscure and confusing enough to make even the smartest engineer bang their head against a wall trying to figure it out. A change in one section of code can have an impact in a completely different area that you would never guess would be affected. On top of that, you have a nearly infinite number of combinations of hardware, software, operating environments, and variables to consider. As users we're accustomed to using simple taps and swipes to interact with fluid UIs and pretty pictures. Sometimes the more seamless the experience and the more delightful the design, the more complicated and confusing the code that's underneath.

Of course, none of this explains how iOS 8.0.1 found its way into the world. I don't know how it happened. Maybe it was a small, last-minute change that got pushed without sufficient regression testing. Maybe QA found the bug, but its severity wasn't clearly communicated. Maybe there was an entire team of overworked and understaffed QA engineers who, by the end of it, didn't have the clarity of mind to think to check those pieces of functionality. There are limitless possibilities and we may never know what the cause was.

I've experienced the gut-wrenching unpleasantness of being part of a team that was responsible for shipping a major bug. For development and QA teams, I can't think of anything worse than pouring your heart and soul into a project you're passionate about, working tirelessly night after night to meet impossible deadlines, feeling relieved and euphoric to have finally shipped, only to have the rug pulled out from under you with a horrible bug that somehow got missed. It's awful. It's heartbreaking. And even once you've pulled a few more days of insane hours remediating the bug, you're still left unable to stop beating yourself up. You can't stop thinking "how could I have missed that?" While I don't know how the issues in 8.0.1 made it out the door, I do know that it wasn't the result of a lack of intelligence, skill, or care.

All of this is not to say that there shouldn't be responsibility for the bug, and I'm certain there will be. Apple has to be accountable to its customers. There are certainly worse things that could happen, like customer data loss, but breaking cellular service is definitely near the top of the list of worst bugs you could ship for a phone. Their response to this was to pull the update as soon as they had learned about and confirmed the problem, release a guide for affected users to revert to iOS 8.0, and release a fixed update the following day. Short of not shipping the bug in the first place, that's about as good as you can hope to do for handling an issue like that. There also should be, and surely will be, corrective action taken within Apple to address whatever gap in process may have let this slip through. Whenever a serious bug makes its way into the wild, it's essential to evaluate how it happened and come up with a plan to make sure it doesn't happen again. Undoubtedly this has either already taken place or is currently taking place within Apple.

Make no mistake, this was a serious bug. It shouldn't have shipped. While many bugs are mere annoyances, and 8.0.1 was nothing more than an annoyance for most users, it had the potential to be catastrophic. People rely on their phones for emergencies. On a worse day, 8.0.1 could have contributed to somebody being unable to get help in a dangerous situation. Now, that's an extreme example, but it's a realistic one. But Apple realizes this, their developers realize this, and their QA team realizes it. Nobody is working at Apple because they think it's a good place to work on products that won't impact people's lives. Apple knows they will and do, better than almost anybody. As bad as we think 8.0.1 was, I have to imagine that pales in comparison to how the people inside Apple feel about it.

Mistakes happen. Bugs happen. It can be easy to place Apple on a pedestal and forget that they're a company made up of human beings like us (albeit probably with a higher average IQ). The people who work for Apple are really good at what they do, but at the end of the day they're still people. This isn't the first time we've seen Apple make a mistake and it certainly won't be the last. We all make mistakes. Most of us are just fortunate enough not to have hundreds of millions of people who could potentially be affected, and the whole world watching, when we do. In the end, what's more important than a mistake is how the people who made it choose to respond to and learn from it.