Why your iPhone app (and web site) must have an error monitoring system

So what is an error monitoring system? You're probably familiar with crash reporting, which sends you all the gory details when your app crashes so you can fix the issue quickly. Well, error monitoring is very similar. Instead of sending you a crash report, it sends you a report any time an error happens in your app.

Why is it important? Well, let me tell you about one of my clients. The conversation to hire me was pretty quick:

CEO: "I have an iPhone app which uses Parse. The Facebook login is not working since we migrated to Parse Server."

me: "Well, I have worked on an application using Parse recently, so I can probably help. Do you know what kind of error you are getting?"

CEO: "Our users are all over us on Facebook, the App Store and Twitter because they can't log in. They keep getting sent back to the sign-in screen."

me: "Okayyyy, anything else that you know?"

CEO: "Not really. Our CTO left a few months back. I have a new CTO now, but I want him focused on the version coming out. How long do you think it might take you?"

me: "No clue. You don't know what kind of error you are getting, so first I'm going to have to get familiar with your code, then figure out what is going on, then figure out how to fix it... It's not going to be cheap."

CEO: "When can you start?"

me: "Tomorrow"

CEO: "You're hired..."

Okay, those are not exact quotes, and there was a very brief talk about money too, but you get the idea.

So, the next day I dove in. I tried again to get more details about the issue from the CEO, but I didn't get anything else. All I got were the credentials to access everything about the app (GitHub account, AWS account, Crashlytics account, etc.) and off I went. I started by looking at the code of the application to see if I noticed anything obvious. You can hear my clock ticking in the background. Then, as I did not have any luck in the iPhone app, I jumped on the server to look at the logs and see if anything jumped out at me. Still no luck, and my clock was still ticking. Then I looked at the server code: more ticking and still no luck.

At this point, I had spent about 8 to 10 hours on the project and was still no closer to a solution. So I offered to do what I do on day one of all my projects: install Rollbar (or a similar in-app logging system) in the app so we could see what errors the users were encountering, instrument as many places around the login flow as possible, and send the app to his TestFlight users. Then wait and see what errors we got back.

Sure enough, within an hour we got a few errors back with the exact error number that the Parse Server was throwing our way. In other words, we finally knew what our users were experiencing. Now keep in mind this did not mean the problem was fixed; it just meant that we knew what the problem was.

So the CEO had to pay me for about 10 to 15 hours just to know what problem we were facing, all because his original team never got in the habit of logging errors. In an MVP there are errors you don't want to handle so you can focus on quick iterations, but those are exactly the ones you should still log.

My strategy for MVPs is the following:

  • create the code everyone will use
  • only handle error cases you know will happen and instrument all error cases (no matter how small)
  • later, if I see that a lot of users are running into an issue, I go write the code to fix it
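
To make the second bullet concrete, here is a minimal sketch in Swift of what "instrumenting" an unhandled error case could look like. Everything in it is an assumption made for illustration: `ErrorReporter`, `InMemoryReporter`, `LoginError` and `logIn` are hypothetical names, not Rollbar's or Parse's actual API, and the error code is a made-up value. In a real app the reporter would forward to whatever monitoring SDK you installed.

```swift
import Foundation

// Hypothetical reporting interface. A real app would forward these calls
// to its monitoring SDK; no vendor API is shown here.
protocol ErrorReporter {
    func report(_ error: Error, context: [String: String])
}

// In-memory stand-in so the sketch runs without a network or an SDK.
final class InMemoryReporter: ErrorReporter {
    private(set) var reports: [(error: Error, context: [String: String])] = []

    func report(_ error: Error, context: [String: String]) {
        reports.append((error: error, context: context))
    }
}

// Hypothetical error type carrying the code the server sent back.
struct LoginError: Error {
    let code: Int
}

// Hypothetical login step. The failure is not handled here: it is only
// reported and then rethrown, so the app behaves exactly as before.
func logIn(using attempt: () throws -> String, reporter: ErrorReporter) throws -> String {
    do {
        return try attempt()
    } catch {
        reporter.report(error, context: ["step": "facebook-login"])
        throw error
    }
}

// Usage: simulate a failing login with a made-up error code.
let demoReporter = InMemoryReporter()
let demoToken = try? logIn(using: { throw LoginError(code: 42) }, reporter: demoReporter)
print(demoToken == nil, demoReporter.reports.count)
```

The point of the design is that the `catch` block adds one reporting line and nothing else, which is why instrumenting a case you have no intention of handling yet stays so cheap.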

Instrumentation takes about 20 seconds per error case. If you do the math, the 10 hours I billed just to figure out what the issue was come to 36,000 seconds, which is as much time as instrumenting 1,800 error cases...

So when it comes to your business, it is a wise decision to instrument your app. Instrumentation is cheap, it's effective, and you learn about issues before your users actually tell you about them. That makes for a much better support email response: "We know about the issue, we are sorry this inconvenienced you, we have a fix coming tomorrow." Hell, you could even email all your users who had the issue and apologize if the issue was big enough.