Monitoring

What happens to your application when you’re not watching? How many people are signing up? How many are posting? Did the latest code release slow anything down? What was going on when the server crashed?!?! These are all questions that server monitoring can help you answer.

When you want to make sure that all your services are up, Nagios can monitor them and alert you when something fails. You can even set thresholds on custom metrics (for example, someone should always have posted in the past 15 minutes), provided your application exposes those metrics.
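Here is a rough sketch of such a custom check. It assumes your application can report the Unix timestamp of its most recent post. A Nagios plugin is simply a program that prints a one-line status and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Any text after a `|` is treated as performance data. The names `check_recent_posts` and `main`, and the 10/15-minute thresholds, are hypothetical.

```python
import sys
import time
from typing import List, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_recent_posts(last_post_ts: Optional[float], now: float,
                       warn_after: float = 10 * 60,
                       crit_after: float = 15 * 60) -> Tuple[int, str]:
    """Return (exit_code, status_line) for a hypothetical 'recent posts' check.

    last_post_ts is the Unix timestamp of the newest post, or None if it
    could not be determined. Thresholds are in seconds.
    """
    if last_post_ts is None:
        return UNKNOWN, "POSTS UNKNOWN - could not read last post time"
    age = now - last_post_ts
    # Performance data after '|': label=value[UOM];warn;crit
    perf = f"|last_post_age={age:.0f}s;{warn_after:.0f};{crit_after:.0f}"
    if age >= crit_after:
        return CRITICAL, f"POSTS CRITICAL - no post for {age / 60:.0f} min {perf}"
    if age >= warn_after:
        return WARNING, f"POSTS WARNING - no post for {age / 60:.0f} min {perf}"
    return OK, f"POSTS OK - last post {age / 60:.0f} min ago {perf}"


def main(argv: Optional[List[str]] = None, now: Optional[float] = None) -> int:
    """Print the status line and return the exit code Nagios should see."""
    argv = sys.argv if argv is None else argv
    try:
        # Assumption: the newest post's timestamp is the first argument; a real
        # plugin would query your own database or internal feed instead.
        last_post_ts: Optional[float] = float(argv[1])
    except (IndexError, ValueError):
        last_post_ts = None
    code, line = check_recent_posts(last_post_ts,
                                    time.time() if now is None else now)
    print(line)
    return code
```

A wrapper script would call `sys.exit(main())` so that Nagios receives the exit code. Keeping the decision logic in a pure function makes the thresholds easy to test without a running Nagios.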

But what I like even more is using Cacti to watch what my servers are doing. You can expose metrics through an internal feed that Cacti polls. The metrics are limited only by your imagination… and the time you have to set up new graphs! Some of the things I like to monitor are:

  • New user signups
  • Major user interactions
  • User checkouts (if you have a shopping cart)
  • Any third party errors that occur
  • Time taken accessing the database or external services
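One way to build that internal feed is a script that Cacti runs as a Data Input Method. When a script returns several values, Cacti expects them on a single line as space-separated `field:value` pairs. The field names must match the output fields you define in Cacti. The sketch below uses hypothetical counters (`signups`, `checkouts`, `thirdparty_errors`, `db_ms`) that your application would populate. The `timed` helper, also hypothetical, shows one way to accumulate the time spent in database or external calls.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical in-process counters; in practice they would live somewhere
# shared (a database table, memcached, etc.) that the Cacti script can read.
metrics: Dict[str, float] = {
    "signups": 0,
    "checkouts": 0,
    "thirdparty_errors": 0,
    "db_ms": 0.0,
}


def incr(name: str, amount: float = 1) -> None:
    """Increment a named counter."""
    metrics[name] = metrics.get(name, 0) + amount


@contextmanager
def timed(name: str) -> Iterator[None]:
    """Add the wall-clock time (in ms) spent inside the block to a counter."""
    start = time.perf_counter()
    try:
        yield
    finally:
        incr(name, (time.perf_counter() - start) * 1000.0)


def _fmt(value: float) -> str:
    """Whole numbers without a decimal point, others to three places."""
    return str(int(value)) if float(value).is_integer() else f"{value:.3f}"


def cacti_output(values: Dict[str, float]) -> str:
    """Format values as Cacti multi-field script output: 'name:value name:value'."""
    parts = []
    for name, value in values.items():
        # Spaces or colons in a name would break the 'name:value' format.
        if " " in name or ":" in name:
            raise ValueError(f"invalid field name: {name!r}")
        parts.append(f"{name}:{_fmt(value)}")
    return " ".join(parts)


if __name__ == "__main__":
    incr("signups")
    with timed("db_ms"):
        time.sleep(0.01)  # stand-in for a database query
    print(cacti_output(metrics))
```

These totals only ever increase, so you would typically define them as COUNTER (or DERIVE) data sources. The graphs then show a rate, such as signups per second, rather than an ever-growing running total.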

This way, you can see when something is going wrong. For instance, after a major release, you can check that new user signups aren’t dropping. The graphs also show the historical ebb and flow of signups, so you know what normal looks like. Graphing the things that are important to you lets you confirm that your application is working as it should.
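To make the post-release comparison concrete, here is a small sketch. It compares signups in an hour after a release against the same hour in previous weeks, which accounts for the normal daily and weekly ebb and flow. It assumes you can pull hourly counts out of your metrics store. The function name `signups_dropped` and the 30% threshold are arbitrary choices for illustration.

```python
from statistics import mean
from typing import Sequence


def signups_dropped(current: int, same_hour_previous_weeks: Sequence[int],
                    max_drop: float = 0.30) -> bool:
    """Return True if `current` is more than `max_drop` below the baseline.

    The baseline is the mean of the same hour on previous weeks, so a quiet
    Sunday morning is compared with other Sunday mornings.
    """
    if not same_hour_previous_weeks:
        raise ValueError("need at least one week of history")
    baseline = mean(same_hour_previous_weeks)
    if baseline == 0:
        return False  # no signups historically, so nothing to compare against
    return current < baseline * (1 - max_drop)
```

For example, `signups_dropped(50, [100, 90, 110])` returns `True`: the baseline is 100, so anything under 70 counts as a drop.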

And when things do go wrong, you can see why. If the server became unresponsive, you can check the RAM, CPU, disk and MySQL connection graphs to work out what caused the crash.

Without monitoring, all you can do is hope for the best and pray that it won’t happen again.

This entry was posted in Cacti, Lessons learned, Server Admin.

One Response to Monitoring

  1. Uhrin says:

    Amazing how much GIGO is really out there! I totally agree: anyone can collect metrics, but not everyone can turn them into intelligence. When approaching correlation, I tend to look at the world as a set of problem sets. How do I navigate these problem sets and present timely, valuable intelligence to folks in a repeatable manner? How can I use that information to positively affect decisions, services, and culture? Understanding the problem set is different from understanding how to get metrics or count them. And yes, it does require domain knowledge and expertise. When delving into new domains and correlation, it's like you have to teach the software about the problem domain. And you never really learn until you TEACH!