What happens to your application when you’re not watching? How many people are signing up? How many are posting? Did the latest code release slow anything down? What was going on when the server crashed? These are all questions that server monitoring can help you answer.
When you want to make sure that all of your services are working, Nagios can monitor them and alert you when something needs attention. You can even set thresholds on custom metrics (for example, someone should always have posted in the past 15 minutes), as long as you can expose those metrics.
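As a rough sketch of such a custom check, here is what a Nagios-style plugin for the "someone posted in the past 15 minutes" rule might look like. Nagios plugins report status through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a single line of output, optionally followed by performance data after a `|`. The function names, the 30-minute critical threshold, and `fetch_last_post_timestamp` are assumptions made for illustration; the last one is a placeholder you would swap for a real query against your application.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_last_post(last_post_ts, now, warn_secs=900, crit_secs=1800):
    """Return (exit_code, status_line) based on the age of the newest post.

    warn_secs defaults to 15 minutes; crit_secs (30 minutes) is an
    assumed value, not something the article prescribes.
    """
    if last_post_ts is None:
        return UNKNOWN, "POSTS UNKNOWN - no post timestamp available"

    age = int(now - last_post_ts)
    if age >= crit_secs:
        code = CRITICAL
    elif age >= warn_secs:
        code = WARNING
    else:
        code = OK

    # Performance data format: label=value[unit];warn;crit
    perfdata = "age=%ds;%d;%d" % (age, warn_secs, crit_secs)
    line = "POSTS %s - last post %ds ago | %s" % (LABELS[code], age, perfdata)
    return code, line


def fetch_last_post_timestamp():
    # Hypothetical placeholder: replace with a query against your own
    # application, e.g. the newest created-at value as a Unix timestamp.
    return time.time() - 120


def main():
    code, line = check_last_post(fetch_last_post_timestamp(), time.time())
    print(line)
    return code

# When installed as a plugin, end the file with:
#     import sys; sys.exit(main())
```

Keeping the threshold logic in `check_last_post` separate from the data lookup makes it easy to test the check without touching the database.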
But something I like even more is using Cacti to watch what my servers are doing. You can expose metrics through an internal feed that Cacti picks up. The metrics are limited only by your imagination… and the time you have to set up new graphs! Some of the things I like to monitor are:
- New user signups
- Major user interactions
- User checkouts, if the application has a shopping cart
- Any third-party errors that occur
- Time taken accessing the database or external services
This way, you can see when something is going wrong. For instance, after a major release, you’ll be able to check that new user signups aren’t dropping. The graphs show you the historical ebbs and flows of user signups, so a sudden change stands out. Setting up graphs for the things that are important to you lets you confirm that your application is working as it should.
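One simple way to feed numbers like these to Cacti is a script that prints them on demand. When a Cacti script data source returns several values, it expects them on one line as space-separated `name:value` pairs. The sketch below assumes that setup; the metric names and the `collect_metrics` function are made up for illustration and would be replaced by real queries or counters from your application.

```python
def format_for_cacti(metrics):
    """Render a dict of metrics as space-separated name:value pairs."""
    parts = []
    for name in sorted(metrics):  # sorted so the output order is stable
        parts.append("%s:%s" % (name, metrics[name]))
    return " ".join(parts)


def collect_metrics():
    # Hypothetical placeholder values: replace with real lookups, such as
    # counts since the last poll or an average database query time.
    return {
        "signups": 12,
        "checkouts": 3,
        "third_party_errors": 0,
        "db_ms": 48,
    }


def main():
    print(format_for_cacti(collect_metrics()))
```

Each name in the output would then be mapped to an output field of the same name when you define the data input method in Cacti.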
And when things do go wrong, you can work out why. If the server was unresponsive, you can check the graphs for RAM, CPU, disk, and MySQL connections to determine what caused the crash.
Without monitoring, all you can do is hope for the best and pray that it won’t happen again.