Replica Sets with MongoDB on OpenVZ

I almost dumped MongoDB.

I have a new instance running in Chicago that I’ve been trying to set up as a mirror of what I have running in NYC. My progress (with everything) slowed drastically because MongoDB kept crashing whenever I tried to set up a replica set between the two. Both of my instances are 2G VMs running on OpenVZ without any virtual memory, so when Mongo tried to allocate memory while setting up the replica set, it crashed horribly.

The cause was that Mongo tries to reserve 5% of available disk space for the oplog, the operation log that the replica set uses. On my servers that was more than the memory I had, so Mongo crashed hard while trying to mmap everything in. The solution, which I could not find documented anywhere, was setting a static oplogSize in the config.

So, hopefully this will get out onto Google for other people to find: when you are running MongoDB as a replica set in a low-memory environment, set oplogSize low.
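To put rough numbers on this, here is a minimal Python sketch. It only models the 5% rule described above (as far as I know, the real default also applies minimum and maximum bounds, which this ignores), and it renders the two relevant lines of an old-style `key = value` mongod config. The replica set name `rs0` and both function names are placeholders of mine, not anything from MongoDB itself. The `oplogSize` value is given in megabytes.

```python
import shutil

MB = 1024 * 1024


def default_oplog_mb(free_disk_bytes):
    """Approximate the default described above: 5% of free disk space, in MB."""
    return free_disk_bytes // 20 // MB


def render_replica_conf(repl_set_name, oplog_mb):
    """Render the replica set lines of a key = value style mongod config."""
    return "replSet = {0}\noplogSize = {1}\n".format(repl_set_name, oplog_mb)


if __name__ == "__main__":
    free = shutil.disk_usage("/").free
    print("5% of free disk is roughly", default_oplog_mb(free), "MB")
    # "rs0" is a placeholder name; 256 is the oplog size I settled on.
    print(render_replica_conf("rs0", 256), end="")
```

On a VM with a big disk and only 2G of RAM, the first number comes out well above what the box can hold in memory, which is exactly the situation that bit me.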

I’ve currently set oplogSize to 256M, which has made my available memory decrease dramatically. I’ll have to tune it, but now I have a mirror of my data replicating with automatic failover!!!! Super sweet!

UPDATE:

Here’s a graph from Cacti that shows the spike in memory from the main server.

Posted in MongoDB | Comments Off

Hackathon, Day 5

All done with the Content Aggregator! I took a bit of time to learn best practices with threads and even have my own REST server embedded in my Groovy app.

I’ve been trying to keep the memory footprint low as I only have 2G on my VM, so I had to do a bit more work. But I’m finally done with the Groovy bits (and oh how groovy it is integrating Groovy and Mongo).

Tonight it’s on to the web front end and the REST API that will service both the web and iPhone.

I’m also extending the Hackathon a week, as my personal life has disrupted my hacking more than I thought! Hopefully that will mean circling back for some cleanups, but we will see!

Posted in Groovy, Hackathon, MongoDB | Comments Off

Hackathon, Day 1

Done with the first day of the hackathon. The results are a bit disappointing, having only completed RSS parsers and a few models for feeds.

Tonight, I’ll get onto the main aggregator engine, scheduler and start inserting the feeds into MongoDB.

Posted in Groovy, Hackathon | 2 Comments

Hackathon starting… NOW!

On Monday morning, I was reading an article on Techcrunch about information overload. That got me to thinking about my RSS reader, and my frustration with it. Sometimes I prefer reading RSS feeds on my iPhone, but I miss being able to read it on my desktop. Sometimes I don’t hit refresh and probably miss some interesting news. Whenever I do refresh, I have to scan through every article to see if there’s anything I want to read right now.

So, then I thought of a hackathon! I had just been learning Titanium to build iPhone apps, but couldn’t think of anything useful to actually build. This seemed like the perfect idea! Two days of sketching a few things out on my trusty notepad, and today is the day to start.

Starting tonight, I will give myself a week (today until next Wednesday) to come up with the minimum viable product. It will be a Groovy-backed news aggregator with a Symfony 2 web interface and a Titanium iPhone app accessing Symfony 2 REST services… Or we’ll see where I am in a week.

Wish me luck!

Posted in Groovy, Hackathon, MongoDB, Symfony, Titanium | 3 Comments

Logging

This really goes hand in hand with my last post, Monitoring.

One thing I don’t get is why, in production, most logging is shut off in my favorite PHP framework, Symfony. To me, logging is necessary to determine the state of your application. Whenever you update a metric for use in graphs, you should also write a log message for the same kind of event. This gives you some forensics for tracking down problems after they occur. Even more, you can log all sorts of information to be able to determine what went wrong where.

For instance, someone complains that their account was hacked? If you log the IP and user agent of the person who logs into an account, you’ll be able to look into your logs and determine what happened. If you’re also logging incorrect logins, you’ll also be able to see what other accounts that IP tried to log into.

Always log exceptional and nominal events that occur in your application. Your log level should be set to INFO not ERROR in production. You should be able to go through the logs and see what’s going on. It will also help you determine what new security features and bug fixes need to be planned.
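As a rough illustration of the login example, here is a small sketch using Python’s standard logging module rather than Symfony’s logger, just to keep it self-contained. The logger name `auth`, the function name and the message layout are my own choices for the example. Successful logins go out at INFO, failed ones at WARNING, and the log level is set to INFO so both show up.

```python
import logging

logging.basicConfig(
    level=logging.INFO,  # INFO, not ERROR, so nominal events are kept too
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
log = logging.getLogger("auth")


def record_login(username, ip, user_agent, success):
    """Log every login attempt: successes at INFO, failures at WARNING."""
    level = logging.INFO if success else logging.WARNING
    outcome = "ok" if success else "failed"
    log.log(level, "login %s user=%s ip=%s agent=%r",
            outcome, username, ip, user_agent)
    return level
```

With lines like these in the log, a grep for a single IP shows every account it logged into or tried to log into.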

Posted in Lessons learned | 1 Comment

Monitoring

What happens to your application when you’re not watching? How many people are signing up? How many are posting? Did the latest code release slow anything down? What was going on when the server crashed?!?! These are all things that server monitoring can help you cope with.

When you want to make sure that all services are working, Nagios can monitor your services and alert you when you need it. You can even add thresholds for custom metrics (there should always be someone posting in the past 15 minutes) if you can expose such metrics.
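A custom threshold like “someone posted in the past 15 minutes” could look something like the sketch below. It follows the usual Nagios plugin convention of exit code 0 for OK, 1 for WARNING and 2 for CRITICAL. The function name, the 30 minute critical threshold and the idea of passing in the timestamp of the newest post are assumptions for the example; a real check would read that timestamp from your database.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def check_last_post(last_post_ts, now, warn_after=15 * 60, crit_after=30 * 60):
    """Return (exit_code, message) based on the age of the newest post."""
    age = int(now - last_post_ts)
    if age >= crit_after:
        return CRITICAL, "CRITICAL - last post %ds ago" % age
    if age >= warn_after:
        return WARNING, "WARNING - last post %ds ago" % age
    return OK, "OK - last post %ds ago" % age
```

To use it as a plugin, print the message and pass the code to `sys.exit()`, since Nagios reads the first line of output and the exit status.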

But something I like more than this is using Cacti for watching what my servers are doing. You can expose metrics via an internal feed that Cacti can pick up from. The metrics are only limited by your imagination… and the time you have to set up new graphs! Some of the things I like to monitor are:

  • New user signups
  • Major user interactions
  • In shopping carts, user checkouts
  • Any third party errors that occur
  • Time taken accessing the database or external services

This way, you can see when something is going wrong. For instance, after a major release, you’ll be able to ensure that the new user signups aren’t dropping. The graphs will show you historically the ebbs and flows of user signups. Setting up graphs for the things that are important to you will ensure that your application is working as it should.
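Here is a minimal sketch of what such a feed might produce. It assumes the feed is read by a Cacti script data input method, which accepts several values on one line as space-separated `name:value` pairs. The counter names and both helper functions are made up for the example, and a real application would keep its counters somewhere more durable than process memory.

```python
from collections import Counter

metrics = Counter()


def count(name, amount=1):
    """Bump a named counter, e.g. count("signups") when a user registers."""
    metrics[name] += amount


def cacti_output(values):
    """Format values as space-separated name:value pairs, sorted by name."""
    return " ".join("%s:%d" % (name, values[name]) for name in sorted(values))
```

Calling `count("signups")` and `count("checkouts", 3)` and then printing `cacti_output(metrics)` gives one line that Cacti can split into a data source per name.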

And when things do go wrong, you can see exactly why. If the server was unresponsive, you can check the RAM, CPU, disk, MySQL connections and determine what the cause of a crash was.

Otherwise, without monitoring, you can just hope for the best and pray that it won’t happen again.

Posted in Cacti, Lessons learned, Server Admin | 1 Comment

Builds and Deployments

At my previous companies, there have been two ways of putting code onto a QA or production server: FTP the files up, or svn up a checkout of the production branch.

But builds are so much cooler. You can have a build process that runs unit tests and other code analysis, removes unit tests or other things not required on the production server, and then packages this up into a tarball or zip. For this, I use a very simple phing build script. Phing is a great tool built for PHP that allows you to accomplish tasks like creating builds. A simplified version of this is as follows:

[code lang="xml"]
Tagging ${version} in svn
Exporting ${name}
Cleaning up...
Compressing
All finished. You now have ${versionName}.tar.gz
[/code]

There’s a bit missing from this, but you get the idea…

Once you have this build, it will be exactly the same no matter where you put it. There’s no worrying about which revision number you’re on, you have a simple build number that’s ready for deployment.
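To make the packaging step concrete, here is a rough Python sketch of just the “clean up and compress” part. It is not my phing script, and it skips the svn tagging and export steps entirely. The excluded directory names and the function name are assumptions for the example; the archive name follows the `${versionName}.tar.gz` pattern from the build output.

```python
import os
import tarfile

EXCLUDED_DIRS = {"test", "tests", ".svn"}  # assumption: not needed in production


def build_tarball(source_dir, name, version, out_dir="."):
    """Package source_dir as <name>-<version>.tar.gz, skipping excluded dirs."""
    version_name = "%s-%s" % (name, version)
    archive_path = os.path.join(out_dir, version_name + ".tar.gz")

    def keep(member):
        # Returning None drops the entry; for a directory, its contents too.
        parts = member.name.split("/")
        return None if EXCLUDED_DIRS.intersection(parts) else member

    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=version_name, filter=keep)
    return archive_path
```

Everything inside the archive sits under a single versioned top-level directory, so unpacking two builds side by side never mixes their files.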

Then, once you have this build, you can deploy it anywhere you want. While you can use phing to deploy, my favorite deployment tool is Capistrano. There’s a great extension to Capistrano called capifony, which adds Symfony-specific tasks to Capistrano. I extend this further to allow deployments to environments other than production and to add switches for the build version that I created previously, so that every application I create is deployable from one central Capistrano setup.

All in all, I think that’s so much cooler than FTP.

Posted in Lessons learned | Comments Off

Why Virtualization is cool

When developing some application or website, I cannot stress enough how much it helps to use virtualization. It may seem like an overcomplication, but having an exact (or nearly exact) replica of the production environment helps immensely.

For instance, when developing my mafioso game, I found a problem: on the Hackintosh that I’m developing on, MacPorts installs version 2.0 of MongoDB. When I deployed to my Debian 6 server, I found that the version of Mongo installed was 1.4! If I had been developing against a virtualized Debian 6, I would have seen right away that I was relying on a version of MongoDB that was not available in my target environment.

Going back to real life experience, at my current company I created a VMware instance of our production environment. I copied the production memcache, PHP, Apache, FreeTDS and other configs that we use in production onto our image. That way, you can be sure that the features you’re using and the errors you’re experiencing will be the same as in production. If you need to install a new package in order to use some functionality in PHP or the like, you will know to note it as a requirement for deploying the feature being developed.

Virtualization also gives you an environment that is separated from your desktop. All the crazy things that you might do in your desktop are isolated from the environment of your virtualized server. What you see in the virtualization is what you should see in production.

Also, you can upgrade that virtualized image to a new OS to test the changes your applications will need when the same upgrade happens in production. Virtualization helps you isolate a problem to one specific environment change.

Lastly, virtualization is sharable. The configuration you have done on your local development environment is local to your desktop. When you use virtualization, you can just copy the image to the other developers’ desktops.

If the source code is loaded by a share or an NFS partition, then the only thing that can be different between production and development is that loading of the source code itself. Which will lead into the next topics… Deployments and builds.

One other very important part of virtualization is this: if you develop on OS X (or some other environment), you still want to see what the x% of people on other platforms see. In this case, you install Windows XP, Vista and 7+ in virtual machines. Load a machine up, check out IE 6, 7, 8 and 9 plus whatever differences there are in Firefox and Chrome on Windows, and see the issues immediately. If you develop this discipline, you will catch the problems that people stuck on these operating systems would hit before they become an issue.

Posted in Lessons learned | 1 Comment

A series of lessons learned

In my next posts, I’ll post a series! These are the lessons I’ve learned while at my current job. The topics will include:

  • Why Virtualization is cool
  • Builds and Deployments
  • Monitoring
  • Logging

I’ll include a link to these posts in this post as I make them!

Posted in Lessons learned | Comments Off

Cacti Helpful Hints

When I started installing my new server, I wasn’t sure what monitoring software to use. We use Cacti at work, and I’ve used it before, but I wanted to see what else was out there. This isn’t a review of the other software, but let me just say that I was happy to come back to Cacti. The only hard part was finding the templates… That’s where these helpful hints come in.

The first template I found was for djbdns, also known as tinydns. I have always loved using this bit of software. Low memory, easy config files… And now monitoring. Over on the Cacti forums, Buckbeak integrated Jeremy Kister’s templates into Cacti, giving us tinydns graphs in Cacti. Pretty sweet.

Then Glen Pitt-Pladdy added a bunch of scripts to provide Postfix stats in Cacti. The only issue I’ve had here is having to add some custom filters to the log parser to support some of the anti-spam rules that I have in Postfix.

Then, we have the misnomer of “mysql cacti templates” for what is really an uber set of monitoring templates for Cacti. This package contains monitoring graphs for Apache, JMX, Memcache, MongoDB, Nginx, OpenVZ, Redis and your standard UNIX templates. I haven’t gotten the OpenVZ templates to work, but this is what should be part of Cacti!

Anyways, those are my helpful hints. Good luck in all your monitoring needs!

Posted in Cacti, Server Admin | Comments Off