EngineSmith's Blog

Engineering Craftsman

Cassandra fiasco – NoSql kills

Posted by EngineSmith on September 12, 2010

We implemented one of our product purely on top Cassandra back in March 2010 with version 0.5.1. It went live in May, and tanked in the first week due to crazy traffic. Three painful, sleepless weeks later, Cassandra finally dropped its pants and started corrupting our data. We shutdown the site, spent three full days, converted backend to MySQL clusters. Since then, our product has grown steadily until now (September 2010), with 8M+ users and 1M+ DAU (daily active user).

I don’t want to restate all the great benefits Cassandra provides, we were so excited by that. Unfortunately, if it is too good to be true, it usually is:

  • We started with a 6-node Cassandra cluster, big mistake. Some people later suggested to start with 50+ nodes. 🙂 You will see the reason why below. Yes, it may be web-scale, but not for a small startup I guess.
  • We set quorum=2, replication=3, which means 5 nodes are minimum. Unfortunately, during the initial weeks, we lost one out of six node, so our ability to stand for node failures are pretty low.
  • Cassandra has a “binary-log” kind of mechanism, it appends to that file(s) until it reaches a limit. Then it needs to compress compact that big file, during which, this node is kind of “unresponsive”. All nodes uses a protocol “Gossip” to communicate, if one node is unresponsive, all its neighbours will think it is dead. (Even without node failure, very often we see discrepancy of the topology from each node’s view).  When this compression compaction happens at high traffic, we were seeing cascading failures across the ring all the time.
  • Later people suggested to use different disk partitions for “binary-log” and data. Well, that will require at least 4 disks on a server (mirrored 2 for each, and probably you will want another pair for OS, I don’t think we are crazy enough to run on single disks), and I wouldn’t call it a “commodity hardware” any more.
  • At the time (May 2010 version 0.5.1) there is only ONE very few I/O parameter you can tune for Cassandra, it makes you doubt: hmm, I wonder why MySQL spent the last 10+ years have those many I/O tuning mechanisms?

I felt lucky we dumped it early on, at the moment people were asking us: why Digg/Facebook/Twitter all jumped on Cassandra and you guys ditched it? Maybe you are not competent enough? Turns out later the reality is:

Recently we also hear often: why don’t you use MongoDB, it does everything for you, and xxx is using it. If you pay just a little bit attention, MongoDB sacrifice your data consistency for performance, big time, almost like cheating:

Enjoy this one, “/dev/null is web scale“.

20 Responses to “Cassandra fiasco – NoSql kills”

  1. kristina said

    Note: I work on MongoDB.

    Keep in mind that the video is mocking NoSQL fanbois, not MongoDB itself. The author of the video writes: “This is getting more than a few views, so I want to make a few things clear. This is *not* a knock on Mongo DB. I use and really like Mongo DB and would encourage people to check it out as a viable option for production use. In all seriousness, it’s reallly a terrific database. I was (obviously) inspired by the guy who did the iPhone v. Evo thing and this is really just an application of that form to databases. This skit is intentionally totally over the top (language and otherwise) so for all that’s good and right in the world, *do not* take this seriously 🙂 Btw, the point that the fanboy makes about ignoring durability for performance numbers was taken from a real post somewhere.” (http://www.xtranormal.com/watch/6995033/)

  2. EngineSmith said

    We should always pick the right tool for the right job. None of the NoSql products so far can replace MySQL in write-most transaction processing. Though each of them have unique feature sets which can make doing something else easier (like map-reduce, analytics, read-most storage). Just consider how Cassandra is being used in Facebook, or Twitter. None of them has plan to replace their core MySQL with anything else. You probably know that Facebook has more than 10K MySQL nodes, and Google’s Adsense (their most valuable product) is powered by MySQL (though they don’t admit).

  3. […] This post was mentioned on Twitter by Vanessa Williams, Alex Tkachman. Alex Tkachman said: Cassandra fiasco – NoSql kills « EngineSmith's Blog http://goo.gl/Y0IH […]

  4. NoSQL is not only Cassandra. Your experience will help for sure new startups to avoid mistakes on choosing the wrong technology based. The big question could be: how use Cassandra for real in production on commodity hw?

    For most of business needs I suggest to try NoSQL solutions that supports consistency like OrientDB (http://www.orientechnologies.com).


  5. CG said

    Most of the links given above which purport to demonstrate other companies having trouble with Cassandra and MongoDB are broke. This makes it difficult to take your article seriously.

  6. Brandon said

    Note: I am a Cassandra committer.

    “We set quorum=2, replication=3, which means 5 nodes are minimum.”

    Wrong. The minimum is equal to the replication factor.

    “Cassandra has a “binary-log” kind of mechanism, it appends to that file(s) until it reaches a limit. Then it needs to compress that big file, during which, this node is “unresponsive”.”

    Also incorrect. Cassandra doesn’t do any kind of compression. Maybe you’re thinking of compaction? Nodes are not unresponsive during this process.

    “Later people suggested to use different disk partitions for “binary-log” and data. Well, that will require at least 4 disks on a server, and I wouldn’t call it a “commodity hardware” any more.”

    Using a separate disk for the commitlog makes sense, because the commitlog only does sequential IO, which gives you good throughput. By my count, log + data is two, not four.

    “There is only ONE I/O parameter you can tune for Cassandra”

    Surely you’re joking. If anything, we have too many parameters related to this. I’m not even sure which ‘one’ you could be referring to.

    • EngineSmith said

      – “We set quorum=2, replication=3, which means 5 nodes are minimum.”. My mistake, updated blog.

      – “compression”, fixed to “compaction”

      – “4 disks” – I meant mirrored. Never want to use single disks for product servers.

      – I/O tuning: I meant “disk_access_mode: auto”. Not sure if the other ones mentioned here are I/O or OS related, more like internal data structure tuning. http://wiki.apache.org/cassandra/StorageConfiguration_0.7

  7. There are some basic misunderstandings on display here; for instance, Cassandra commitlog and sstable data files are both append-only but that is the end of the resemblance. Nor is Cassandra only for 50+ machine installations. Cassandra may not have as many tunables as MySQL, but you still need to understand the system you’re using at a basic level.

    Unfortunately, Riptano wasn’t yet founded in March, but we’ve been able to help a lot of companies run Cassandra in production since then, including some starting with as few as two machines.

    /Riptano co-founder

  8. Kees Dijk said

    Nice read, thank you.

    Your Cassandra at twitter link seems to be broken, do you mean :

    • Kees Dijk said

      Actually I was to fast and didn’t look close enough. Most of your links suffer from an extra http:// . Just removing that makes everything just fine.

    • EngineSmith said

      Thanks for the reminder, I just fixed the links. Your link is great, it showed their original plan to go all the way to Cassandra then backed out, for a reason. 🙂

  9. kristina said

    > Recently we also hear often: why don’t you use MongoDB, it does
    > everything for you, and xxx is using it. If you pay just a little
    > bit attention, MongoDB sacrifice your data consistency for
    > performance, big time, almost like cheating:

    Its pretty cool when an open source project can attract the same user nuttiness that the iPhone can, but it does mean that there are some users saying that MongoDB can do everything for everyone. It can’t. However, it is a great database and shouldn’t be dismissed because it gives you the option of sacrificing safety for speed.

    The “Performance and Durability” article was written by a CouchDB employee and had nearly 100 comments about how biased and misleading it was before he deleted them all. MongoDB’s documentation tries to let users know about all of the options available for keeping their data safe, but sometimes people miss them or don’t read it.

  10. mb'Ti said

    nice blog..
    visit me..


  11. Thank you very much my friend, you are very kind in sharing this useful information with? others…. he details were such a blessing, thanks.

  12. Brilliant web site, I had not noticed enginesmith.wordpress.com previously in my searches!
    Continue the good work!

  13. Ned the Plumber said

    Out of curiosity, I checked out Cassandra code, and did a quick code review. Man, what a code mess!!! I still wonder how they can get anything running out of that major amount of spaghetti code. If you don’t believe me then download CassandraDB and look for yourself!

    Facebook guys who wrote that code may know a LOT about C++, but almost NOTHING about how to code properly object oriented systems in Java. The code is almost entirely poisoned by singletons and static fields/methods, that makes unit testing writing nearly impossible, and their RPC mechanism (that Thrift sh*t) is deeply entangled into the core logic of the database. Oh, wait, there’s more: Cassandra eats disk space like a black hole and suffers from out of memory exceptions as long as system is running continuously. In addition, Thrift, Cassandra’s RPC mecanism, also created by Facebook folks, has already fell short as it lacks features (there’s streaming already?), suffers from critical bugs, and is slowly dying of lack of commmunity support.

    The main lesson anyone will get about Cassandra is that if you write careless code you can even deliver a product/service, but you will hit the maintenance wall before long and it will hurt! In my opinion, Cassandra coders are already hitting this maintenability wall, as moving their communication “layer” (it’s quoted because there’s no really layers in Cassandra, it’s just a big yarn ball of code) from the born-dead Thrift to Avro (still in alpha!) has become a daunting, if not impossible, task. Just to add insult to injury, the Thrift sh*t is exposed as a client facing API so even if you don’t mind knowing the internals of the db you have to deal with low level calls with, say, five parameters!!! Any more or less decent software engineer book will tell you that this indicates that your code is careless written.

    I am not saying that all NoSQL systems are so badly written. You have a NoSQL system like Voldemort or MongoDB that look good both from the outside as the inside. Voldemort has a simpler data model than Cassandra, but it’s also written in Java and its code, created by LinkedIn folks, is clean, organized, and elegant. And provides a clean client-facing API. A high five to MongoDB folks too, because they clearly know what they are doing. Their foursquare downtime was more or less unavoidable as *ALL* of the NoSQL systems are still immature.

    • EngineSmith said

      Great insights, thank you! Didn’t really get a chance to review their source code. Notes to myself, stay away from Thrift as well (luckily we picked Protocol Buffers recently).

  14. Gemma Arzu said

    Very Nice website. I built mine and i was looking for some design ideas and you gave me a few. The website was developed by you?

    Thank you

  15. now and then and I’m lucky to report this newest content is really somewhat good quality and radically more beneficial than 50 % the other junk I read today

  16. Amazing story, bookmarked the blog for hopes to read more!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: