V is for Victory: Vertica and VoltDB
Posted by EngineSmith on March 25, 2011
Both are Michael Stonebraker’s startup, and we recently just adopted them both. The experience so far? Amazing.
We setup a 6-node Vertica cluster in one week with data loading process as well Tableau generating reports. The simplicity and efficiency is just mind-blowing, comparing of our previous failed Hadoop based analytics project, this one is just a breeze. Of course, it has a price tag, but frankly, TOTALLY worth it! Much better than spending several engineers on it for months and still get a half-baked, super complicated and almost violent Hadoop analytics platform (Hadoop is not for the weak minded, small budgeted and resource limited startup).
- Last week, by mistake, we loaded 10 billion rows of duplicated data into our Vertica cluster. It was still running, though a bit slow. 🙂
- The rich analytic functions are super powerful. A path analysis (analyzing the page/click flow among all users) takes just 4 seconds over 60M rows.
If you are a startup seriously considering analytics, try Vertica before you waste all your money/resource on Hadoop/ETL/Data Warehouse solutions. They are not bad products, just too complicated. With Vertica’s powerful feature sets and linear scalability, you can simplify your data flow significantly. You will realize that start-schema is just over-rated, you can write super complicated but still blazing fast queries over de-normalized schema. SQL is just lovely (while map-reduce is just painful and awkward, one lesson we learned is: who can verify that the map-reduce code IS doing the right thing any way?).
By the way, if anybody tries to deal with big-data with some solution which can only run on one physical machine. Stop the joke.
VoltDB, it is a proper ACID SQL database on steroid. Who said you have to sacrifice consistency for scalability? We have had enough bad experiences with bad NoSQL products (see my previous posts). VoltDB is simply a god-send. Of course you have to loose something (like schema change needs restart the whole cluster – they are working to improve this, and you have write stored procedures instead of ad-hoc SQL), I think those are totally reasonable: honestly, who will die if you shutdown your site for 10 minutes a week for maintenance?
VoltDB is an in-memory only database (using k-safety and snapshot for redundancy) with linear scalability (proven). We were considering Redis for persistence for a bit (it also is in-memory with replication and snapshot), however, Redis is swinging its directions between support clustering (transparent sharding) or disk persistence (to me a total disaster to go this direction). Since it is not settled, and manual sharding is just a big no-no, we settled on VoltDB.
We decided to pay for VoltDB support even though the community edition is perfectly enough for our purpose (we do operations through our own scripts instead of web GUI anyway). Also, really wanted to contribute to them to keep up the wonderful work.