EngineSmith's Blog

Engineering Craftsman

Archive for March, 2011

V is for Victory: Vertica and VoltDB

Posted by EngineSmith on March 25, 2011

Both are Michael Stonebraker’s startups, and we recently adopted both of them. The experience so far? Amazing.

We set up a 6-node Vertica cluster in one week, including the data-loading process and Tableau-generated reports. The simplicity and efficiency are just mind-blowing; compared with our previous, failed Hadoop-based analytics project, this one was a breeze. Of course, it has a price tag, but frankly, it is TOTALLY worth it! That is much better than spending several engineers on it for months and still ending up with a half-baked, super complicated and almost violent Hadoop analytics platform (Hadoop is not for the weak-minded, small-budget, resource-limited startup).

  • Last week, by mistake, we loaded 10 billion rows of duplicated data into our Vertica cluster. It was still running, though a bit slow. 🙂
  • The rich analytic functions are super powerful. A path analysis (analyzing the page/click flow among all users) takes just 4 seconds over 60M rows.
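The post does not show the actual Vertica query. To make concrete what a path analysis computes, here is a minimal Python sketch, assuming a hypothetical click log of `(user_id, timestamp, page)` rows: it orders each user's clicks by time and counts consecutive page-to-page transitions across all users. It illustrates the idea only and is not how Vertica implements it.

```python
from collections import Counter, defaultdict

# Hypothetical click-log rows: (user_id, timestamp, page). Illustrative data only.
clicks = [
    ("u1", 1, "home"), ("u1", 2, "search"), ("u1", 3, "product"),
    ("u2", 1, "home"), ("u2", 2, "product"),
    ("u3", 5, "home"), ("u3", 6, "search"), ("u3", 7, "product"),
]

def page_transitions(rows):
    """Count page-to-page transitions across all users' click streams."""
    by_user = defaultdict(list)
    for user, ts, page in rows:
        by_user[user].append((ts, page))
    transitions = Counter()
    for events in by_user.values():
        events.sort()  # order each user's clicks by timestamp
        pages = [page for _, page in events]
        # consecutive pairs are the edges of the click-flow graph
        transitions.update(zip(pages, pages[1:]))
    return transitions

if __name__ == "__main__":
    for (src, dst), n in page_transitions(clicks).most_common():
        print(f"{src} -> {dst}: {n}")
```

In a column store the same grouping and ordering would be expressed in SQL over the raw table, which is what makes a query like this fast enough to run interactively.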

If you are a startup seriously considering analytics, try Vertica before you waste all your money and resources on Hadoop/ETL/Data Warehouse solutions. They are not bad products, just too complicated. With Vertica’s powerful feature set and linear scalability, you can simplify your data flow significantly. You will realize that the star schema is just overrated: you can write super complicated but still blazing-fast queries over a denormalized schema. SQL is just lovely, while map-reduce is painful and awkward. One lesson we learned: who can verify that the map-reduce code IS doing the right thing anyway?
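To illustrate the contrast, here is a small sketch assuming a hypothetical denormalized `events` table (each row repeats the user's country instead of joining to a dimension table). The same per-country click count is computed once as a single SQL statement, with in-memory SQLite standing in for Vertica, and once as hand-written map, shuffle, and reduce steps.

```python
import sqlite3
from collections import defaultdict

# Hypothetical denormalized rows: (user_id, country, action).
# Country is stored on every row, so no star-schema join is needed.
events = [
    ("u1", "US", "click"), ("u1", "US", "view"),
    ("u2", "DE", "click"), ("u3", "US", "click"),
]

def clicks_per_country_sql(rows):
    """Aggregate with one declarative SQL statement (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, country TEXT, action TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(
        "SELECT country, COUNT(*) FROM events WHERE action = 'click' GROUP BY country"
    ))
    conn.close()
    return result

def clicks_per_country_mapreduce(rows):
    """The same aggregate written as explicit map, shuffle, and reduce phases."""
    # map: emit (country, 1) for each click
    mapped = [(country, 1) for _, country, action in rows if action == "click"]
    # shuffle: group emitted values by key
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # reduce: sum each group's values
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    sql_result = clicks_per_country_sql(events)
    mr_result = clicks_per_country_mapreduce(events)
    print(sql_result, mr_result, sql_result == mr_result)
```

Even at this toy size, the SQL version states its intent in one line, while the map-reduce version has to be traced step by step to confirm it computes the same thing.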

By the way, if you are trying to handle big data with a solution that can only run on one physical machine, stop the joke.

VoltDB is a proper ACID SQL database on steroids. Who said you have to sacrifice consistency for scalability? We have had enough bad experiences with bad NoSQL products (see my previous posts). VoltDB is simply a godsend. Of course you have to lose something: schema changes require restarting the whole cluster (they are working to improve this), and you have to write stored procedures instead of ad-hoc SQL. I think those trade-offs are totally reasonable; honestly, who will die if you shut down your site for 10 minutes a week for maintenance?
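Real VoltDB procedures are written in Java and deployed to the cluster, so this is only a Python sketch of the general model, with SQLite standing in and every name (`ProcedureRunner`, `transfer`, `make_runner`, the `accounts` table) hypothetical: clients can only invoke named procedures declared up front, and each call runs as one all-or-nothing transaction.

```python
import sqlite3

class ProcedureRunner:
    """Clients invoke named procedures; each call runs as one transaction.

    Hypothetical sketch of the stored-procedure model, not VoltDB's API.
    """

    def __init__(self, conn):
        self.conn = conn
        self.procedures = {}

    def register(self, name, func):
        # Procedures are declared up front; clients cannot send ad-hoc SQL.
        self.procedures[name] = func

    def call(self, name, *args):
        func = self.procedures[name]
        # "with conn" commits on success and rolls back on any exception.
        with self.conn:
            return func(self.conn, *args)

def transfer(conn, src, dst, amount):
    """Move credits between two accounts; fail if src lacks funds."""
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
    conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
    if balance < amount:
        # Raising here rolls back both updates above.
        raise ValueError("insufficient funds")
    return balance - amount

def make_runner():
    """Build an in-memory database with two accounts and one procedure."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    runner = ProcedureRunner(conn)
    runner.register("transfer", transfer)
    return runner

if __name__ == "__main__":
    runner = make_runner()
    print(runner.call("transfer", "a", "b", 30))
    try:
        runner.call("transfer", "a", "b", 500)
    except ValueError as err:
        print("rolled back:", err)
    print(dict(runner.conn.execute("SELECT id, balance FROM accounts")))
```

Because the failed transfer raises after both updates, the balances stay at `a = 70`, `b = 30`: the whole procedure is the unit of atomicity, which is the property that makes giving up ad-hoc SQL a fair trade.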

VoltDB is an in-memory-only database (using k-safety and snapshots for redundancy) with linear scalability (proven). We considered Redis for persistence for a bit (it is also in-memory, with replication and snapshots); however, Redis keeps swinging between supporting clustering (transparent sharding) and disk persistence (to me, going in that direction is a total disaster). Since that was not settled, and manual sharding is a big no-no, we settled on VoltDB.
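The ideas of transparent sharding and k-safety can be sketched as follows, assuming a hypothetical 6-node cluster: keys are hashed to partitions, each partition is copied to k + 1 distinct nodes, and every partition stays available as long as no more than k nodes fail. The round-robin placement here is a simplification for illustration, not VoltDB's actual algorithm.

```python
import zlib

def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (the sharding step)."""
    return zlib.crc32(key.encode()) % num_partitions

def place_replicas(num_partitions, nodes, k):
    """Assign each partition to k + 1 distinct nodes, round-robin.

    Simplified placement for illustration only.
    """
    if k + 1 > len(nodes):
        raise ValueError("k-safety needs at least k + 1 nodes")
    return {
        p: [nodes[(p + i) % len(nodes)] for i in range(k + 1)]
        for p in range(num_partitions)
    }

def survives(placement, failed):
    """True if every partition still has at least one live copy."""
    return all(
        any(node not in failed for node in owners)
        for owners in placement.values()
    )

if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(1, 7)]  # hypothetical 6-node cluster
    placement = place_replicas(12, nodes, k=1)
    print(partition_for("user42", 12))
    print(survives(placement, {"n3"}))        # True: one failure is tolerated with k = 1
    print(survives(placement, {"n1", "n2"}))  # False: partition 0 lived only on n1 and n2
```

With manual sharding, application code would have to call something like `partition_for` itself and route every query to the right node; a transparently sharded database does that routing internally.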

We decided to pay for VoltDB support even though the community edition is perfectly adequate for our purposes (we do operations through our own scripts instead of the web GUI anyway). We also really wanted to support them so they can keep up the wonderful work.

Posted in Operations, Software

Who is designing your product?

Posted by EngineSmith on March 25, 2011

Ever read a document called a PRD (Product Requirement Document)? In most companies (the boring ones, at least), it is the Product Manager’s (PM’s) job to write this document. Then a group of engineers rush ahead to build the product according to the “spec”. What is the result? Most likely, the product will be late, ugly and buggy, way different from the PRD, and, worst of all, nobody wants to use it.

In my personal experience, most PMs are MBAs who have no experience in any form of product design, let alone Web or mobile GUI design. How do you know if they are any good, other than by which top business school they came from? Product is NOT business; it is art. My criterion for a good product designer is: he/she has the guts to say NO. Too many times, the PRD doesn’t thoroughly cover all the possible permutations of a scenario, and engineers always like to “over-engineer” and ask questions such as:

  • If users can delete one message, do you want to allow them to delete more than one message?
  • If users can create a new “something” here, can they delete/modify it later?

From a technical perspective, these are perfectly good questions. But should a PRODUCT always do everything that is technically possible? Most PMs will answer “good idea, let’s add it since it is useful to the user”. The best ones will answer “no, it is NOT an important feature for the majority of users, AND it makes the product ugly/stupid/complicated”. These PMs are managing the soul of the product, not a SPEC. Example? The original iPhone mail app had no multiple delete. Yes, there were tons of complaints; guess what, it didn’t matter.

The PM should be the soul of your product, not an executor. It is a great sign when the founders of a startup are doing the design themselves. The moment they hire a business guy to manage a product, you know they have handed their baby to a complete stranger (who might pretend to love the baby, though in reality it is just a job).

Posted in Design, Startup