EngineSmith's Blog

Engineering Craftsman

Archive for the ‘Software’ Category

Clipper Card: nightmare product design

Posted by EngineSmith on April 4, 2011

Clipper Card is a new San Francisco Bay Area transportation payment system intended for all the main public transportation systems, including Caltrain, BART (subway), VTA (buses), etc. Well, it has been a complete disaster so far, and I believe it marks the death of public transportation development in the heart of Silicon Valley (quite ironic, right?). The whole product is simply a joke; my experience today should give you something fun to enjoy.

I ride Caltrain daily and buy a monthly pass on my Clipper Card. The rules are:

  • You have to tag on and off once, on the first day of the month that you travel, in order to activate your monthly pass. No idea why this is necessary; maybe they just want to save one cron job (technical term for a scheduled job). If you don’t do this, the Caltrain conductor will read that you do NOT have a valid ticket and will either kick you off the train or give you a citation ($250 minimum), even though I think they can see clearly on their device that you have an “inactive” monthly pass on your card.
  • If you tag on at the source station and forget to tag off at the destination station, Clipper Card will charge you the maximum fare possible from your originating station. Basically they think you are most likely cheating in such cases, and should pay the penalty (“you are assumed guilty”).

Okay, if you are still reading, I take it you are well educated enough to understand those rules, as I was until this morning.

  • Luckily, without a reminder, I remembered to tag on at 10:30 AM on 4/4/2011 (since I didn’t go to work on 4/1/2011). By the way, the card reader message showed absolutely nothing about my monthly pass; it acted as if it had just deducted the maximum fare of $8.50 from my card, with the remaining balance displayed. The theory is that when I tag off, they will refund my $8.50 and show a very vague “Pass OKAY” message indicating the monthly pass has been activated.
  • Sadly, I forgot to tag off (how can I remember to do it once in 30 days?). By the time I remembered, it was already 3 PM. Thinking there might be a time window allowed for travel (some say it is 6 hours), I rushed back to the station and tagged.
  • The super intelligent machine showed something meaning “you just opened another trip, $9.00 has been deducted from your card”.
  • “WTF!” I was stunned there, looking around; nobody could help (and I dared not tag the card again to try to cancel this trip, like many of you might be thinking). 🙂

I went back to my office and checked the online access: right, they had already charged me $8.50 in the morning, and now there was a new trip charged at $9. The only hope was to call their customer support line and talk to a human. It turns out the time window allowed for travel is only 4 hours, and that’s why they charged me (assuming I am cheating, even though I have a monthly pass on the card). “Congratulations though, your monthly pass has been activated.” Those were the exact words from the customer support guy.

I have to call them back tomorrow to get both charges refunded, as a one-time courtesy, according to support. I know, they think I am really stupid to “forget” to do such a simple thing once a month.

Frankly, this is the worst-designed software system I have ever seen. With the help of Clipper Card, the Bay Area’s public transportation systems, already deep in deficit, may die much quicker. As a consumer, I don’t really care what kind of complicated problems they are trying to solve; if it makes things harder and messier, it is a failure. From a design perspective, what went wrong?

  • The purpose of the system is to simplify people’s lives. You can’t push the burden onto ordinary users to “remember” and “apply” your complicated business flow (if …. else … then ….) just because you are too lazy to make it simpler. I have a monthly pass, so making me remember an exception once per month is totally unacceptable.
  • You can’t assume everyone is cheating; they are your valued customers and they are human. Humans forget and make mistakes all the time. If your system knows I have a monthly pass, why do you still charge me? Guess what, everyone will call your service department for a refund. Do you know how much it costs to take one call? My guess is around $50. Also, those calls won’t improve your service ratings, since they merely remedy the stupid design flaws that were in your system in the first place.
  • Customer Support should NEVER be the only way to solve a problem. Surprisingly, you can only charge your card at a Walgreens store or online, but you can’t find out what’s going on through either. There is not a single device out there that can help you manage your card (like an ATM). If you make a mistake, in any form, you have to call, or risk a citation (which you may have to do, since the next train may be an hour away).
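
To make the first two points concrete, here is a minimal Python sketch of fare logic that looks for a monthly pass before charging anything. Everything in it is an assumption for illustration: the `Card` type, the `fare_due` function, the idea that the pass is stored as a (year, month) pair, and the $6.00 trip fare are all made up, and real passes also involve zones, which this ignores. It is not how Clipper actually works.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Card:
    balance: float
    # Hypothetical: the month the pass covers, as (year, month), or None.
    pass_month: Optional[Tuple[int, int]] = None


def has_valid_pass(card: Card, travel_day: date) -> bool:
    """A pass is valid for its whole month; no activation step is needed."""
    return card.pass_month == (travel_day.year, travel_day.month)


def fare_due(card: Card, travel_day: date, trip_fare: float,
             max_fare: float, tagged_off: bool) -> float:
    """Return the amount to deduct for one trip."""
    if has_valid_pass(card, travel_day):
        return 0.0  # pass holders are never charged, tagged off or not
    # Without a pass, a missing tag-off falls back to the maximum fare.
    return trip_fare if tagged_off else max_fare


# A pass holder who forgot to tag off on 4/4/2011 owes nothing.
card = Card(balance=20.0, pass_month=(2011, 4))
print(fare_due(card, date(2011, 4, 4), 6.00, 8.50, tagged_off=False))  # 0.0
```

The point of the sketch is the order of the checks: the pass is consulted first, so the "assumed guilty" maximum fare only ever applies to riders who have no pass at all.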

By the way, I just remembered another funny experience with the BART ticket system many years ago (not sure if it is still the case today): I inserted a $20 bill into the ticket machine, and instead of directly asking me where I wanted to go or how many tickets I wanted, it gave me a list of options to select from: do you want to buy 5 tickets of $4? Or 4 tickets of $4.5? Etc. Wow, very intelligent machine; I have to be good at math to understand what you meant. But why don’t you listen to what I want instead?

Posted in Engineering, Software | 2 Comments »

V is for Victory: Vertica and VoltDB

Posted by EngineSmith on March 25, 2011

Both are Michael Stonebraker startups, and we recently adopted them both. The experience so far? Amazing.

We set up a 6-node Vertica cluster in one week, including the data loading process and Tableau generating reports. The simplicity and efficiency are just mind-blowing; compared with our previous failed Hadoop-based analytics project, this one was a breeze. Of course, it has a price tag, but frankly, TOTALLY worth it! Much better than spending several engineers on it for months and still getting a half-baked, super complicated and almost violent Hadoop analytics platform (Hadoop is not for the weak-minded, small-budget, resource-limited startup).

  • Last week, by mistake, we loaded 10 billion rows of duplicated data into our Vertica cluster. It was still running, though a bit slow. 🙂
  • The rich analytic functions are super powerful. A path analysis (analyzing the page/click flow among all users) takes just 4 seconds over 60M rows.

If you are a startup seriously considering analytics, try Vertica before you waste all your money and resources on Hadoop/ETL/Data Warehouse solutions. They are not bad products, just too complicated. With Vertica’s powerful feature set and linear scalability, you can simplify your data flow significantly. You will realize that the star schema is just overrated: you can write super complicated but still blazing fast queries over a de-normalized schema. SQL is just lovely (while map-reduce is just painful and awkward; one lesson we learned is: who can verify that the map-reduce code IS doing the right thing anyway?).

By the way, if anybody tries to deal with big data using a solution which can only run on one physical machine: stop the joke.

VoltDB is a proper ACID SQL database on steroids. Who said you have to sacrifice consistency for scalability? We have had enough bad experiences with bad NoSQL products (see my previous posts). VoltDB is simply a godsend. Of course you have to lose something (a schema change requires restarting the whole cluster, though they are working to improve this, and you have to write stored procedures instead of ad-hoc SQL). I think those are totally reasonable: honestly, who will die if you shut down your site for 10 minutes a week for maintenance?

VoltDB is an in-memory-only database (using k-safety and snapshots for redundancy) with linear scalability (proven). We considered Redis for persistence for a bit (it is also in-memory, with replication and snapshots); however, Redis is wavering between supporting clustering (transparent sharding) and disk persistence (to me, a total disaster of a direction). Since that is not settled, and manual sharding is just a big no-no, we settled on VoltDB.

We decided to pay for VoltDB support even though the community edition is perfectly adequate for our purpose (we do operations through our own scripts instead of the web GUI anyway). Also, we really wanted to contribute to them so they can keep up the wonderful work.

Posted in Operations, Software | 2 Comments »

Your car has more than 10M lines of code

Posted by EngineSmith on February 14, 2011

A luxury car has more than 100M lines of code, while the F-22 Raptor has only 1.7M. In the Mercedes-Benz S-Class, the navigation system alone contains more than 20M lines of code.

Elsewhere I found that the average Ford car contains more than 10M lines of code. I think we are doomed with this trend. Everybody seems to like reinventing the wheel again and again, no matter how bad they are at it.

The same thing is happening in the web world: every several months, some new kids on the block try to solve an old problem with a completely new approach. “You don’t throw away the baby with the bath water”: there is a quote like that from the Drizzle project (the revamp of the MySQL project), if I remember correctly. So many NoSQL products are trying to replace the existing rock-solid MySQL, backed by remarkable marketing hype (and propaganda campaign machines). Only time will tell, just like how MySQL survived so many years.

[update] A colleague found this link, amazing: F22 got zapped by International Date Line.

Posted in Engineering, Software | Leave a Comment »

MySQL JDBC Multiple Query Trap

Posted by EngineSmith on January 20, 2011

This was a pretty big lesson recently. We are using the MySQL JDBC driver against a 5.1 Percona build. For a simple use case, we decided to use a transaction (though we know it doesn’t scale well). Though a stored procedure could have been used, we chose to use a trick in the JDBC driver: allowMultipleQueries. http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html

Basically, the following statements are put together as one iBatis query: begin; insert into A …; update B where ….; end;

This worked perfectly fine for several months, until one day, after a new release, our MySQL suddenly started having trouble: it began to report deadlock errors. Tons of random queries were being grouped together into big transactions, causing deadlocks. We scratched our heads for a long time, since absolutely nothing related to this transaction logic had changed in that release, nor did we touch MySQL or change the JDBC driver. It seems the transaction boundary was extended randomly beyond the two queries above, and since we use connection pooling (BoneCP), subsequent queries on the same connection were combined into big transactions. This is really terrible!

Eventually we took out this allowMultipleQueries trick and did everything as two plain SQL statements (yeah, no transactions). To this day, we still don’t know exactly what triggered the problem, since it was working fine for several months.
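
If a transaction is still wanted for a case like this, one way to avoid leaking an open transaction onto a pooled connection is to make the boundaries explicit in application code, with a guard before the connection goes back to the pool. The sketch below is only an illustration of that idea, not what we ran: it uses Python’s built-in sqlite3 module as a stand-in for MySQL/JDBC, and the tables `a` and `b` and the function `record_purchase` are hypothetical.

```python
import sqlite3


def record_purchase(conn: sqlite3.Connection, item: str, qty: int) -> bool:
    """Insert into table a and update table b as one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute("INSERT INTO a (item, qty) VALUES (?, ?)", (item, qty))
            cur = conn.execute(
                "UPDATE b SET total = total + ? WHERE item = ?", (qty, item))
            if cur.rowcount != 1:
                raise LookupError(f"no row in b for {item!r}")
        return True
    except (sqlite3.Error, LookupError):
        return False
    finally:
        # Guard: never hand a connection back to a pool mid-transaction.
        if conn.in_transaction:
            conn.rollback()


# Demo with an in-memory database and hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (item TEXT, qty INTEGER)")
conn.execute("CREATE TABLE b (item TEXT PRIMARY KEY, total INTEGER)")
conn.execute("INSERT INTO b VALUES ('widget', 0)")
conn.commit()
print(record_purchase(conn, "widget", 3))  # True
print(record_purchase(conn, "gadget", 1))  # False, and the insert is undone
```

The JDBC equivalent would be turning auto-commit off, calling commit or rollback explicitly, and restoring auto-commit before releasing the connection.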

By the way, a side note about transactions. They sound like a perfect solution on paper, but in reality, especially in the web world (where you have sharding and many database nodes), they don’t work. One way to look at it: modern hardware and networks are quite reliable nowadays, so compared to the cost of ensuring transactions, it is probably better to spend the time and money on ways to fix things (tools, customer support, reconciliation) when a transaction is interrupted. It will also make your system much simpler and easier to scale.

Forgot to mention: in 2004, PayPal had an outage for almost a week because some smart guys introduced two-phase commit into their production system. A great idea to guarantee ACID, and a textbook use case (banking transactions). Sadly, in reality, it doesn’t work.

Posted in Engineering, Operations, Software | 3 Comments »

Redis rocks!

Posted by EngineSmith on October 28, 2010

Several weeks ago I wrote Redis – Cache done right. Since then, we deployed a cluster of 6 Redis nodes in Production. The result: simply awesome!

It has been a smooth ride: billions of cache hits every day, very low CPU load. The application logic is super simple, with no fancy locking etc. required, since lists and sets are native data structures with atomic operations. We implemented our own distributed hashtable algorithm (consistent hashing) to distribute keys across the cluster (borrowed from memcached’s Java client). Our MySQL database load has dropped a lot, and we now also cache lots of Facebook call results (since their API sucks: high failure rate and an average response time of 2-10 seconds).
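
For readers who have not seen consistent hashing before, here is a minimal Python sketch of the idea. It is not our production code (that was Java, borrowed from the memcached client); the class name `HashRing`, the node names, the 100 virtual points per node and the choice of SHA-256 are all assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision; keep first owner
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if self._owner.get(point) == node:
                del self._owner[point]
                self._points.remove(point)

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring is empty")
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing([f"redis-{i}" for i in range(6)])
print(ring.node_for("user:42"))  # one of redis-0 .. redis-5
```

The property that matters for a cache: when a node is removed, only the keys that lived on that node move elsewhere, so the rest of the cluster keeps its hit rate.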

By the way, we haven’t used snapshots, the append-only log etc. yet. For now it is just a read-only cache. Later on we will use that fancy stuff as well.

The only issue we ran into: we forgot to set maxmemory, so Redis happily hit the physical memory limit as well as the swap limit, and crashed the whole server. 🙂 Since CPU load is low, we now actually run 3+ instances on each physical machine to form 3 individual clusters.

To sum it up, love this guy’s twitter.

Posted in Engineering, Operations, Software | 4 Comments »

redis – cache done right

Posted by EngineSmith on September 11, 2010

Finally we got to the point of considering a cache system (our system is write-mostly, so caching was not originally the main concern). After some research, we settled on redis instead of memcached. Here are the things redis has done right:

  • atomic operations – no more complicated locking logic in application code
  • support for lists, sets and other data structures – simpler application code
  • control your cache’s TTL – guaranteed persistence, virtual memory, snapshot (crash recovery)
  • fast, slim and simple – do one thing very well. The source code only relies on make and gcc, no other dependencies

There is no clustering support in redis itself (or in its Java client lib yet); however, it is not hard to write your own consistent hashing algorithm to use several redis instances together. And I really hope redis won’t complicate itself in that direction (or maybe it could make a separate library for it, just like CouchDB uses Lounge). To me, “do one thing and do it really well” is the best way to go.

Another candidate we considered was membase. It claims to have done everything (clustering, re-balancing, persistence), which is actually a bit scary. Just like our Cassandra fiasco in May 2010; well, I will write up that lesson a bit later.

We are going to roll out redis to production next week, and will keep you posted about how it goes.

Posted in Operations, Software | 3 Comments »

Sorting, basics for a programmer?

Posted by EngineSmith on September 11, 2010

One of my favorite interview questions is to write the simplest function to sort the characters in a string: no gimmicks, no tricks, just to see if the code is clean, compilable and easy to follow. A simple bubble sort will do. Simple? Don’t laugh at it yet.

Surprisingly, there has been an 80%+ failure rate (wrong algorithm, major errors in the code etc.) over the last 5 years across 100+ candidates, especially among some very senior, architect-level guys. Of the ones who finished it correctly, I probably hired most (of course, they had to pass other stuff too).

Several interesting observations:

  • Many senior guys got offended when they couldn’t get it right. The attitude shown is much more important than the question itself.
  • Most US-educated engineers will reach for “merge sort” first (to be honest, the concept is simple, but the code is long, and many failed at merging two arrays of characters or at the recursion). Foreign engineers usually pick bubble sort: super simple, you can finish it in less than 10 lines.
  • So far only one guy wrote “quick sort” in front of me. He got it completely right, and could also briefly tell me why it is efficient. I think he was just well prepared, but that still showed his effort.
  • For some, sorting is such an alien concept. I wonder whether, if I picked a guy off the street without any computer science education, he would at least be able to describe how he would sort a bunch of stuff. Many engineers couldn’t even start the thought process.
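
For reference, the kind of answer I am looking for really does fit in under ten lines. Here is a minimal bubble sort over the characters of a string, sketched in Python; the function name `sort_chars` is just for illustration, and candidates can use whatever language they like.

```python
def sort_chars(text: str) -> str:
    """Return the characters of text in ascending order (bubble sort)."""
    chars = list(text)
    # After each outer pass, the largest remaining character sits at `end`.
    for end in range(len(chars) - 1, 0, -1):
        for i in range(end):
            if chars[i] > chars[i + 1]:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


print(sort_chars("interview"))  # eeiinrtvw
```

It is O(n²), which is fine here: the question is about clean, correct code, not about performance.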

Here is an interesting thing about sorting: each algorithm has its own unique sound!

Posted in Engineering, Software | 1 Comment »

Grails – The Savior of Java Web Applications

Posted by EngineSmith on August 28, 2009

Let’s face it, developing web applications in Java/JSP is really, really painful, regardless of which framework you work with. You have to restart your application server very often, even for simple Java class or layout/configuration changes. But we are kind of stuck with it, since most of our domain logic is already written in Java; using PHP or Rails would require us to either rewrite the domain logic or expose it as HTTP services. Both sound a bit crazy unless you have a web guru in house.

Finally Grails comes to the rescue. There are a couple of things which are just so amazing about it; you get the best of both worlds: the strictness of a compiled language like Java for domain logic, and the dynamism and convenience of the scripting language Groovy. It is truly a godsend:

  • The only thing deployed to Production is a web application (WAR file) which contains Java classes only. Everything runs in compiled Java class form. You do NOT need a Grails or Groovy environment in Production. You get all the benefits of the mature JVM: multi-threading, heap management, garbage collection and great scalability
  • Most things are dynamically loaded at development time, so you don’t need to restart the server all the time
  • Seamless integration with existing Java domain logic; Spring is native. IntelliJ IDEA contributed the mighty compiler which compiles Groovy and Java files at the same time, and you can debug Groovy classes inside IDEA

Furthermore, Grails has tons of convenient stuff to make developers’ lives much easier (like Rails). For example, they have a wonderful example which takes 10 lines of code to add full-text search to a database entity using Lucene. Those didn’t contribute to my decision, but the icing on the cake is just so sweet!

Unfortunately, Grails has too much code-by-convention stuff, which makes it hard to wrap into existing big Java projects. There are some tricks and hacks we did to make it work; I will cover them in future blog posts.

Posted in Engineering, Software | 2 Comments »