EngineSmith's Blog

Engineering Craftsman

SSD faults finally resolved

Posted by EngineSmith on August 28, 2009

We finally got our production environment stabilized with MySQL, OpenSolaris, ZFS and SSD/JBOD. The performance is just super awesome! With sysbench we can sustain a stable 21K QPS. It took us quite some time to get there, especially because we ran into a nasty SSD fault issue.

We were inspired by the SmugBlog post “Success with OpenSolaris + ZFS + MySQL in production!” and ordered a JBOD with 2 Intel X25-E 32GB SSDs and 10 SAS 15K drives for our MySQL server. The SSDs are configured as a ZFS write cache. The SAS controller is an LSI 3801E.

After some load testing, we found many “scsi bus reset” errors in the log, and “iostat -E” showed “Hard Errors” on the SSD drives. Whenever a “scsi bus reset” happened, everything on the MySQL side came to a halt, so our load testing results were very inconsistent. The resets happened every 30 minutes to an hour. It was just 3 days before our launch date!
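To see at a glance which drives are accumulating errors, without reading through the whole “iostat -E” report, a small filter like the one below can help. This is a minimal sketch, not the script we used: it assumes the Solaris layout where each device block starts with a line such as “sd1  Soft Errors: 0 Hard Errors: 12 Transport Errors: 0”, and the function name hard_error_devices is hypothetical.

```python
import re
from typing import Dict

# Assumed layout of the first line of each device block in Solaris `iostat -E`:
#   sd1   Soft Errors: 0 Hard Errors: 12 Transport Errors: 0
# The following "Vendor: ... Product: ..." lines do not match and are skipped.
ERROR_LINE = re.compile(
    r"^(?P<dev>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def hard_error_devices(iostat_e_output: str) -> Dict[str, int]:
    """Return {device: hard_error_count} for every device with Hard Errors > 0."""
    result: Dict[str, int] = {}
    for line in iostat_e_output.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match and int(match.group("hard")) > 0:
            result[match.group("dev")] = int(match.group("hard"))
    return result
```

Feed it the captured report, for example subprocess.run(["iostat", "-E"], capture_output=True, text=True).stdout, and run it periodically during a load test; a count that keeps climbing on the SSDs is the pattern we were seeing.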

We upgraded OpenSolaris to 2009.06 and updated the LSI and SSD firmware, but that didn’t help. Finally our mighty consultant found that turning off OpenSolaris Fault Management reduced the errors. That workaround got us through the first couple of weeks after the production launch. Eventually our vendor, Silicon Mechanics, helped out and we updated the LSI driver. Voilà, not a single SSD error since then, and performance is simply amazing!

Fault Management

  • fmadm unload disk-transport   # turn off the disk-transport module
  • fmdump -eV                    # show the error log (ereports) in verbose form

Sysbench Load Testing

  • sysbench --test=oltp --oltp-table-size=100000 --mysql-db=test --mysql-user=xxxx --mysql-host=host --mysql-password=xxxx prepare
  • sysbench --test=oltp --oltp-table-size=100000 --mysql-db=test --mysql-user=xxxx --mysql-host=host --mysql-password=xxxx --max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=8 run
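The run command ends with a summary block. As a hedged sketch, assuming the sysbench 0.4.x OLTP summary contains a line of the form “read/write requests: <count> (<rate> per sec.)”, the per-second rate (one reasonable reading of QPS) can be extracted like this; read_write_qps is a hypothetical helper name, and other sysbench versions may format the summary differently.

```python
import re
from typing import Optional

# Assumed sysbench 0.4.x OLTP summary line:
#     read/write requests:      <count> (<rate> per sec.)
RW_REQUESTS = re.compile(r"read/write requests:\s*(\d+)\s*\(([\d.]+) per sec\.\)")


def read_write_qps(sysbench_output: str) -> Optional[float]:
    """Return the read/write requests-per-second rate, or None if the line is absent."""
    match = RW_REQUESTS.search(sysbench_output)
    return float(match.group(2)) if match else None
```

Capturing the rate from every run makes it easy to spot the dips caused by a bus reset, instead of comparing summaries by eye.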

Update LSI Driver

root@opensolaris:~# wget
root@opensolaris:~# uncompress itmpt-x86-XXX.tar.Z
root@opensolaris:~# tar -xvf itmpt-x86-XXX.tar
root@opensolaris:~# cd install
root@opensolaris:~# pkgadd -d .
root@opensolaris:~# vim /etc/driver_aliases   # or the editor of your choice
Here you will need to comment out the mpt driver aliases and uncomment
the itmpt driver aliases at the end of the file, so that they read:
#mpt "pci1000,30"
#mpt "pci1000,50"
#mpt "pci1000,54"
#mpt "pci1000,56"
#mpt "pci1000,58"
#mpt "pci1000,62"
#mpt "pciex1000,56"
#mpt "pciex1000,58"
#mpt "pciex1000,62"

itmpt "pci1000,30"
itmpt "pci1000,50"
itmpt "pci1000,54"
itmpt "pci1000,56"
itmpt "pci1000,58"
itmpt "pci1000,62"
itmpt "pci1000,621"
itmpt "pci1000,622"
itmpt "pci1000,624"
itmpt "pci1000,626"
itmpt "pci1000,628"
itmpt "pci1000,640"
itmpt "pci1000,642"
itmpt "pci1000,646"

Reboot the OS after this.
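If you would rather script the driver_aliases change than hand-edit it, here is a minimal sketch. switch_driver is a hypothetical helper, not part of any LSI or Solaris tool. It assumes the file holds one driver "alias" pair per line, comments out every mpt line and uncomments every #itmpt line already present; it does not add itmpt aliases that are missing from the file, and it only transforms text, so reading and writing /etc/driver_aliases (as root, with a backup) is up to you.

```python
import re

# One alias entry per line, optionally commented out with '#', e.g.
#   mpt "pci1000,30"      or      #itmpt "pci1000,30"
ALIAS_LINE = re.compile(r'^(?P<hash>#?)\s*(?P<driver>\S+)\s+(?P<alias>"[^"]*")\s*$')


def switch_driver(text: str, old: str = "mpt", new: str = "itmpt") -> str:
    """Comment out every `old` alias line and uncomment every `new` alias line."""
    out = []
    for line in text.splitlines():
        m = ALIAS_LINE.match(line)
        if m and m.group("driver") == old:
            line = f"#{old} {m.group('alias')}"   # disable the stock driver
        elif m and m.group("driver") == new:
            line = f"{new} {m.group('alias')}"    # enable the LSI driver
        out.append(line)                          # everything else is untouched
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Lines for other drivers (for example sd) pass through unchanged, so the rest of the file is preserved; the reboot is still required afterwards.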

7 Responses to “SSD faults finally resolved”

  1. Amongst other things, I work on the mpt(7d) driver at Sun. I am very curious as to whether you logged a call with Sun about the issues you came across. I am very keen to find out what sort of performance deltas you might get by using a newer build of OpenSolaris (we’re currently in the middle of build 124) since both mpt(7d) and ZFS have been considerably enhanced with regard to performance.

    • James, I’m experiencing the same problem with 10u8 – I have a Solaris support contract and would love to work on it with you, but I can’t seem to find your contact info. You can reach me via the “contact” form on my blog.

  2. EngineSmith said

    We didn’t file an issue with Sun because we didn’t suspect OpenSolaris in the first place, nor did we purchase any support from Sun. The issue is really subtle and hard to trace, plus we don’t have spare hardware to reproduce it. Silicon Mechanics really did an awesome job helping us out.

    Now that this issue is resolved, we can sustain 21K QPS in the sysbench MySQL read/write test for 12 hours (with MySQL master-master replication on both servers).

    Thanks for the heads up, we will definitely compare with the latest OpenSolaris when we get a chance.

  3. Excellent site, keep up the good work

  4. I’m so glad I found this site…Keep up the good work I read a lot of blogs on a daily basis and for the most part, people lack substance but, I just wanted to make a quick comment to say GREAT blog. Thanks, 🙂



  5. Really nice posts. I will be checking back here regularly.

  6. Patrick Walther said

    This article absolutely saved my day.

    Apparently I ran into the same issue with 12 Intel 510 SSDs attached to an LSI 9211-8i.
    The NFS datastore it served kept stalling for about 20 seconds at a time, while iostat -xncz showed one SSD at 100% busy without any I/O.

    On Solaris 11 11/11 there seems to be no alternative driver at the moment.

    Thanks a lot!
