Friday, June 6, 2008

Possible fix found for MSA1500cs halting problem

Thanks to Eric at Genesis Hosting for finding a possible fix for our MSA1500cs and VMware problem. He has had the same thing happen to him and has gotten further with HP and VMware than I did in finding a fix.

His server hardware is a bit different than mine but the SAN and FC components are almost the same. Here are the various things that need doing to fix the problem:


  • Make sure the FC HBAs have the latest firmware installed

  • We are both using Brocade switches, so the firmware needs to be v5.31 or higher

  • Even if you only have a single controller in the MSA1500cs, upgrade to v7.00 of the firmware

  • Most importantly, set the speed to 2Gbps on the FC switch ports that your HBAs and MSA1500 are connected to, and deselect any unused port types. In my case I set the two MSA1500 connections to F-Type. I was unable to modify the type setting for the HP BL480p FC ports for some reason.
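As a rough sketch only, the checklist above could be written down as a small Python check. Everything in it is an assumption made for illustration: the `SanState` record, its field names, and the idea of treating "v5.31" as the tuple (5, 31) are hypothetical and do not come from any HP, Brocade or VMware tool. You would have to fill in the values by hand from your own hardware.

```python
from dataclasses import dataclass


@dataclass
class SanState:
    """Hypothetical, hand-filled record of the current SAN setup."""
    hba_firmware_latest: bool   # True once the FC HBAs are on the latest firmware
    brocade_firmware: tuple     # assumed encoding: (5, 31) stands for v5.31
    msa_firmware: tuple         # assumed encoding: (7, 0) stands for v7.00
    port_speeds_gbps: dict      # switch port name -> configured speed, 0 = auto


def outstanding_items(state):
    """Return the checklist items from the post that are still open."""
    items = []
    if not state.hba_firmware_latest:
        items.append("update FC HBA firmware")
    if state.brocade_firmware < (5, 31):
        items.append("upgrade Brocade firmware to v5.31 or higher")
    if state.msa_firmware < (7, 0):
        items.append("upgrade MSA1500 firmware to v7.00")
    # Every port used by an HBA or the MSA1500 should be set to 2Gbps.
    for port, speed in sorted(state.port_speeds_gbps.items()):
        if speed != 2:
            items.append(f"set port {port} to 2Gbps")
    return items
```

Calling `outstanding_items(...)` with a state where only the switch firmware is behind returns just that one item, which makes it easy to see what is left before a maintenance window.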



Eric's testing seems to indicate that the problem occurs if the SCSI queue depth gets too high. You can view this by using the MSA1500 CLI and entering SHOW TASKSTATS.

You can change the FC port speed and type while VMs are running, but you probably want to do it when there is no disk load on the SAN.

Eric also pointed out that when the problem does occur, instead of shutting down all of the VMs and rebooting the MSA1500, you can go to the CLI and enter DISABLE THIS_CONTROLLER REBOOT. If you have two controllers you would also have to enter DISABLE OTHER_CONTROLLER REBOOT. This may hang or cause problems with the VMs, but you should be able to reboot them without having to reboot the ESX hosts.
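To make the one-controller versus two-controller case explicit, here is a minimal Python sketch that only builds the list of CLI commands described above. The function name `recovery_commands` is made up for illustration, and the sketch does not talk to the MSA1500 at all; you would still type the commands into the CLI yourself.

```python
def recovery_commands(controller_count):
    """Return the MSA1500 CLI commands from the post, in the order given there."""
    if controller_count not in (1, 2):
        raise ValueError("expected 1 or 2 controllers")
    commands = ["DISABLE THIS_CONTROLLER REBOOT"]
    if controller_count == 2:
        # With two controllers the other one has to be rebooted as well.
        commands.append("DISABLE OTHER_CONTROLLER REBOOT")
    return commands


if __name__ == "__main__":
    for command in recovery_commands(2):
        print(command)
```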

I have made all of the changes already except for the firmware on the Brocade switches. I have scheduled some downtime this weekend and will do that upgrade then, along with a reboot of everything. After that I will leave it and see what happens.

2 comments:

Anonymous said...

Hi, how have you been getting on with this problem? Also, what would you consider to be too high a SCSI queue depth?

I have the same FC switch and MSA etc. and am experiencing this problem at 2 different sites at seemingly random intervals.
I am about to open support tickets with VMware and HP but I won't hold my breath!
If you could email me, richard.gray at mearsgroup.co.uk, with any information, that would be great.
Thanks.

Unknown said...

Hi Everyone. It seems like we haven't solved the MSA1500cs problem. However, we have more information about the problem and are discussing this with HP and VMware. I created a forum dedicated to this issue at http://www.msa1500cs.com/ if you want to join in. The more feedback we can get to HP and VMware, the better, so please help if you can. Thanks! Eric