Thursday, February 3, 2011

ESXi 4.1 host not recognising existing VMFS datastore

Existing setup:

  • host1 and host2, ESX 4.0, 2 HBAs each.
  • lun1 and lun2, 2 LUNs belonging to the same RAID set (my terminology might be sketchy here).

This has been working just fine all along.

I added host3, ESXi 4.1, 2 HBAs.

If I view Configuration / Storage Adapters, I can see that both HBAs see both LUNs, but if I view Configuration / Storage, I only see 1 datastore. host1/2 can see both LUNs and I have VMs running on both too.

I have rescanned, refreshed and even rebooted, but host3 refuses to acknowledge 1 of the datastores.

Does anyone know what's going on?

Update:

I re-installed the host with ESX (not ESXi) 4.0, the same version as the existing hosts, and it's still not recognising the VMFS datastore. I think I'm going to Storage vMotion everything off that datastore and then format it.

Update2:

I've created the LUN from scratch and the problem gets even weirder. I've presented the LUN to all 3 hosts, and I can see the LUN in the vSphere client's Configuration / Storage Adapters section on all 3 hosts.

  • If I create a datastore on the LUN via the Configuration / Storage section on host1, it works fine and I can create an empty folder via the datastore browser, but the datastore is not seen by host2 or host3.
  • I can use the Add Storage wizard on host2 and it will see the LUN. At this point the "VMFS Label" column has the label I gave with "(head)" appended.
  • If I try the Add Storage wizard's "Keep the existing signature" option, it fails with an error "Cannot change the host configuration." and a dialog box that says 'Call "HostStorageSystem.ResolveMultipleUnresolvedVmfsVolumes" for object "storageSystem-17" on vCenter Server "vcenter.company.local" failed.'
  • If I try the Add Storage wizard's "Assign a new signature" option on host2, it will complete and the VMFS label will have "snap-(hexnumber)-" prepended. At this point it's also visible on host3, but not host1.
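
For anyone checking a longer list of datastore names for volumes that went through "Assign a new signature", here is a minimal Python sketch. It assumes the label has exactly the shape described above, "snap-", then a hex number, then "-", then the original label. The function names are made up for illustration and are not part of any VMware tool or API.

```python
import re

# Assumed label shape after "Assign a new signature": snap-<hex>-<original label>.
# This pattern is an assumption based on the labels described in the post.
_SNAP_PREFIX = re.compile(r"^snap-([0-9a-fA-F]+)-(.+)$")


def split_resignatured_label(label):
    """Return (snapshot_id, original_label); snapshot_id is None without the prefix."""
    match = _SNAP_PREFIX.match(label)
    if match is None:
        return None, label
    return match.group(1), match.group(2)


def find_resignatured(labels):
    """Map each resignatured label to the original label it was derived from."""
    result = {}
    for label in labels:
        snap_id, original = split_resignatured_label(label)
        if snap_id is not None:
            result[label] = original
    return result
```

Feeding it the datastore names copied from each host's Configuration / Storage view shows which entries are resignatured copies and which original label each one came from.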

I have a similar setup in a different datacenter which didn't give me all this trouble.

  • Hi

    Do you use VMFS3 for the datastores, or VMFS2? VMFS2 is supported by ESX 4.1, but I heard that you can get some issues if you try to use a VMFS2 datastore with different ESX versions.

    Did you try restarting the mgmt-vmware and vmware-vpxa services on both ESX 4.0 servers and rescanning the datastores on each host, not only on the 4.1 one?

    greetings grub

    Graeme Donaldson : They're both vmfs3. The 2 existing boxes are working fine, I'm not sure why I'd need to restart the management services?
    grub : I'm not sure either ;-) normally it shouldn't make any difference but I think your issue is so strange that it couldn't harm to restart the services.
    Graeme Donaldson : I'd rather not restart the management services on the 2 that are working fine. I've added an update to my question, going to clear everything off the datastore and format it.
    From grub
  • What SAN box is it? Most allow you to define what OS a given host runs, and picking the wrong one (i.e. Windows instead of VMware/Linux) will cause odd behavior. Some allow you to do this on a per-LUN basis too, so I'd check your SAN host definitions. The other thing you could do is create a third LUN, map it to all 3 hosts, partition/format it from the new server and rescan. What happens then?

    Graeme Donaldson : You've probably never heard of it.... it's an Axus YA-16SAEF4. It's not supported by VMware so I'm left to my own devices. There's no available space to create another LUN, I'm going to SVmotion everything off the datastore and start from scratch in the hope that sorts it out. Edit: Oh, and there doesn't seem to be any configuration setting to specify what OS is used by the host/s. I can only edit the alias for each HBA seen by the SAN box.
    Chopper3 : Hmm...odd, in a strange way I do hope this is just a datastore corruption as your course of action should work - what worries me is if there's an issue with your HBA firmware not working with your array it'll be a lot harder to fix if at all.
    Graeme Donaldson : You've given me some more food for thought. In the back of my mind I thought these were identical boxes, but it turns out they aren't. The 2 working boxes are IBM x3550 M3 boxes, the one I'm adding is an x3550 M2. It also has a different model HBA to the 2 working boxes.
    Chopper3 : What triggered this is that we have a particular platform with both HDS and 3PAR SAN arrays and we've had to standardise on a particular blade HBA and a particular firmware version on them as it was the only version that worked fine for BOTH systems - it's a pain as we're always adding new servers but we ended up just buying a load of the right HBAs, flashing them as needed and that way we just bang these pre-built ones into new servers and at least it's done then. Fingers-crossed for you on this one.
    Graeme Donaldson : I have an identical setup in another datacenter, also 1x M2 box, 2x M3, same SAN, same HBAs. I'm going to compare firmware versions on everything I can and see if there are differences.
    pauska : You should be aware that unsupported (and rare) storage arrays can produce many weird errors. I had a Promise vTrak I tried to get working with VMware, got all sorts of weird behaviour so I screwed it.
    Graeme Donaldson : I am aware of that. It really bugs me that I have another setup in another datacenter just like it that works fine. Nonetheless, it's possible I may have to tell the boss that it needs to be thrown out.
    Chopper3 : @Graeme - just read your update2 - oh mate I'm out of options, you could call VMware's support but there's a reasonable chance they'll tell you to sod off because of the Axus - do you have any other supported array that you could link to your hosts to prove that they're ok? Even a small single-shelf FC box would do - you need to find out if it's a host/HBA/cabling/switch/array issue I guess. Let us know how you get on ok? If you'd been closer I could have lent you a spare EVA4400 I've got collecting dust - sorry.
    Graeme Donaldson : It's all sorted now. I have all 3 hosts using 3 LUNs on an IBM SAN, a DS3400 if I'm not mistaken. That was working fine all along FWIW.
    Chopper3 : Glad to hear it, are you thinking it's the actual array then? if that's the case it must be a firmware issue with it working for some boxes but not others - odd but glad you're sorted.
    Graeme Donaldson : If it weren't for the fact that there's another LUN on that SAN that worked just fine when I originally installed host3, I would be inclined to blame the array. This is one of those weird issues that makes no sense on any level. With a little luck I'll never see anything like it again. :-)
    From Chopper3
  • I've got it sorted now.

    Based on information found in this thread I used the vSphere client to connect directly to host1 and then created the store. I then connected directly to host2/3 and added the datastore, selecting the "Keep the existing signature" option.

    It's now usable on all 3 hosts.

    To be honest I'm still a little annoyed with the whole situation, I'm not a big fan of voodoo solutions, but so be it.
