Wednesday, January 26, 2011

Fix things first or virtualize and then fix

Warning: This is a very general question.

I've walked into an environment where everything is working for the most part but it is held together with scotch tape (don't want to offend the duct tape cult).

Key points

  • Backups can't restore to different hardware
  • SAN uses Microsoft iSCSI initiator
  • Permissions are a non-documented nightmare
  • Many single points of failure
  • Servers are mostly 4-5+ years old, with poor utilization
  • 20 servers across 3 offices
  • All Windows servers

Should I virtualize first (I already have a basic SAN) or address these issues prior to virtualizing? I think virtualizing will make fixing these issues much easier, but I want to avoid garbage in, garbage out. My biggest concern is that the backups of the Exchange and SQL servers (with middleware) can't be restored to different hardware. I'm planning to go VMware when the time comes.

Your thoughts.... Thanks,

  • Well, as with nearly all things, it depends. The one big win you get with virtualizing things in your situation is the ability to snapshot VMs. As you're troubleshooting/patching/fixing/etc, the ability to snap could be a godsend. Conversely, the P2V transition could throw another level of instability/unpredictability into the mix. You can always give P2V a try and if things don't work out well, you haven't really lost anything - you can always go back to the physical host.

    Chris S : +1, It depends, but probably virtualize first and work on backup at the same time (though virt may solve that).
    From ErikA
  • Personally I would look into fixing things up first, then look into improvements in the infrastructure. This way you are not introducing new complexities to the problems.

    Let me take a little bit to address the issues you brought up:

    • Backups can't restore to different hardware

    This is a MAJOR issue. You should really talk to your backup vendor and figure out why. Is it because they are doing backups that restore to bare metal and the vendor doesn't support bare metal restores unless it is the same hardware? If so, you should be able to add a data-only backup to the rotation. That way it may be a bit more work to come back up, but you don't lose the important stuff (the data)!

    • SAN uses Microsoft iSCSI initiator

    Why do you think this is a problem? There is nothing wrong with the Microsoft iSCSI initiator; in fact, I would be wary of someone who didn't use it on MS platforms. We have hundreds of boxes using the iSCSI initiator to talk to dozens of SANs without issue.

    • Permissions are a non-documented nightmare

    This ... sucks. And it happens everywhere. Your best bet is to slowly chip away at documenting these. Search this site; there are a bunch of questions related to documenting permissions using scripts. But you don't want to go messing around with things before you know how they are right now.
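
As a starting point for that documentation, here is a minimal read-only sketch that walks a share and records the ACL of every directory in a CSV file. It assumes the built-in Windows `icacls` command is available and simply stores its raw output; the function names, the example share path and the output file name are hypothetical. It changes nothing, in keeping with the advice not to touch permissions before you know how they are today.

```python
import csv
import os
import subprocess


def read_acl_with_icacls(path):
    """Return the raw ACL listing for one path by calling icacls (Windows only)."""
    result = subprocess.run(
        ["icacls", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def inventory_permissions(root, read_acl=read_acl_with_icacls):
    """Yield (directory, acl_text) for root and every directory below it.

    read_acl is injectable so another ACL reader can be swapped in.
    Failures are recorded in the inventory instead of stopping the walk.
    """
    for dirpath, _dirnames, _filenames in os.walk(root):
        try:
            yield dirpath, read_acl(dirpath)
        except (OSError, subprocess.CalledProcessError) as exc:
            yield dirpath, f"ERROR: {exc}"


def write_inventory(rows, out_path):
    """Write the (directory, acl_text) rows to a CSV file for later review."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["directory", "acl"])
        writer.writerows(rows)


# Example (hypothetical share path and output file):
# write_inventory(inventory_permissions(r"D:\Shares"), "permissions.csv")
```

Running it on a schedule and comparing the CSV files over time is one way to notice permission drift while the documentation effort is still in progress.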

    • Many single points of failure

    This is always a tricky one. You need buy-in from the business to get them to spend the money to reduce or eliminate the SPoF. My best suggestion to you is to document everything and put together a risk analysis. Then put together a few suggested solutions with approximate costs and present them to the business owners. If they want to reduce or eliminate them, then you are golden; if not, all you can do is keep documenting it, start documenting outages caused by it, and bring it back up to the owners.
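
One simple way to structure that risk analysis is a likelihood times impact score per single point of failure, with the approximate fix cost alongside. The sketch below assumes a 1 to 5 scale for both factors; the scale, the component names and every number in the example are illustrative placeholders, not figures from this environment.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    component: str     # the single point of failure being described
    likelihood: int    # 1 (rare) to 5 (expected within the year), assumed scale
    impact: int        # 1 (minor annoyance) to 5 (business stops), assumed scale
    fix_cost: float    # approximate cost of the suggested solution

    @property
    def score(self):
        """Classic risk-matrix score: likelihood multiplied by impact."""
        return self.likelihood * self.impact


def rank_risks(risks):
    """Return the risks ordered from highest score to lowest."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Placeholder entries only; replace with what you actually document.
    example = [
        Risk("Single SAN controller", likelihood=2, impact=5, fix_cost=8000.0),
        Risk("One domain controller per office", likelihood=3, impact=4, fix_cost=1500.0),
        Risk("Single switch in server rack", likelihood=2, impact=3, fix_cost=900.0),
    ]
    for risk in rank_risks(example):
        print(f"{risk.score:>2}  {risk.component}  (approx. cost {risk.fix_cost:.0f})")
```

A ranked table like this gives the business owners something concrete to accept or reject, and the same list can be updated with real outages later.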

    • Servers are mostly 4-5+ years old, with poor utilization

    There is nothing wrong with this as long as they are still under warranty. If the 4-5 year old servers are underutilized, they are good candidates to be virtualized, but you should spend some time doing performance analysis to see where the utilization is (memory, network IO, disk IO, processor, etc.) so you can properly plan your virtualization strategy.
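
For the performance analysis, the raw counters matter less than the average and peak per server and per resource, since the peaks are what the virtualization hosts must absorb. A minimal sketch, assuming you have already collected samples (for example from Performance Monitor logs) as `(server, metric, value)` tuples; the tuple layout and the server and metric names used in the example are assumptions.

```python
from collections import defaultdict
from statistics import mean


def summarize_utilization(samples):
    """Group samples by (server, metric) and report average and peak values.

    samples: iterable of (server, metric, value) tuples, where value is a
    number such as a CPU percentage or a disk IO rate.
    """
    grouped = defaultdict(list)
    for server, metric, value in samples:
        grouped[(server, metric)].append(value)
    return {
        key: {"avg": mean(values), "peak": max(values)}
        for key, values in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical samples; real ones would come from your own collection.
    demo = [
        ("FS01", "cpu_pct", 10.0),
        ("FS01", "cpu_pct", 30.0),
        ("SQL01", "cpu_pct", 55.0),
    ]
    for (server, metric), stats in sorted(summarize_utilization(demo).items()):
        print(server, metric, stats)
```

Servers with a low average but a high peak in one resource are the ones to look at closely before consolidating them onto a shared host.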

    • 20 servers across 3 offices

    Once again, nothing wrong with this either. You just need to make sure that there are proper remote tools at your disposal: IP KVMs, remote-access power strips, iLO/DRAC cards, etc. In fact, depending on the WAN connection, centralizing could reduce performance and manageability. Once again, take a look at your use profiles for the servers.

    • All windows servers

    Absolutely nothing wrong with this. Changing things because they are Windows, for the sake of changing them away from Windows, is a bad, bad idea.

    So, if I were in your situation, I would sit down and make a list of everything you see as needing to be changed, then organize it from most important (e.g. data loss, downtime) to least important (e.g. inconvenience, infrastructure improvements). Then you just work down the list, fixing things one at a time until it's done.
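
The ordering described above can be kept as a small sortable worklist. This sketch assumes four categories taken from the examples in the answer (data loss, downtime, inconvenience, improvement); which category each item falls into is a judgment call, and the assignments in the example are only illustrative.

```python
# Lower rank means fix sooner; the category names follow the examples above.
CATEGORY_RANK = {
    "data loss": 0,
    "downtime": 1,
    "inconvenience": 2,
    "improvement": 3,
}


def order_worklist(items):
    """Sort (description, category) pairs from most to least important.

    sorted() is stable, so items within one category keep the order in
    which they were written down.
    """
    return sorted(items, key=lambda item: CATEGORY_RANK[item[1]])


if __name__ == "__main__":
    # Illustrative category assignments for issues raised in the question.
    worklist = [
        ("Virtualize underutilized servers", "improvement"),
        ("Undocumented permissions", "inconvenience"),
        ("Backups can't restore to different hardware", "data loss"),
        ("Many single points of failure", "downtime"),
    ]
    for position, (description, category) in enumerate(order_worklist(worklist), 1):
        print(f"{position}. [{category}] {description}")
```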

    Virtualization is not a panacea. It may solve some of your problems, but it will introduce new issues and problems along the way. I would think long and hard before jumping into virtualizing things without a good, solid understanding of how things are now, how virtualization will change the situation, and what new issues it could introduce.

    Zoredache : +1 backup needs to be fixed ASAP, but most other issues tend to be very relative to your network. For a smaller environment, eliminating all SPoF is likely impossible.
    joeqwerty : Personally I don't consider the lack of bare metal restore to different hardware to be a deficiency in backup software. Sure, many backup programs have this ability, but many don't. The lack of it doesn't make a particular backup program "bad" in my opinion. Backup software is purpose driven, to perform backups and restores. System imaging/restoration software is purpose driven, for system imaging and restoration. The inclusion of this ability in your backup software is a plus, but the lack of it isn't a minus in my book.
    Zypher : @Joeqwerty: I read it as they had no other backups in place to be able to get to the data if they could not get their hands on the same hardware. I wasn't saying the solution was the problem, more so the fact that, from what I was reading, they would have no other way to get at their data. So tl;dr I wasn't making a comment on the software but on the lack of (admittedly assumed) ability to get to the data any other way.
    joeqwerty : @Zypher: Gotcha. Thanks for the clarification. Also, +1 for a well thought out and well stated answer.
    From Zypher
  • I would probably fix things while virtualising: run two infrastructures side by side, the "new" one and the "old" one, and migrate things over one by one.

    From coredump
  • "If all you have is a hammer, everything looks like a nail."

    Reevaluate why you'd want to virtualize the infrastructure. What problems are you encountering that would warrant virtualization? I personally would not virtualize the infrastructure unless you really, really need the extra hardware that would be freed up for other things. You'd be introducing another issue as well: what if the hardware hosting the hypervisor dies? You might say, "I'll just use HA across two machines with VMs on them!" But what does that buy you over, say, building highly available services installed directly on top of a regular Windows installation?
