WindowsNinja

The musings and mishaps of a Windows sys admin

IT – Cost Center or Insurance Center?

A Reddit poster, /u/asdlkf, offered a very interesting perspective on the current state of IT yesterday. He (or she) hypothesized that the days of IT being purely a cost center are in the past. These days companies need to view their IT staff and infrastructure as an insurance center: a piece of the company which offers varying levels of assurance and protection against outages, depending on the amount of money they are willing to invest.
 
This really got me thinking about how absolutely correct this person was about the way companies should approach IT. From the perspective of a company executive, it is very important that they not simply look at IT as a necessary evil costing the company precious dollars. Instead, they need to view us as an asset which provides assurances on the company’s most important assets. When determining budgets for their IT expenditures, it is not enough to simply calculate a few salaries with minimal support in terms of hardware and software licensing and then expect miracles. These executives really need to consider how vital their core services are.
 
Consider this: How much would it cost your company if there was an 8-hour outage of your company’s e-commerce website? Once you’ve determined how much something like this can potentially cost your company, you next need to determine how much you’re willing to invest to prevent it from occurring.
 
Let’s use this as a hypothetical example:

  • E-Commerce Site: Outage costs $100,000/hour
  • IT Insurance Solutions:
    • Scenario A: Cost to implement solution with no redundancy – $50,000/year
      • Chance of 1-hour downtime per month – 70%
      • Chance of 3-hours downtime per month – 20%
    • Scenario B: Cost to implement solution with n+1 redundancy (all at the same site) – $250,000/year
      • Chance of 1-hour downtime per month – 10%
      • Chance of 3-hours downtime per month – 3%
    • Scenario C: Cost to implement solution with n+1 redundancy onsite and an additional n+1 solution in a geographically diverse area – $500,000/year
      • Chance of 1-hour downtime per month – 1%
      • Chance of 3-hours downtime per month – 0.1%

The math above is extremely simplified, but it allows us to show how IT solutions can be presented in terms of an insurance center. A company’s finance/risk team can then perform some calculations to determine how much money to invest in IT and how much risk the company is willing to accept.
 
For example, when looking at the potential for a 3-hour outage you could use the math below:

  • Scenario A - Cost to reduce likelihood of 3-hour outage to 20% is $0 (as $50,000 is the minimum to provide support for the solution and is therefore not additional cost)
    • $100,000/hour * 3 hours = $300,000 * 20% likelihood = $60,000/month average risk
    • Annual Expected Cost of Downtime – $60,000/month * 12 months = $720,000
  • Scenario B - For a cost of $200,000 ($250k minus the $50k mandatory cost) you can reduce the likelihood of a 3-hour outage per month from 20% to 3%.
    • $300,000 * 3% likelihood = $9,000/month average risk
    • Annual Expected Cost of Downtime - $9,000/month * 12 months = $108,000 + $200,000 (additional cost) = $308,000
  • Scenario C - For a cost of $450,000 ($500k minus the $50k mandatory cost) you can reduce the likelihood of a 3-hour outage per month from 20% to 0.1%.
    • $300,000 * 0.1% likelihood = $300/month average risk
    • Annual Expected Cost of Downtime - $300/month * 12 months = $3,600 + $450,000 (additional cost) = $453,600

 
Using the math above you can perhaps conclude that the best combination of cost and risk is Scenario B (which would be expected to cost $308,000/year vs. $720,000 for Scenario A and $453,600 for Scenario C). Obviously, this is highly simplified and does not include all the necessary calculations for a real scenario (such as the risk of 10 hours of downtime per month). However, it gives you a good starting point to begin working with your management team on the importance of viewing IT not simply as a cost center but as an insurance center which offers varying levels of coverage based on their investment.
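The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the simplified 3-hour-outage math above: the variable and function names are my own, the figures are the hypothetical ones from this post, and the $50,000 Scenario A spend is treated as the mandatory baseline, just as in the bullets.

```python
# Hypothetical figures from the example above; all names are illustrative.
OUTAGE_COST_PER_HOUR = 100_000
OUTAGE_HOURS = 3
BASELINE_COST = 50_000  # Scenario A spend, treated as mandatory (not "additional")

# Annual solution cost and monthly chance of a 3-hour outage per scenario.
scenarios = {
    "A": {"annual_cost": 50_000, "monthly_chance": 0.20},
    "B": {"annual_cost": 250_000, "monthly_chance": 0.03},
    "C": {"annual_cost": 500_000, "monthly_chance": 0.001},
}


def expected_annual_cost(annual_cost, monthly_chance):
    """Additional spend over the baseline plus 12 months of average outage risk."""
    additional_spend = annual_cost - BASELINE_COST
    monthly_risk = OUTAGE_COST_PER_HOUR * OUTAGE_HOURS * monthly_chance
    return additional_spend + monthly_risk * 12


# Round to whole dollars so floating-point noise does not leak into the report.
totals = {name: round(expected_annual_cost(**s)) for name, s in scenarios.items()}
best = min(totals, key=totals.get)

for name, total in totals.items():
    print(f"Scenario {name}: ${total:,}/year")
print(f"Lowest expected cost: Scenario {best}")
```

A real model would add more outage durations (1 hour, 10 hours, and so on) and sum their expected costs, but the shape of the calculation stays the same.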
 
 
It is incredibly important for us as engineers to look at things not only from a technical perspective but also from a business perspective. When communicating with the business about these different risks, we need to put them in terms the business understands. If we fail to do this and we’re not given the support we desire, then the failure is on us and not them. However, if you do provide them with this information, the burden shifts to the business, as it is their own assessment of the potential risks that drove the decision.

 
This is one of the most challenging things to do as a technical person: to step outside our own world and work within the realm of another. However, this is the difference between a good engineer and a great one. A good one can build you a system that will give you five-nines reliability. A great engineer can design a system with five-nines reliability and then sell this system to the decision makers.

 

Always strive to be great.

I Accidentally Increased the Disk Size of a Hyper-V 2008 VHD with Snapshots Present…now what?

I just managed to get myself out of a sticky situation caused by some harebrained execution on my part. At $work we’re at the tail end of a migration from Exchange 2010 On-Premise to Office 365. As part of this migration, Microsoft just upgraded us from Exchange 2010 to Exchange 2013 on their systems. This has caused some compatibility issues for us, as we’re currently operating in a hybrid configuration where we still need to be able to manage a number of things from our On-Premise Exchange Management Console.
 
The only problem is our On-Premise configuration is on an outdated version of Exchange 2010 and only Exchange 2010 SP3 is fully compatible with Exchange 2013 on Office 365. What this means is that we’re not able to connect to the Exchange 2013 tools from our Exchange Management Console until we upgrade things to Exchange 2010 SP3.
 
Still with me? Good.
 
A screenshot of the error we were receiving when trying to open the Office 365 EMC within Exchange 2010 On-Premise is below.
 
EMC Error
 
So, off I went upgrading our On-Premise Exchange environment to Exchange 2010 SP3. The first step in doing so is upgrading your CAS (Client Access Server) servers to SP3 before the mailbox servers. Easy enough…only the first one I started working on did not have enough space remaining on the hard drive. Fortunately this system was a VM, so I just needed to expand the VHD (and then expand the partition within Windows Server 2008). Easy enough. Except it wasn’t so easy.
 
Enter scene: Hyper-V 2008 Manager
 
It is a big No No to expand a VHD of a system that has an existing snapshot, and I knew this. To ensure there were no existing snapshots I opened Hyper-V Manager, right-clicked on the desired VM and selected “Snapshot” just to see if any snapshots existed. Nothing appeared to happen, so I looked in the section of Hyper-V Manager where the snapshots would be listed and could see there were none.
 
With this knowledge in mind I went ahead and expanded the size of the CAS server VHD from 40 GB to 60 GB. When this finished, I noticed that a snapshot DID now appear for the server. Earlier when I clicked Snapshot, even though I was presented with no options and wasn’t asked to confirm anything, it proceeded to create a snapshot (seriously Windows? what was the thought process here? I guess that is what I get since I’m used to VMware ESXi, where you have to name your snapshot and confirm that you actually want to take one before it does anything).
 
Thinking that this wasn’t a huge deal I went ahead and deleted the snapshot since I didn’t intend to create one anyways. Problem resolved.
 
Only it wasn’t…the expanded disk was created and it was done with the assumption that there was a snapshot chain in place. Starting the VM after this presented the error below…uh oh!
 
VHD Chain Corrupted
 
After some initial panic I reminded myself to stay calm (important!) and devise a road to recovery. I would spend an hour trying to figure out a way to recover from this mistake and after that I would just deploy a new CAS server to replace it. This would be a time sink but one I could recover from.
 
Enter scene – VHDtool.exe (source: http://archive.msdn.microsoft.com/vhdtool)
 
This tool saved the day for me. I wasn’t 100% sure how to use it initially but after playing around with things I was able to get it to work and was able to get this problem fixed without having to resort to deploying a new CAS server!
 
The details below outline what you need to do to get this tool to work if it is ever needed again in the future (hopefully not).
 

  • Initially I was controlling Hyper-V from a client machine and this did not work, as the VHD was not located on that system. So the first step is to remote onto the actual Hyper-V server and then download VHDtool.

 

  • Once I downloaded the tool, I placed the file in c:\Windows\System32 so that I could access it from the command line without having to input the directory.

 

  • The next step was to figure out the name of the base VHD and the name of the snapshot it was expecting to be part of the chain, in addition to where they’re located (the easiest way to do this was to select the virtual system in Hyper-V Manager, then select Edit Disk and then Browse to see where the files are).
     
    VHD File Name – basefilename.vhd
     
    AVHD File Name – you’ll find this name in the error screenshot above, as well as in the folder where the VHD is located

 

  • Once I had figured this out I opened an administrative command prompt (right-click Command Prompt and select “Run as Administrator”). From there I navigated to the location of the VHD and AVHD files:
     
    > cd c:\ClusterStorage\Hyper-V\CAS2

 

  • Next I ran the command below to repair the VHD:
     
    > vhdtool.exe /repair ex2010-cas-fixed.vhd ex2010-cas-fixed_37E0C9A9-11CE-46F6-8BAA-E0326EB3EF73.avhd

 
That’s it! Once I ran the command above everything started working again! It was a bit challenging to track this solution down but now that I know what to do it will be simple in the future (though hopefully it never happens again).
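If you’d rather not retype that long AVHD file name by hand, the command line can be assembled from a folder listing. The snippet below is only a rough sketch: the helper name is made up, it assumes the snapshot file follows the `<base name>_<GUID>.avhd` pattern seen in my folder, and it only covers the single-snapshot case I ran into. It builds the command but does not run anything.

```python
def build_repair_command(base_vhd, folder_listing):
    """Build the vhdtool /repair command line used in this post.

    base_vhd       -- file name of the base disk, e.g. "ex2010-cas-fixed.vhd"
    folder_listing -- file names found in the VM's storage folder

    Assumes exactly one snapshot named <base name>_<GUID>.avhd (an assumption
    taken from the file names in this post, not a documented rule).
    """
    stem = base_vhd.rsplit(".", 1)[0].lower()
    snapshots = [
        name
        for name in folder_listing
        if name.lower().startswith(stem + "_") and name.lower().endswith(".avhd")
    ]
    if len(snapshots) != 1:
        raise ValueError(
            f"expected exactly one .avhd for {base_vhd}, found {len(snapshots)}"
        )
    return ["vhdtool.exe", "/repair", base_vhd, snapshots[0]]


# File names from the example in this post.
listing = [
    "ex2010-cas-fixed.vhd",
    "ex2010-cas-fixed_37E0C9A9-11CE-46F6-8BAA-E0326EB3EF73.avhd",
]
command = build_repair_command("ex2010-cas-fixed.vhd", listing)
print(" ".join(command))
```

On the Hyper-V server itself, the listing could come from `os.listdir()` on the VM folder and the resulting list could be handed to `subprocess.run()` from an elevated prompt, but take a copy of the VHD and AVHD files before trying that.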

The Art of Problem Solving – Documentation

Product documentation should be your starting point when working with any technology. Documentation comes in many different forms, and some of it is vastly better than the rest, but it is still the first place to look any time you are working toward solving a problem. While the usefulness of documentation may vary, the biggest hurdle is figuring out how to find the documentation you’re looking for.

Windows Documentation

When searching for documentation provided by Microsoft, there are a few places you should look:

  • Microsoft TechNet is the primary resource for documentation on any software solution provided by Microsoft. To find the documentation you need, Google whatever product you’re working with and append “+technet” to the end of it. Additionally, you can perform search queries directly on their website.
  • PowerShell – Get-Help is your go-to resource for documentation any time you’re working with PowerShell. To retrieve a list of all available help topics, simply type “Get-Help *” and it will provide you with a long list of commands that help is available for.
  • Command Prompt – append “/?” to the end of any Windows command you’re trying to use and it will provide you with some basic instructions in addition to a few examples.

*nix Documentation

Documentation for *nix-based systems is available in much the same way as it is for Windows-based systems.

  • Distribution documentation – Each distribution has its own website dedicated to documenting its systems.
  • Man pages – To read the documentation for any *nix command, simply enter “man <command name>” at the shell prompt and you’ll be presented with all of the information you’re likely to need for that command.

 

A good grasp of using documentation is the foundation for achieving greatness in the world of systems administration. Master it early and use it often!

The Art of Problem Solving – Search Engines

The ability to effectively and efficiently make use of search engines should be at the core of your skillset as a systems administrator. While it is not the most important resource for any single subject, it is of the most use overall when compared to any of the other avenues for seeking out solutions.
 
The Good: Employing a well-constructed search engine query is a great starting point when you need information about a solution that you’re unfamiliar with. The key here is to ensure you’re using it to find documentation or other information that will help you better understand the technology that you’ll be employing to provide your desired solution.
 

Do: Use it for finding technical documentation, forums, and others who have encountered similar problems.

 

 
The Bad: While practicing excellent Google-fu is an important skill for a seasoned sys admin, you need to be sure you’re using it for the right reasons.
 

Do not: Use it to find step-by-step walkthroughs for implementing solutions that you have no understanding of. If you find a solution, great. However, you should be careful to ensure you understand that solution before deploying anything into your environment.

 

 
The Ugly: While I’ve long known that sys admins “fixing” problems without understanding the underlying cause is a problem, it wasn’t until very recently that I realized somebody had coined a term for it. In a recent post over at the Standalone Sysadmin blog, the author introduced a blog post discussing Cargo Cult Systems Administration. This method of systems administration is the enemy of an expert sys admin and one you should take great care to avoid.

 
Tips:

  • Understanding how to build advanced search queries will be extremely beneficial to you as you encounter less common problems
  • No single search engine is perfect, mix it up! If your default search engine isn’t turning up the results you’re looking for, try another.
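To make the first tip a bit more concrete, here is a small sketch of how an advanced query can be put together. It relies on three operators that both Google and Bing understand: quotes for an exact phrase, `site:` to restrict results to one domain, and a leading minus sign to exclude a term. The function name and the example terms are purely illustrative.

```python
from urllib.parse import quote_plus


def build_query(terms, exact_phrase=None, site=None, exclude=()):
    """Combine plain terms with common search operators into one query string."""
    parts = list(terms)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')  # quotes force an exact-phrase match
    if site:
        parts.append(f"site:{site}")  # restrict results to a single domain
    parts.extend(f"-{word}" for word in exclude)  # leading minus excludes a term
    return " ".join(parts)


# Illustrative example: SP3 documentation on TechNet, skipping Exchange 2007 hits.
query = build_query(
    ["Exchange", "2010", "SP3"],
    exact_phrase="Client Access Server",
    site="technet.microsoft.com",
    exclude=["2007"],
)

# The same query works on more than one engine, which makes it easy to mix it up.
engines = {
    "google": "https://www.google.com/search?q=",
    "bing": "https://www.bing.com/search?q=",
}
urls = {name: base + quote_plus(query) for name, base in engines.items()}

print(query)
for name, url in urls.items():
    print(name, url)
```

Building the query once and pointing it at more than one engine is a cheap way to follow the second tip as well.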

 
Summary: In a previous life I used to be a somewhat talented musician and one of the things drilled into my head while studying music theory was that you should never learn to play an instrument by ear because you miss out on the core concepts necessary to truly understand music. This isn’t to say you can’t be a great musician if you can only play by ear but if you can’t read music and don’t understand the different musical philosophies you will have a lot working against you as you try to go from good to great.
 
The same can (and should) be said for systems administration: Be careful not to learn to perform systems administration by ear.

The Art of Problem Solving – Finding Your Greatness


It’s been too long since my last post, but my inspiration to write seems to come in waves. I apologize for that, but it is likely to be a constant struggle, as it seems to be for many other bloggers as well. So with that said, I’d like to start this post with a quote that has stood the test of time:

 

“Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon them”
 
~William Shakespeare

 
I’ve been thinking a lot lately about what differentiates the good engineers from the great engineers. Being good has never been enough to satisfy my ambitions in the past and as such I’m constantly trying to find ways to improve myself both as a person and a professional. Part of this, for me, means that I need to be both consistent and logical in my problem solving abilities.

 
The problem is that I simply was not born great.

 
However, I’ve worked very hard to achieve greatness (although I’ve not yet achieved it). And at times I’ve thrust myself into situations that require greatness where I’ve been able to display something better than good, even if only for a moment.

 

It seems to me that the difference between somebody in this field who is good and somebody who is great is not their breadth of expertise in a given subject matter; rather, it is their ability to find solutions to any problems that may come their way. As such, to be a great problem solver you need not only be analytical, you also need to know where to look for solutions.

 
This post is part 1 of a series of posts that will discuss the different methods that I use to find solutions to the problems I face each and every day. It is not intended to be all-inclusive, as I’m sure I will leave out a lot of great information. However, I’m hoping this will be a good starting point for any early-to-mid-career IT professional in their search for greatness.

 
Below I’ve listed a number of different methods, which I will cover in greater depth in each of the subsequent posts. If at any point in time you have something to contribute, please feel free to offer your own insight as I’m well aware that there is more to problem solving than what I will be covering here.

 

  1. The Almighty Google (or Bing, which has been really growing on me lately)
  2. Documentation
  3. My Peers
  4. Forums
  5. IRC/Mailing Lists
  6. Trial and Error (this method should be avoided in production environments if the fix can potentially cause downtime!)

 

On your own journey to greatness, remember this: “Be humble. Be hungry.” Never let pride get in the way of greatness or you may never achieve it.