Ensuring Data Protection and High Availability Pt. 2

Years ago I took a Project Management class that covered lessons I still find handy to this day, regardless of whether I’m wearing a PM hat or not. One of those lessons really sticks out, though. Our project was simple. We were given a bucket full of LEGO parts and told to build a car. Everyone scrambled to start assembling something that resembled a car. No two cars were alike, and everybody was missing parts needed to make something truly functional.


The moral of the lesson was that we failed to define success.   Nobody stopped to ask what the car’s requirements were, or what options were available in building the car.

My experience has shown, ironically enough, that defining success is probably the hardest part of any technical project. It sounds simple enough to implement an HA cluster with some backup software, and as engineers we have a tendency to take those high-level requests and start building a solution before we nail down the details. It might work exactly as you planned, but it may not work the way the business needs it to work.

Failing to nail down those details usually leads to one of three scenarios, each of which tends to end in failure of some kind.

  1. No metrics are defined by either the business or IT group.
  2. Metrics are defined by the business without regard for the realities of technical capability or cost.
  3. Metrics are defined by the IT group without regard for the realities of the business’s needs.


Instead, start by looking at already published SOPs, if they exist. Talk with leadership outside of the immediate IT organization to understand their requirements and what future roadmaps they have. In big organizations, in-house application teams that support specific business functions can provide a wealth of information where non-technical business leaders might be lacking. Make sure any in-house legal or regulatory group is included in the conversation as well, as they may have additional input or need to review others’ requirements to ensure they don’t conflict with compliance requirements driven from outside the business. Don’t offer solutions on this first pass, as it is only for information gathering.

Next, take these details and break them down into standardized categories. Start to build a matrix that you can easily fit each component into, and make sure it’s documented in an easily understood format. If there are requirements that seem ludicrous, don’t scratch them off; instead, add a note to investigate them in more detail. With all of these details in hand, have the leadership within IT sign off on these requirements and make any adjustments based on what they are willing and able to support. The final step, of course, is to go back to the business owners and have them sign off on the standardized requirements.
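To make the matrix idea a bit more concrete, here is a minimal sketch of how it could be tracked in Python. Everything in it is an assumption for illustration: the field names, the category labels, and the sample requirements are hypothetical, and a spreadsheet would serve just as well.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    system: str                        # system or application the requirement applies to
    category: str                      # standardized category, e.g. "availability"
    statement: str                     # the requirement as gathered
    source: str                        # who asked for it
    needs_investigation: bool = False  # flag odd requests instead of deleting them
    it_signed_off: bool = False
    business_signed_off: bool = False


def build_matrix(requirements):
    """Group requirements into a {(system, category): [Requirement, ...]} matrix."""
    matrix = {}
    for req in requirements:
        matrix.setdefault((req.system, req.category), []).append(req)
    return matrix


def pending_signoff(requirements):
    """Return requirements still missing a sign-off from IT or the business."""
    return [r for r in requirements
            if not (r.it_signed_off and r.business_signed_off)]


# Hypothetical sample data, purely for illustration.
requirements = [
    Requirement("payroll", "availability", "Available during business hours",
                "finance", it_signed_off=True, business_signed_off=True),
    Requirement("payroll", "retention", "Keep records for seven years",
                "legal", it_signed_off=True),
    Requirement("intranet", "availability", "Never goes down, ever",
                "marketing", needs_investigation=True),
]

matrix = build_matrix(requirements)
todo = pending_signoff(requirements)
```

The point of the two sign-off flags is that a requirement only counts as settled once both IT and the business have agreed to it.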

You will of course need to make further revisions once you start designing a solution around those requirements, but it gives you concrete business objectives to meet. If the only solution to a requirement is too complex or too costly, you may get permission to scale back, or better yet, you will have the documentation you need to justify an increase in the budget with the business.

If you do all the legwork and still nobody has an idea where to start, take a page out of ITIL and come up with several standard service offerings, then use those as a starting point to negotiate from.
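As a rough sketch of what a set of standard service offerings might look like, the Python below defines three tiers and picks the least strict one that still satisfies a stated need. The tier names and every number in it are made up for illustration; they are not taken from ITIL or from any real catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceOffering:
    name: str
    max_downtime_hours_per_year: float  # worst-case downtime the tier commits to
    max_data_loss_minutes: int          # worst-case window of lost data
    backup_retention_days: int          # how long backups are kept


# Hypothetical tiers, ordered from strictest (and presumably most costly) down.
OFFERINGS = [
    ServiceOffering("gold", 1.0, 5, 365),
    ServiceOffering("silver", 8.0, 60, 90),
    ServiceOffering("bronze", 72.0, 1440, 30),
]


def least_strict_fit(downtime_hours, data_loss_minutes, retention_days):
    """Return the least strict offering that meets the stated needs, or None."""
    for offering in reversed(OFFERINGS):  # try bronze first
        if (offering.max_downtime_hours_per_year <= downtime_hours
                and offering.max_data_loss_minutes <= data_loss_minutes
                and offering.backup_retention_days >= retention_days):
            return offering
    return None


# A business unit that tolerates 10 hours of downtime a year, 2 hours of
# lost data, and needs 60 days of retention.
choice = least_strict_fit(10, 120, 60)
```

When nothing fits, the function returns None, which is the cue to go back and negotiate either the requirement or the budget.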

For starters, here are some common questions that should be asked.

  1. Is this system / application critical to a core function of the business?
  2. How much does data unavailability cost the business?
  3. Are there any contractual or legal requirements imposed on the availability of the system?
  4. Are there any contractual or legal requirements imposed on the retention of the data produced by the system?
  5. Who are the end users of the system, where do they access it from, and when do they use it?
  6. How is data stored?
  7. Where is the system located or planned to be located?
  8. Are there any windows of time (be it nightly, weekly, monthly, quarterly, etc.) that are acceptable to take the system offline to perform maintenance?
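The answers to questions 2 and 3 can be turned into rough numbers fairly quickly. The sketch below converts an availability percentage into hours of permitted downtime per year and multiplies that by an hourly cost. The cost figure is hypothetical, and the calculation assumes every hour of downtime costs the same, which is rarely true in practice.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def allowed_downtime_hours(availability_pct):
    """Hours per year a system may be down and still meet the availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_exposure(availability_pct, cost_per_hour):
    """Rough yearly cost if the system uses up its whole downtime allowance."""
    return allowed_downtime_hours(availability_pct) * cost_per_hour


# A 99.9% target allows roughly 8.76 hours of downtime per year.
hours = allowed_downtime_hours(99.9)

# With a hypothetical cost of 5,000 per hour of unavailability.
exposure = annual_downtime_exposure(99.9, 5000)
```

Even a rough figure like this gives the business something concrete to weigh against the cost of a more available design.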


My parting advice is to stick to common methodologies. They don’t need to be the same “industry standard” way of doing things that everybody else claims to use, but they should be consistent based on where something falls in that matrix. The fewer the options and variations, the easier it will be to manage and automate.