- What is a right-sized disaster recovery plan? It's a concise and effective plan to support IT service continuity.
- Any time a natural disaster or major IT outage occurs, it increases executive awareness and internal pressure to create a disaster recovery plan (DRP), a specific set of requirements that allow an organization to remain effective in the event of an outage, tailored to an organization’s specific needs.
- Traditional DRP templates are onerous and result in a lengthy, dense plan that might satisfy auditors but will not be effective in a crisis.
- The myth that a DRP is only for major disasters leaves organizations vulnerable to more common incidents.
- The growing use of outsourced infrastructure services has increased reliance on vendors to meet recovery timeline objectives.
Our Advice
Critical Insight
- At its core, disaster recovery (DR) is about ensuring service continuity. Create a plan that can be leveraged for both isolated and catastrophic events.
- Remember Murphy’s Law. Failure happens. Focus on improving overall resiliency and recovery, rather than basing DR on risk probability analysis.
- Cost-effective DR and service continuity starts with identifying what is truly mission critical so you can focus resources accordingly. Not all services require fast failover.
Impact and Result
- Define appropriate objectives for service downtime and data loss based on business impact.
- Document an incident response plan that captures all of the steps from event detection to data center recovery.
- Create a DR roadmap to close gaps between current DR capabilities and recovery objectives.
Member Testimonials
After each Info-Tech experience, we ask our members to quantify the real-time savings, monetary impact, and project improvements our research helped them achieve. See our top member experiences for this blueprint and what our clients have to say.
9.6/10
Overall Impact
$46,936
Average $ Saved
26
Average Days Saved
Client | Experience | Impact | $ Saved | Days Saved | Comment |
---|---|---|---|---|---|
Evommune, Inc. | Guided Implementation | 9/10 | $6,850 | 18 | |
Texas Trust Credit Union | Guided Implementation | 10/10 | $32,195 | 20 | I really appreciate Scott’s experience and expertise on this subject. He made it clear and easy to understand. |
City of Palm Beach Gardens | Workshop | 10/10 | $34,250 | 20 | Time consuming for a large portion of staff however will pay off with saving time for all employees in the event of a disaster. |
National Arts Centre Canada | Guided Implementation | 10/10 | $50,000 | 20 | Working with Frank was wonderful, he was knowledgeable, patient and really took his time to walk us through the fundamentals without judgement. It ... |
State of New Mexico Early Childhood & Care Department | Guided Implementation | 9/10 | $13,700 | 20 | Organization of materials that can be tailored to organization size and maturity and can be approached iteratively. Great callouts differentiating... |
County of Stafford | Workshop | 10/10 | $137K | 90 | Our workshop facilitator, Venkat, was extremely knowledgeable! He added significant value to our discussions and provided us with great recommendat... |
Eastern Michigan University | Guided Implementation | 10/10 | N/A | N/A | Our consultant was wonderful. He was very knowledgeable. |
Stream Realty Partners, L.P. | Guided Implementation | 9/10 | $34,250 | 10 | |
Lower Hudson Regional Information Center | Workshop | 10/10 | N/A | 20 | Venkat proved to be an exceptional consultant for our team. He came well-prepared and rapidly grasped the subtleties of our organization, adapting ... |
Solano County, CA | Workshop | 10/10 | $68,500 | 20 | The workshop was fantastic. It empowered us to establish a robust BIA process that our organization was lacking. I now have the tools to develop a ... |
Department of Agriculture Kentucky | Guided Implementation | 8/10 | N/A | 2 | The risk assessment techniques were useful in gauging the impact of various scenarios, and the official engagement provided persuasion for other te... |
Conseil Scolaire Catholique MonAvenir | Guided Implementation | 9/10 | $25,000 | 38 | The experience was excellent overall. Darin was very knowledgable and really did a great job of guiding us through the process. We really enjoyed t... |
Engie North America INC | Workshop | 10/10 | $137K | 20 | Joe Riley was an excellent resource with vast experience and expertise to guide Engie thru the process of identifying Gaps and documenting existing... |
Alabama Department of Economic and Community Affairs | Guided Implementation | 9/10 | N/A | 100 | It was great working with Scott Houle. He is knowledgeable and patient and made sure that I had all the information necessary to tackle something a... |
Firstmac Limited | Guided Implementation | 10/10 | $1,820 | 2 | Support in classification and risk ranking of our BCP Risks. |
Monroe #1 BOCES | Workshop | 7/10 | N/A | N/A | Facilitator was outstanding and flexible to our needs. As our work was focused on developing a roadmap for a DRP replacement, estimating time and ... |
Carver County, MN | Guided Implementation | 9/10 | $13,700 | 7 | |
Socan | Workshop | 10/10 | $100K | 10 | We found the domain knowledge of the facilitator to be very good which helped us to ask and answer the right questions based on our environment and... |
Coachella Valley Water District | Workshop | 10/10 | $68,500 | 20 | There are no issues or concerns to report. Sumit has proven to be an exceptional consultant, demonstrating a deep familiarity with the specific typ... |
First Ontario Credit Union Ltd. | Guided Implementation | 10/10 | $50,000 | 20 | Darin's wealth of knowledge in the space has been fantastic. It has allowed us to bring in the correct stake holders to understand the needs of the... |
Woodbine Entertainment Group | Workshop | 8/10 | $10,000 | 20 | Very difficult to meet ITRG's back-to-back scheduling requirement for these sessions. Excellent discussions facilitated by Venkat across lines o... |
State of Kentucky - Kentucky Transportation Cabinet | Workshop | 10/10 | $68,500 | 50 | Venkat was a great facilitator and had clearly planned the days well and did preliminary work each night so we would have a very valuable day of wo... |
Small Enterprise Finance Agency | Guided Implementation | 9/10 | $13,700 | 20 | Templates and advise was very useful |
MyPath, Inc. | Guided Implementation | 10/10 | $137K | 120 | Frank brings excellent experience to the table in relation to our DR and BC Workshops and discussions. We are fortunate to have him facilitating fo... |
Lake Simcoe Region Conservation Authority | Guided Implementation | 8/10 | $5,000 | 5 | Very knowledgeable help and good insights... |
Georgia Department of Banking and Finance | Guided Implementation | 10/10 | $13,700 | 20 | |
Kansas Public Employees Retirement System | Guided Implementation | 10/10 | $34,250 | 20 | Best part is being able to ask questions on any parts that I don't understand. No worst part. |
County of Montgomery | Guided Implementation | 10/10 | N/A | 20 | Andrew is an awesome analyst and advisor. He was able to keep us on track and explain everything in detail as we worked through the program. It was... |
Sirtex Medical US Holdings, Inc. | Guided Implementation | 10/10 | $129K | 20 | The value provided by Infotech is threefold. Firstly, there's the research; clear, comprehensive and ready to share, making me look like a rock sta... |
Santa Fe Community College | Guided Implementation | 9/10 | $1,039 | 23 | |
Disaster Recovery Planning
Close the gap between your DR capabilities and service continuity requirements.
This course makes up part of the Security & Risk Certificate.
- Course Modules: 4
- Estimated Completion Time: 2-2.5 hours
- Featured Analysts:
- Frank Trovato, Research Director, Infrastructure Practice
- Eric Wright, SVP of Research and Advisory
Workshop: What is a right-sized disaster recovery plan?
Workshops offer an easy way to accelerate your project. If you are unable to do the project yourself, and a Guided Implementation isn't enough, we offer low-cost delivery of our project workshops. We take you through every phase of your project and ensure that you have a roadmap in place to complete your project successfully.
Module 1: Define Parameters for Your DRP
The Purpose
Identify key applications and dependencies based on business needs.
Key Benefits Achieved
Understand the entire IT “footprint” that needs to be recovered for key applications.
Activities
Outputs
Assess current DR maturity.
- Current challenges identified through a DRP Maturity Scorecard.
Determine critical business operations.
Identify key applications and dependencies.
- Key applications and dependencies documented in the Business Impact Analysis (BIA) Tool.
Module 2: Determine the Desired Recovery Timeline
The Purpose
Quantify application criticality based on business impact.
Key Benefits Achieved
Appropriate recovery time and recovery point objectives defined (RTOs/RPOs).
Activities
Outputs
Define an objective scoring scale to indicate different levels of impact.
- Business impact analysis scoring criteria defined.
Estimate the impact of downtime.
- Application criticality validated.
Determine desired RTO/RPO targets for applications based on business impact.
- RTOs/RPOs defined for applications and dependencies.
Module 3: Determine the Current Recovery Timeline and DR Gaps
The Purpose
Determine your baseline DR capabilities (your current state).
Key Benefits Achieved
Gaps between current and desired DR capability are quantified.
Activities
Outputs
Conduct a tabletop exercise to determine current recovery procedures.
- Current achievable recovery timeline defined (i.e. the current state).
Identify gaps between current and desired capabilities.
- RTO/RPO gaps identified.
Estimate likelihood and impact of failure of individual dependencies.
- Critical single points of failure identified.
Module 4: Create a Project Roadmap to Close DR Gaps
The Purpose
Identify and prioritize projects to close DR gaps.
Key Benefits Achieved
DRP project roadmap defined that will reduce downtime and data loss to acceptable levels.
Activities
Outputs
Determine what projects are required to close the gap between current and desired DR capability.
- Potential DR projects identified.
Prioritize projects based on cost, effort, and impact on RTO/RPO reduction.
- DRP project roadmap defined.
Validate that the suggested projects will achieve the desired DR capability.
- Desired-state incident response plan defined, and project roadmap validated.
Module 5: Establish a Framework for Documenting Your DRP, and Summarize Next Steps
The Purpose
- Outline how to create concise, usable DRP documentation.
- Summarize workshop results.
Key Benefits Achieved
- A realistic and practical approach to documenting your DRP.
- Next steps documented.
Activities
Outputs
Outline a strategy for using flowcharts and checklists to create concise, usable documentation.
- Current-state and desired-state incident response plan flowcharts.
Review Info-Tech’s DRP templates for creating system recovery procedures and a DRP summary document.
- Templates to create more detailed documentation where necessary.
Summarize the workshop results, including current potential downtime and action items to close gaps.
- Executive communication deck that outlines current DR gaps, how to close those gaps, and recommended next steps.
What is disaster recovery in IT?
At its core, disaster recovery in IT is about ensuring continual service to your users after an unexpected natural disaster or major outage.
Why do organizations need a disaster recovery plan?
No matter the level of preparedness, no IT department is immune to technological failures, software outages, or the impacts of a natural disaster. A pre-established disaster recovery plan allows IT to focus on resolving critical issues immediately, rather than spending valuable time identifying and prioritizing these issues after an outage.
What are some common issues with disaster recovery plans?
Many disaster recovery policies are designed only for specific outage events, making them difficult to adapt to situations that weren’t planned for in advance. Additionally, traditional templates are often onerous and result in a lengthy plan better suited to appeasing auditors than to resolving a crisis.
What is the right-sized disaster recovery plan for an organization?
A disaster recovery plan (DRP) must be tailored to an organization's requirements to be effective in an outage. When establishing the size and scope of your organization’s DRP, you may consider the acceptable service outage times for major services, plan for common incidents as well as major disasters, and evaluate the disaster recovery preparedness of your vendors.
Create a Right-Sized Disaster Recovery Plan
Close the gap between your DR capabilities and service continuity requirements.
ANALYST PERSPECTIVE
An effective disaster recovery plan (DRP) is not just an insurance policy.
"An effective DRP addresses common outages such as hardware and software failures, as well as regional events, to provide day-to-day service continuity. It’s not just insurance you might never cash in. Customers are also demanding evidence of an effective DRP, so organizations without a DRP risk business impact not only from extended outages but also from lost sales. If you are fortunate enough to have executive buy-in, whether it’s due to customer pressure or concern over potential downtime, you still have the challenge of limited time to dedicate to disaster recovery (DR) planning. Organizations need a practical but structured approach that enables IT leaders to create a DRP without it becoming their full-time job."
Frank Trovato,
Research Director, Infrastructure
Info-Tech Research Group
Is this research for you?
This Research Is Designed For:
- Senior IT management responsible for executing DR.
- Organizations seeking to formalize, optimize, or validate an existing DRP.
- Business continuity management (BCM) professionals leading DRP development.
This Research Will Help You:
- Create a DRP that is aligned with business requirements.
- Prioritize technology enhancements based on DR requirements and risk-impact analysis.
- Identify and address process and technology gaps that impact DR capabilities and day-to-day service continuity.
This Research Will Also Assist:
- Executives who want to understand the time and resource commitment required for DRP.
- Members of BCM and crisis management teams who need to understand the key elements of an IT DRP.
This Research Will Help Them:
- Scope the time and effort required to develop a DRP.
- Align business continuity, DR, and crisis management plans.
Executive summary
Situation
- Any time a natural disaster or major IT outage occurs, it increases executive awareness and internal pressure to create a DRP.
- Industry standards and government regulations are driving external pressure to develop business continuity and IT DR plans.
- Customers are asking suppliers and partners to provide evidence that they have a workable DRP before agreeing to do business.
Complication
- Traditional DRP templates are onerous and result in a lengthy, dense plan that might satisfy auditors, but will not be effective in a crisis.
- The myth that a DRP is only for major disasters leaves organizations vulnerable to more common incidents.
- The growing use of outsourced infrastructure services has increased reliance on vendors to meet recovery timeline objectives.
Resolution
- Create an effective DRP by following a structured process to discover current capabilities and define business requirements for continuity:
- Define appropriate objectives for service downtime and data loss based on business impact.
- Document an incident response plan that captures all of the steps from event detection to data center recovery.
- Create a DR roadmap to close gaps between current DR capabilities and recovery objectives.
Info-Tech Insight
- At its core, DR is about ensuring service continuity. Create a plan that can be leveraged for both isolated and catastrophic events.
- Remember Murphy’s Law. Failure happens. Focus on improving overall resiliency and recovery, rather than basing DR on risk probability analysis.
- Cost-effective DR and service continuity starts with identifying what is truly mission critical so you can focus resources accordingly. Not all services require fast failover.
An effective DRP is critical to reducing the cost of downtime
If you don’t have an effective DRP when failure occurs, expect to face extended downtime and exponentially rising costs due to confusion and lack of documented processes.
Potential Lost Revenue
The impact of downtime tends to increase exponentially as systems remain unavailable (graph at left). A current, tested DRP will significantly improve your ability to execute systems recovery, minimizing downtime and business impact. Without a DRP, IT is gambling on its ability to define and implement a recovery strategy during a time of crisis. At the very least, this means extended downtime – potentially weeks or months – and substantial business impact.
Adapted from: Philip Jan Rothstein, 2007
Cost of Downtime for the Fortune 1000
Cost of unplanned apps downtime per year: $1.25B to $2.5B.
Cost of critical apps failure per hour: $500,000 to $1M.
Cost of infrastructure failure per hour: $100,000.
35% reported to have recovered within 12 hours.
17% of infrastructure failures took more than 24 hours to recover.
13% of application failures took more than 24 hours to recover.
Source: Stephen Elliot, 2015
Info-Tech Insight
The cost of downtime is rising across the board, and not just for organizations that traditionally depend on IT (e.g. e-commerce). Downtime cost increase since 2010:
Hospitality: 129% increase
Transportation: 108% increase
Media organizations: 104% increase
An effective DRP also sets clear recovery objectives that align with system criticality to optimize spend
Take a practical approach that creates a more concise and actionable DRP
DR planning is not your full-time job, so it can’t be a resource- and time-intensive process.
The Traditional Approach | Info-Tech’s Approach |
---|---|
Start with extensive risk and probability analysis. Challenge: You can’t predict every event that can occur, and this delays work on your actual recovery procedures. | Focus on how to recover regardless of the incident. We know failure will happen. Focus on improving your ability to failover to a DR environment so you are protected regardless of what causes primary site failure. |
Build a plan for major events such as natural disasters. Challenge: Major destructive events only account for 12% of incidents while software/hardware issues account for 45%. The vast majority of incidents are isolated local events. | An effective DRP improves day-to-day service continuity, and is not just for major events. Leverage DR planning to address both common (e.g. power/network outage or hardware failure) as well as major events. It must be documentation you can use, not shelfware. |
Create a DRP manual that provides step-by-step instructions that anyone could follow. Challenge: The result is lengthy, dense manuals that are difficult to maintain and hard to use in a crisis. The usability of DR documents has a direct impact on DR success. | Create concise documentation written for technical experts. Use flowcharts, checklists, and diagrams. They are more usable in a crisis and easier to maintain. You aren’t going to ask a business user to recover your SQL Server databases, so you can afford to be concise. |
DR must be integrated with day-to-day incident management to ensure service continuity
When a tornado takes out your data center, it’s an obvious DR scenario and the escalation towards declaring a disaster is straightforward.
The challenge is to be just as decisive in less-obvious (and more common) DR scenarios such as a critical system hardware/software failure, and knowing when to move from incident management to DR. Don’t get stuck troubleshooting for days when you could have failed over in hours.
Bridge the gap with clearly defined escalation rules and criteria for when to treat an incident as a disaster.
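One way to make escalation criteria concrete is to compare the projected time to repair against the recovery time objective (RTO) and the time needed to fail over. The following is a minimal sketch, not part of the Info-Tech methodology: the function name, its parameters, and the rule itself are illustrative assumptions that you would replace with criteria agreed with the business.

```python
def should_declare_disaster(elapsed_h: float, est_fix_h: float,
                            failover_h: float, rto_h: float) -> bool:
    """Hypothetical escalation rule: move from incident management to DR
    when failing over is faster than continued troubleshooting AND either
    the repair is projected to miss the RTO or the last point at which
    failover can still meet the RTO has been reached."""
    # Latest elapsed time at which starting failover still meets the RTO.
    decision_deadline_h = rto_h - failover_h
    # Continued troubleshooting is projected to finish after the RTO.
    fix_misses_rto = elapsed_h + est_fix_h > rto_h
    # Failover would restore service sooner than the estimated repair.
    failover_is_faster = failover_h < est_fix_h
    return failover_is_faster and (fix_misses_rto or elapsed_h >= decision_deadline_h)


# Example values are made up for illustration (all in hours).
print(should_declare_disaster(elapsed_h=1, est_fix_h=2, failover_h=4, rto_h=8))   # False: repair is quicker
print(should_declare_disaster(elapsed_h=2, est_fix_h=12, failover_h=4, rto_h=8))  # True: repair would miss the RTO
```

A rule like this is only as good as the repair estimate, so it works best alongside a time limit on troubleshooting after which the estimate must be revisited.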
Source: Info-Tech Research Group; N=92
Myth busted: The DRP is separate from day-to-day ops and incident management.
The most common threats to service continuity are hardware and software failures, network outages, and power outages
Source: Info-Tech Research Group; N=87
Info-Tech Insight
Does this mean I don’t need to worry about natural disasters? No. It means DR planning needs to focus on overall service continuity, not just major disasters. If you ignore the more common but less dramatic causes of service interruptions, you are diminishing the business value of a DRP.
Myth busted: DRPs are just for destructive events – fires, floods, and natural disasters.
DR isn’t about identifying risks; it’s about ensuring service continuity
The traditional approach to DR starts with an in-depth exercise to identify risks to IT service continuity and the probability that those risks will occur.
Here’s why starting with a risk register is ineffective:
- Odds are, you won’t think of every incident that might occur. If you think of twenty risks, it’ll be the twenty-first that gets you. If you try to guard against that twenty-first risk, you can quickly get into cartoonish scenarios and much more costly solutions.
- The ability to fail over to another site mitigates the risk of most (if not all) incidents (fire, flood, hardware failure, tornado, etc.). A risk and probability analysis doesn’t change the need for a plan that includes a failover procedure.
Where risk is incorporated in this methodology:
- Use known risks to further refine your strategy (e.g. if you are prone to hurricanes, plan for greater geographic separation between sites; ensure you have backups, in addition to replication, to mitigate the risk of ransomware).
- Identify risks to your ability to execute DR (e.g. lack of cross-training, backups that are not tested) and take steps to mitigate those risks.
Myth busted: A risk register is the critical first step to creating an effective DR plan.
You can’t outsource accountability and you can’t assume your vendor’s DR capabilities meet your needs
Outsourcing infrastructure services – to a cloud provider, co-location provider, or managed service provider (MSP) – can improve your DR and service continuity capabilities. For example, a large public cloud provider will generally have:
- Redundant telecoms service providers, network infrastructure, power feeds, and standby power.
- Round-the-clock infrastructure and security monitoring.
- Multiple data centers in a given region, and options to replicate data and services across regions.
Still, failure is inevitable – it’s been demonstrated multiple times through high-profile outages. When you surrender direct control of the systems themselves, it’s your responsibility to ensure the vendor can meet your DR requirements, including:
- A DR site and acceptable recovery times for systems at that site.
- An acceptable replication/backup schedule.
Sources: Kyle York, 2016; Shaun Nichols, 2017; Stephen Burke, 2017
Myth busted: I outsource infrastructure services so I don’t have to worry about DR. That’s my vendor’s responsibility.
Choose flowcharts over process guides, checklists over procedures, and diagrams over descriptions
IT DR is not an airplane disaster movie. You aren’t going to ask a business user to execute a system recovery, just like you wouldn’t really want a passenger with no flying experience to land a plane.
In reality, you write a DR plan for knowledgeable technical staff, which allows you to summarize key details your staff already know. Concise, visual documentation is:
- Quicker to create.
- Easier to use.
- Simpler to maintain.
"Without question, 300-page DRPs are not effective. I mean, auditors love them because of the detail, but give me a 10-page DRP with contact lists, process flows, diagrams, and recovery checklists that are easy to follow."
– Bernard Jones, MBCI, CBCP, CORP, Manager Disaster Recovery/BCP, ActiveHealth Management
Source: Info-Tech Research Group; N=95
*DR Success is based on stated ability to meet recovery time objectives (RTOs) and recovery point objectives (RPOs), and reported confidence in ability to consistently meet targets.
Myth busted: A DRP must include every detail so anyone can execute recovery.
A DRP is part of an overall business continuity plan
A DRP is the set of procedures and supporting documentation that enables an organization to restore its core IT services (i.e. applications and infrastructure) as part of an overall business continuity plan (BCP), as described below. Use the templates, tools, and activities in this blueprint to create your DRP.
Overall BCP
IT DRP | BCP for Each Business Unit | Crisis Management Plan |
---|---|---|
A plan to restore IT services (e.g. applications and infrastructure) following a disruption. This includes: | A set of plans to resume business processes for each business unit. Info-Tech’s Develop a Business Continuity Plan blueprint provides a methodology for creating business unit BCPs as part of an overall BCP for the organization. | A set of processes to manage a wide range of crises, from health and safety incidents to business disruptions to reputational damage. This includes emergency response plans, crisis communication plans, and the steps to invoke BC/DR plans when applicable. Info-Tech’s Implement Crisis Management Best Practices blueprint provides a structured approach to develop a crisis management process. |
Note: For DRP, we focus on business-facing IT services (as opposed to the underlying infrastructure), and then identify required infrastructure as dependencies (e.g. servers, databases, network).
Take a practical but structured approach to creating a concise and effective DRP
Info-Tech offers various levels of support to best suit your needs
DIY Toolkit
"Our team has already made this critical project a priority, and we have the time and capability, but some guidance along the way would be helpful."
Guided Implementation
“Our team knows that we need to fix a process, but we need assistance to determine where to focus. Some check-ins along the way would help keep us on track.”
Workshop
“We need to hit the ground running and get this project kicked off immediately. Our team has the ability to take this over once we get a framework and strategy in place.”
Consulting
“Our team does not have the time or the knowledge to take this project on. We need assistance through the entirety of this project.”
Diagnostics and consistent frameworks used throughout all four options
Info-Tech advisory services deliver measurable value
Info-Tech members save an average of $22,983 and 22 days by working with an Info-Tech analyst on DRP (based on client response data from Info-Tech Research Group’s Measured Value Survey, following analyst advisory on this blueprint).
Why do members report value from analyst engagement?
- Expert advice on your specific situation to overcome obstacles and speed bumps.
- Structured project and guidance to stay on track.
- Project deliverables review to ensure the process is applied properly.
Guided implementation overview
Your trusted advisor is just a call away.
Define DRP scope (Call 1)
Scope requirements, objectives, and your specific challenges. Identify applications/systems to focus on first.
Define current status and system dependencies (Calls 2-3)
Assess current DRP maturity. Identify system dependencies.
Conduct a BIA (Calls 4-6)
Create an impact scoring scale and conduct a BIA. Identify RTO and RPO for each system.
Recovery workflow (Calls 7-8)
Create a recovery workflow based on tabletop planning. Identify gaps in recovery capabilities.
Projects and action items (Calls 9-10)
Identify and prioritize improvements. Summarize results and plan next steps.
Your guided implementations will pair you with an advisor from our analyst team for the duration of your DRP project.
Workshop overview
Contact your account representative or email Workshops@InfoTech.com for more information.
End-user complaints distract from serious IT-based risks to business continuity
Case Study
Industry: Manufacturing
Source: Info-Tech Research Group Client Engagement
A global manufacturer with annual sales over $1B worked with Info-Tech to improve DR capabilities.
DRP BIA
Conversations with the IT team and business units identified the following impact of downtime over 24 hours:
- Email: Direct Cost: $100K; Goodwill Impact Score: 8.5/16
- ERP: Direct Cost: $1.35M; Goodwill Impact Score: 12.5/16
Tabletop Testing and Recovery Capabilities
Reviewing the organization’s current systems recovery workflow identified the following capabilities:
- Email: RTO: minutes, RPO: minutes
- ERP: RTO: 14 hours, RPO: 24 hours
Findings
Because of end-user complaints, IT had invested heavily in email resiliency, even though email downtime had relatively little impact on the business. After working through the methodology, it was clear that the business needed to provide additional support for critical systems.
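The mismatch in this case study can be checked mechanically by comparing each system's business impact against its current recovery capability. The sketch below uses the case study's BIA and tabletop figures. The `SystemProfile` class and `misaligned_pairs` function are hypothetical names, and the email RTO of "minutes" is approximated as 0.5 hours purely as an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    name: str
    direct_cost_24h: float    # direct cost of 24 hours of downtime (from the BIA)
    goodwill_score: float     # goodwill impact score out of 16 (from the BIA)
    current_rto_hours: float  # currently achievable recovery time (from tabletop testing)


systems = [
    # "RTO: minutes" approximated as 0.5 hours (assumption, not a source figure).
    SystemProfile("Email", 100_000, 8.5, 0.5),
    SystemProfile("ERP", 1_350_000, 12.5, 14.0),
]


def misaligned_pairs(profiles):
    """Return (higher_impact, lower_impact) name pairs where the system with
    the higher direct cost of downtime also takes longer to recover."""
    pairs = []
    for a in profiles:
        for b in profiles:
            if (a.direct_cost_24h > b.direct_cost_24h
                    and a.current_rto_hours > b.current_rto_hours):
                pairs.append((a.name, b.name))
    return pairs


print(misaligned_pairs(systems))  # [('ERP', 'Email')]
```

Each flagged pair is a prompt to revisit where resiliency spend is going, not an automatic verdict. Goodwill impact and other BIA criteria would normally be weighed as well.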
Insights at each step:
Identify DR Maturity and System Dependencies
Conduct a BIA
Outline Incident Response and Recovery Workflow With Tabletop Exercises
Mitigate Gaps and Risks
Create a Right-Sized Disaster Recovery Plan
Phase 1
Define DRP Scope, Current Status, and Dependencies
Step 1.1: Set Scope, Kick-Off the DRP Project, and Create a Charter
This step will walk you through the following activities:
- Establish a team for DR planning.
- Retrieve and review existing, relevant documentation.
- Create a project charter.
This step involves the following participants:
- DRP Coordinator
- DRP Team (Key IT SMEs)
- IT Managers
Results and Insights
- Set scope for the first iteration of the DRP methodology.
- Don’t try to complete your DR and BCPs all at once.
- Don’t bite off too much at once.