Comprehensive software reviews to make better IT decisions
Disaster Recovery Is a Cloud Gateway Drug
Cloud-based disaster recovery (DR) is a sort of gateway drug for public cloud. In addition to providing an on-demand failover target in an emergency, a cloud DR project provides a foothold for exploring further use of cloud, including permanent migration. But it is important to view cloud DR as a potential part of migration, not the whole story.
Among Info-Tech client firms developing a cloud strategy, two near-term projects have come up repeatedly in the past two years: Office 365 migration and leveraging the cloud for offsite backup and disaster recovery. These two are the “right now” projects in cloud strategy; moving legacy core applications to the cloud is the “maybe tomorrow” project.
Discussion of disaster recovery on a public cloud like Azure or AWS is reminiscent of where server virtualization was ten years ago. Back then, virtualization was just beginning to take off, but there was hesitancy, if not outright resistance, because of concern over whether a virtual machine (VM) could be trusted with a critical workload.
Today, as ten years ago, many organizations are hesitant about hosting production workloads on a public cloud. And just as DR on virtual machines did then, DR in the cloud offers an offsite recovery capability without building or renting a data center, as well as an opportunity to test whether the public cloud can host those important systems.
But there are important differences between what is going on with cloud migration now and migration to virtual infrastructure then.
When DR Was the Virtualization Gateway Drug
Early virtualization projects focused on secondary servers such as those for test and development. Virtualization of core production servers proceeded far more cautiously. For many organizations, what got virtualization moving was not a stellar business case about capital expense reduction and provisioning agility but rather business continuity/disaster recovery planning and backup.
Backup had grown beyond files to full system imaging. A new term, bare metal restore (BMR), was coined for the full restore of a running server (operating system, applications, data) to different server hardware. Success with BMR prompted a question: if the imaged system can be restored to different hardware (on premises or in another data center), could it not be restored to a virtual machine hosted on a server running a hypervisor like VMware ESXi?
Restore to VM had a number of advantages for system availability and recovery:
- You didn’t need to purchase redundant hardware to ensure warm failover availability. You only needed enough available capacity on the secondary hardware.
- Instead of acquiring, configuring, testing, and deploying new hardware for restore, a new properly configured VM could be provisioned in minutes.
- In the DR use case, performance did not need to be the best, just good enough for the system to be available during the emergency.
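The first advantage is easy to quantify. The minimal sketch below compares dedicated standby hardware with pooled spare capacity on VM hosts. The server count, per-server load, host size, and 80% utilization ceiling are all hypothetical, and pooling total CPU demand gives only a lower bound, since it ignores memory, storage, and how individual VMs pack onto hosts.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cores_used: float  # hypothetical average CPU cores the workload consumes


def standby_servers_needed(servers: list[Server]) -> int:
    """Physical warm failover: one redundant machine per protected server."""
    return len(servers)


def vm_hosts_needed(servers: list[Server], host_cores: int,
                    max_utilization: float = 0.8) -> int:
    """Restore-to-VM: only enough spare host capacity to run recovered workloads.

    Pools total CPU demand across hosts, so this is a lower bound: it ignores
    memory, storage, and how individual VMs actually pack onto hosts.
    """
    usable_cores_per_host = host_cores * max_utilization
    total_demand = sum(s.cores_used for s in servers)
    return math.ceil(total_demand / usable_cores_per_host)


if __name__ == "__main__":
    # Twelve lightly loaded physical servers, two cores of real work each.
    fleet = [Server(f"app{i:02d}", cores_used=2.0) for i in range(12)]
    print("Dedicated standby servers:", standby_servers_needed(fleet))  # 12
    print("32-core VM hosts needed:", vm_hosts_needed(fleet, host_cores=32))  # 1
```

Lightly loaded physical servers are what made this work: twelve mostly idle boxes fit comfortably on one recovery host.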
For all of these reasons, virtualization for system restore was an easier sell than virtualization for critical production servers. Those servers could be brought up quickly in an emergency on VMs, and they didn’t have to be perfect, just good enough for the duration of the event.
It was testing that really sold virtualization to the business. Whatever your backup methodology and architecture, a backup is only as good as its restore. In testing restores, organizations found that the virtual infrastructure was very resilient, performed well enough, and could be instantiated very quickly.
In one case I recall from this period, an IT director at a large professional services firm made the case for virtualization to senior management and got a lukewarm response. To them it seemed risky, even if the benefits looked promising. Nonetheless, virtualization was deployed as the restore target in DR planning.
A year later the same IT director was presenting results of an annual DR test. A senior exec stopped him:
Exec: Wait a minute. There is something wrong here.
IT Director: What is that?
Exec: It says here you were able to fully restore system function in three hours.
IT Director: Yes.
Exec: But two years ago your report said it took three days.
IT Director: Yes, but we are restoring to virtual machines now. They are very fast to spin up.
Exec: And how did they work after that?
IT Director: Full function, and if there is a problem, restart takes five minutes.
Exec: Well, if it’s so functional and resilient, why aren’t we just running the systems on virtual machines full time?
Once the technology had proven itself this way, migration to virtual machines took off. Organizations could buy new virtual machine host servers and treat them at first as restore targets. Then, when the old hardware reached end of life, they could do a permanent failover, and the secondary would become primary. New workloads followed a “virtualize first” policy.
From Virtualized DR to Cloud DR
Virtualization helped simplify offsite disaster recovery. Migrating systems to the secondary site was a matter of copying data, not moving hardware; a virtual machine is, essentially, a data file. For fast failover, data from running systems could be replicated continuously and the spin-up of VMs at the second site automated.
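The two halves of fast failover, replication and automated spin-up, map onto the two numbers a DR plan is judged by: how much data is lost and how long recovery takes. The sketch below models both under stated assumptions. The `RecoveryPlan` and `ProtectedVM` names and the boot times are hypothetical, boot groups are assumed to run in sequence with VMs inside a group starting in parallel, and "continuous" replication is approximated by very frequent replica points. This is an illustration, not the behavior of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedVM:
    name: str
    boot_minutes: float  # hypothetical time to spin up and pass health checks


@dataclass
class RecoveryPlan:
    # Boot groups run in order (e.g. database before app servers before web);
    # VMs inside a group are assumed to start in parallel.
    groups: list[list[ProtectedVM]] = field(default_factory=list)

    def estimated_recovery_minutes(self) -> float:
        # Each group waits for its slowest VM before the next group starts.
        return sum(max(vm.boot_minutes for vm in g) for g in self.groups if g)


def data_loss_minutes(outage_at: float, replica_times: list[float]) -> float:
    """Minutes of changes lost: gap between the outage and the newest earlier replica."""
    earlier = [t for t in replica_times if t <= outage_at]
    if not earlier:
        raise ValueError("no replica was taken before the outage")
    return outage_at - max(earlier)


if __name__ == "__main__":
    plan = RecoveryPlan([
        [ProtectedVM("sql01", 12)],
        [ProtectedVM("app01", 6), ProtectedVM("app02", 8)],
        [ProtectedVM("web01", 4)],
    ])
    print(plan.estimated_recovery_minutes())  # 24
    # Outage at minute 100: replicas every 15 minutes vs. every minute.
    print(data_loss_minutes(100, list(range(0, 121, 15))))  # 10
    print(data_loss_minutes(100, list(range(0, 121, 1))))   # 0
```

Ordering the boot groups matters because application servers typically cannot come up cleanly before the database they depend on; automating that order is much of what turns a days-long restore into one measured in hours.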
The success of site-to-site replication and virtual site recovery services prompted yet another question: If we can host the virtualized systems and data at a second site, why couldn’t we host them in the cloud? After all, cloud Infrastructure as a Service (IaaS) is based on virtual machines running on the cloud provider’s infrastructure.
“Oh yes!” responded the cloud providers, quick to exploit a new market for Disaster Recovery as a Service (DRaaS).
And just as restore-to-VM services of old handled the configuring of backed-up bare metal servers to run on VMs, cloud-based DR services (such as Azure Site Recovery) also manage the transition of bare metal server images and VMs to run on their IaaS platform. For example, in the case of Azure, backed-up VMware virtual machine files are configured to run as instances on Microsoft’s non-VMware cloud.
Many of the usual suspects in backup – such as Veeam and Commvault – have partnered with the public cloud providers. The backup software handles onsite backup and offsite replication to the cloud. In an outage, the cloud-based site recovery services can take the data and instantiate your critical app servers.
Also similar to the old restore-to-VM services, cloud DR provides an opportunity to become more comfortable with, and knowledgeable about, public cloud IaaS. Recovery services need to be tested, and testing can answer three important questions about migrating a given system to cloud IaaS:
- Will the system run in the cloud?
- Will performance, availability, security, and compliance meet requirements?
- How much will it cost?
Of course, the cloud provider or a third-party consultant can answer those questions, but those answers will be estimates based on architecture and rate cards. Nothing is as good as a live test. If it all works, there is also the option of a permanent failover to migrate the system to the cloud.
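One way to turn a recovery test into answers to those three questions is to record what was measured and compare it with requirements. The sketch below is hypothetical throughout: the field names, thresholds, and costs are illustrative, 730 hours is an average month (8,760 hours per year divided by 12), and the cost projection covers compute only. Security and compliance are left out because they call for a review rather than a single number.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


@dataclass
class Requirements:
    max_recovery_minutes: float
    max_p95_latency_ms: float
    min_availability_pct: float


@dataclass
class RecoveryTest:
    system_started: bool      # Will the system run in the cloud?
    recovery_minutes: float
    p95_latency_ms: float
    availability_pct: float   # measured over the test window
    test_hours: float
    compute_cost: float       # taken from the provider's bill for the test window


def evaluate(req: Requirements, test: RecoveryTest) -> dict:
    # Security and compliance need their own review; they are not scored here.
    meets = (
        test.system_started
        and test.recovery_minutes <= req.max_recovery_minutes
        and test.p95_latency_ms <= req.max_p95_latency_ms
        and test.availability_pct >= req.min_availability_pct
    )
    hourly_rate = test.compute_cost / test.test_hours
    return {
        "runs_in_cloud": test.system_started,
        "meets_requirements": meets,
        # Compute only: excludes storage, egress, licensing, and discounts.
        "projected_monthly_compute": round(hourly_rate * HOURS_PER_MONTH, 2),
    }


if __name__ == "__main__":
    req = Requirements(max_recovery_minutes=240, max_p95_latency_ms=200,
                       min_availability_pct=99.5)
    test = RecoveryTest(system_started=True, recovery_minutes=180,
                        p95_latency_ms=150, availability_pct=99.9,
                        test_hours=8, compute_cost=20.0)
    print(evaluate(req, test))
    # {'runs_in_cloud': True, 'meets_requirements': True, 'projected_monthly_compute': 1825.0}
```

The point of using billed test cost rather than a rate card is that it captures the instance sizes the system actually needed during the test.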
How the Current Situation Is Not Like the Past
Before we run off and architect cloud backup and cloud recovery services as the gateway to permanent cloud migration, it is important to recognize how cloud DR is not like those old restore-to-VM projects.
In restore-to-VM the virtual server infrastructure was the end game. The goal in virtualization was to move all server workloads to virtual infrastructure. Today that goal has largely been achieved, with many organizations reporting 90% or more of their servers virtualized. In hyperconverged infrastructure, storage and network switches are also fully virtualized.
Cloud IaaS is a form of virtual infrastructure. But is the end game of your cloud strategy to mirror your on-premises infrastructure in a cloud? DR can help you prove the viability of this “lift and shift” cloud migration, but is that the best-case scenario? What about refactoring applications and data for Platform as a Service (including cloud-native application development)? What about migrating data to Software as a Service applications?
In Info-Tech’s cloud strategy research, a “cloud first” policy is not the same as the old “virtualize first” policy. Where the latter asked “Can this app or service run on a virtual machine?” the former asks “Can this app or service be hosted on Infrastructure as a Service, Platform as a Service, or Software as a Service? Which is the best fit?”
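The difference between the two policies can be sketched as two decision functions. Everything here is a hypothetical illustration, not Info-Tech's methodology: the attributes are simplified yes/no questions, "runs on x86" stands in crudely for "can run on a VM," and the ordering (prefer the most managed service model that works) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    SAAS = "Software as a Service"
    PAAS = "Platform as a Service"
    IAAS = "Infrastructure as a Service"
    NOT_PUBLIC_CLOUD = "On premises or hybrid"


@dataclass
class Workload:
    name: str
    saas_covers_need: bool  # a SaaS product already does the job (e.g. email)
    can_refactor: bool      # willing and able to re-architect for PaaS
    runs_on_x86: bool       # non-x86 systems are hard to move to public IaaS


def virtualize_first(w: Workload) -> bool:
    # The old question, crudely proxied: can it run on a virtual machine?
    return w.runs_on_x86


def cloud_first(w: Workload) -> Placement:
    # The new question: which service model fits best?
    # Hypothetical ordering: prefer the most managed model that works.
    if w.saas_covers_need:
        return Placement.SAAS
    if w.can_refactor:
        return Placement.PAAS
    if w.runs_on_x86:
        return Placement.IAAS
    return Placement.NOT_PUBLIC_CLOUD


if __name__ == "__main__":
    for w in [
        Workload("email", saas_covers_need=True, can_refactor=False, runs_on_x86=True),
        Workload("custom web app", saas_covers_need=False, can_refactor=True, runs_on_x86=True),
        Workload("legacy core app", saas_covers_need=False, can_refactor=False, runs_on_x86=True),
        Workload("midrange system", saas_covers_need=False, can_refactor=False, runs_on_x86=False),
    ]:
        print(f"{w.name}: virtualize first={virtualize_first(w)}, "
              f"cloud first={cloud_first(w).value}")
```

Note that `virtualize_first` answers yes for the first three workloads alike, while `cloud_first` sends each to a different destination; only the legacy core app ends up as a "lift and shift" IaaS candidate of the kind cloud DR exercises.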
- Make sure cloud is a fit for your DR planning.
Organizations pursuing “virtualize first” did find that some applications were not candidates for virtualization. Similarly, organizations whose DR scenario includes systems that are not easily migrated to a public cloud (such as non-x86 systems) will need to look elsewhere or pursue a hybrid cloud/non-cloud approach. For more on that, see Info-Tech’s Select the Optimal Disaster Recovery Deployment Model.
- Leverage cloud DR as a start for cloud migration, not the whole story.
Your cloud strategy needs to consider the broad canvas of cloud-based services. Cloud DR provides a gateway for broader infrastructure “lift and shift” to cloud IaaS, but this may be only the first phase of a longer-term roadmap that ends in a multi-service hybrid cloud.
- Use cloud recovery testing to get a real-world understanding of capabilities and costs.
If you are pursuing a cloud DR strategy, use your recovery testing to gather the best evidence about the viability of permanently hosting your systems in the cloud. Any DR plan needs to be tested anyway, so those tests are an opportunity, not just an obligation.
Disaster recovery on a public cloud service like Azure or AWS is a gateway drug to making further investments in cloud. That’s because cloud DR provides a relatively low-risk method of putting a toe in the water and gauging whether cloud is a viable option for further investment. Just make sure that you do not make it your final destination.