Anuj Varma has designed cloud-first as well as on-premises failover solutions for High Availability and Disaster Recovery.
Cloud Solution versus Traditional Solution – Three Common Scenarios
- Have you ever worried about a production server (or servers) in your farm crashing?
- Have you ever had potentially breaking code changes checked in by developers – changes that weren’t completely tested before the push to production?
- Have you ever dealt with increasing response times from your web application as the number of user sessions grows, often to the point where the application crashes?
Scenario – A server in your web (or data) farm crashes
Traditional Failover Solution
Fail over to the existing nodes that haven’t crashed. However, if multiple nodes crash, or if the load balancer distributing traffic to the farm crashes, your site will most likely experience serious downtime.
Cloud Failover Solution
A monitoring service detects the crashed node and automatically triggers a ‘server template’ that spins up a new one. No manual intervention is required. The new instance can be configured exactly like the failed instance, or restored to a prior state.
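The detect-and-replace loop described above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: `ServerTemplate`, `Instance` and `Farm` are hypothetical names invented for this example, and a real setup would call your cloud provider's monitoring and provisioning services instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ServerTemplate:
    """Hypothetical stand-in for a server template (an image plus a role)."""
    image: str
    role: str


@dataclass
class Instance:
    instance_id: str
    template: ServerTemplate
    healthy: bool = True


class Farm:
    """Toy in-memory farm; a real one would call a cloud provider's API."""

    def __init__(self, template: ServerTemplate, size: int) -> None:
        self.template = template
        self._ids = count(1)
        self.instances = [self._launch() for _ in range(size)]

    def _launch(self) -> Instance:
        # Every new instance is built from the same template.
        return Instance(f"i-{next(self._ids):04d}", self.template)

    def replace_unhealthy(self) -> list[tuple[str, str]]:
        """One monitoring pass: swap each failed instance for a fresh one.

        Returns (failed_id, replacement_id) pairs for logging or alerting.
        """
        replaced = []
        for index, instance in enumerate(self.instances):
            if not instance.healthy:
                fresh = self._launch()
                self.instances[index] = fresh
                replaced.append((instance.instance_id, fresh.instance_id))
        return replaced


farm = Farm(ServerTemplate(image="web-base-v1", role="WebServer"), size=3)
farm.instances[1].healthy = False  # simulate a crash
swaps = farm.replace_unhealthy()
print(swaps)
```

The point of the sketch is that the farm size never changes and nobody is paged: the failed node is simply swapped for a new one built from the same template.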
Scenario – Load (concurrent users) spikes suddenly, causing existing servers to choke and maybe even crash.
Traditional Failover Solution
You would need to manually provision more hardware – for either vertical or horizontal scaling – or both! This is time-consuming, expensive and, most importantly, entails downtime!
Cloud Failover Solution
The infrastructure’s auto scaling capabilities spin up new instances and dynamically add them to the pool. No manual intervention is required.
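To give a feel for the decision an auto scaler makes, here is a minimal sketch of a proportional scaling rule. It is an assumption made for illustration, not the algorithm of any particular cloud provider; the function name, the load metric and the pool limits are all hypothetical.

```python
import math


def desired_instance_count(current: int, observed_load: float, target_load: float,
                           min_size: int = 1, max_size: int = 10) -> int:
    """Size the pool so that the average per-instance load approaches the target.

    `observed_load` and `target_load` share a unit, e.g. average CPU percent.
    The result is clamped to the pool limits (assumed values, not defaults
    of any real service).
    """
    if current < 1 or target_load <= 0:
        raise ValueError("current must be >= 1 and target_load must be positive")
    wanted = math.ceil(current * observed_load / target_load)
    return max(min_size, min(max_size, wanted))


scale_out = desired_instance_count(current=4, observed_load=90.0, target_load=60.0)
scale_in = desired_instance_count(current=4, observed_load=30.0, target_load=60.0)
capped = desired_instance_count(current=8, observed_load=120.0, target_load=60.0)
print(scale_out, scale_in, capped)
```

With four instances averaging 90% CPU against a 60% target, the rule asks for six instances; at 30% it shrinks the pool to two; and a request for sixteen is capped at the assumed maximum of ten.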
Server Templates (The Magic Of)
Thanks to ‘server templates’ (e.g. VMware templates, AWS AMIs, AWS CloudFormation templates), spinning up entire VMs with a few lines of code has become a straightforward exercise in the cloud world. More importantly, these VMs can be defined with specific ‘roles’ – a WebServer role, a DB role etc. The exact role ‘configuration’ can be stored on a configuration server (Chef Server, Puppet master, Ansible Tower…), which protects it from accidental overwrites or destruction.
The bottom line is that you not only get a blueprint for automatic infrastructure creation – you also get a safe for locking this blueprint so that no one can destroy it. Template repositories, template versioning and hardening of repositories are all evolving at a rapid pace, making cloud-center solutions as secure as traditional data center solutions.
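The ‘blueprint in a safe’ idea can be illustrated with a small, append-only template repository. This is a sketch under stated assumptions: `RoleTemplate` and `TemplateRepository` are hypothetical names, and nothing here reflects the actual API of Chef, Puppet or Ansible.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class RoleTemplate:
    """Hypothetical role definition: immutable once published."""
    role: str
    version: int
    settings: Mapping[str, str]


class TemplateRepository:
    """Append-only store: publishing adds a new version and never overwrites."""

    def __init__(self) -> None:
        self._versions: dict[str, list[RoleTemplate]] = {}

    def publish(self, role: str, settings: dict[str, str]) -> RoleTemplate:
        history = self._versions.setdefault(role, [])
        # Copy the settings and wrap them read-only so later edits cannot leak in.
        template = RoleTemplate(role, len(history) + 1, MappingProxyType(dict(settings)))
        history.append(template)
        return template

    def get(self, role: str, version: int | None = None) -> RoleTemplate:
        history = self._versions[role]
        if version is None:
            return history[-1]  # latest version
        if not 1 <= version <= len(history):
            raise KeyError(f"{role} has no version {version}")
        return history[version - 1]


repo = TemplateRepository()
repo.publish("WebServer", {"package": "nginx", "port": "80"})
repo.publish("WebServer", {"package": "nginx", "port": "443"})
latest = repo.get("WebServer")
original = repo.get("WebServer", version=1)
print(latest.version, latest.settings["port"], original.settings["port"])
```

Because older versions are never overwritten, a rebuilt server can be configured from the latest template or rolled back to a prior state, which is the property the paragraph above relies on.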
What does all this have to do with High Availability or Disaster Recovery?
As you probably guessed from the recap above, in the cloud world, keeping COLD STANDBYs just doesn’t make much sense. When hardware fails, it is relatively painless to re-generate an identical copy of the crashed server.
The devil is in the details, of course, and one has to be mindful of how to recover any data, log files etc. that lived on the crashed server. For example, all the performance metrics (CPU usage, average memory usage etc.) are lost with the server crash unless they were stored elsewhere. There are, fortunately, cloud patterns that help with centralized logging, data updates, performance metrics and other commonly needed server stats.
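The centralized-metrics pattern boils down to one rule: servers push their stats somewhere that outlives them. Here is a minimal in-memory sketch of that rule; `CentralMetricsStore` and the server and metric names are hypothetical, and a real deployment would use a managed monitoring or logging service.

```python
from __future__ import annotations

from statistics import mean


class CentralMetricsStore:
    """Toy stand-in for a central metrics service that outlives any one server."""

    def __init__(self) -> None:
        self._samples: dict[str, dict[str, list[float]]] = {}

    def record(self, server_id: str, metric: str, value: float) -> None:
        # Each server pushes its samples here instead of keeping them locally.
        self._samples.setdefault(server_id, {}).setdefault(metric, []).append(value)

    def average(self, server_id: str, metric: str) -> float:
        samples = self._samples.get(server_id, {}).get(metric)
        if not samples:
            raise KeyError(f"no samples for {server_id}/{metric}")
        return mean(samples)


store = CentralMetricsStore()
store.record("web-01", "cpu_percent", 40.0)
store.record("web-01", "cpu_percent", 60.0)

# Even if web-01 crashes at this point, its history is still available centrally.
average_cpu = store.average("web-01", "cpu_percent")
print(average_cpu)
```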
Here’s the rub: cold standby servers can be replaced with on-demand, re-buildable instances. The instances do not need to be on standby – all that is needed is a set of (well-tested) server templates that are easily accessible in a disaster situation. These server templates can recreate the crashed instances – in a way that retrieves all of the configuration data that was part of the crashed instance.
A server going down (for whatever reason) is no longer a cause for serious concern.
Not only can a cloud service detect the crashed server, it can notify the appropriate server creation template to ‘spin up’ an equivalent server. What about all the configuration data etc. on the crashed server? That too, through innovative cloud template patterns, can be recovered from a centralized repository.
Planning (and testing) disaster recovery for your critical, high-performance web apps no longer needs to be the expensive and risky proposition that it used to be.
https://www.linkedin.com/pulse/traditional-disaster-recovery-versus-dr-cloud-three-anuj-varma/