Creating a disaster recovery plan

Posted 04th December, 2018 by Freddie

If your website is fundamental to your business continuity, you should have a disaster recovery plan. A disaster recovery plan sets out the steps to take if something occurs that affects your website's availability. Good preparation also involves taking active measures in advance of any issues (even if the likelihood of anything bad happening is minuscule). Business continuity insurance often requires a disaster recovery plan, and for good reason: with the right preparation, the disruption caused by even catastrophic data loss can be minimal.

We can divide the most common potential ‘disasters’ into three categories:

  • Cyber attacks, which cause a site to be defaced, riddled with malware, or broken.
  • Human error
  • Service provider issues, such as hardware failure

Cyber Attacks

“A cyberattack is deliberate exploitation of computer systems, technology-dependent enterprises and networks. Cyberattacks use malicious code to alter computer code, logic or data, resulting in disruptive consequences that can compromise data and lead to cybercrimes, such as information and identity theft.” - Techopedia

Cyber attacks are malicious attacks against your website. These may either be directed specifically against your site or, far more likely, be an exploit against the software you are running. Due to automation, a hacker can attack millions of websites with a single script searching for a known vulnerability. Because of the number of open source content management systems (for example Joomla, WordPress or Magento) and their array of plugins, vulnerabilities are common if updates are not applied. At fixed.net we include a WordPress vulnerability scanner in our free offering.

Malicious attacks can also happen through a password breach, a virus on a desktop machine, weak server security, or permissions errors on shared hosting (where one user account can read and write to other users' websites).

The aim of a cyber attack is usually financial: either the attacker can retrieve details they can sell, or they can upload a phishing page to a website that might harvest bank details. We [explore a few motivations in our knowledge base].

Cyber attacks happen and are likely to happen to you, so you should do what you can to protect against them. However, no protection is complete: since 2017 at least 16 household names we know of have been hacked and had information stolen (according to Business Insider Magazine), including Adidas, Best Buy, British Airways and Kmart. These multi-national companies have dedicated security teams and budgets. If they are vulnerable, so is everybody else.

Human Error

There is an anecdote about fire protection systems which goes like this: office buildings and datacentres can spend hundreds of thousands on advanced suppression systems, which either sprinkle water or emit gas to suppress oxygen. Water, if let off, might damage equipment, but tanks can be refilled. Gas is much better for equipment, but requires costly replacement after a single use. Either way, the expense is justified against the alternative of a fire. The vast majority of activations, however, are caused by human error: a press of a button, an engineer connecting the wrong wire, pure stupidity…

The same is true for websites. We can spend hours making a site resistant to malware, load balancing it over multiple servers, or setting up monitoring in case of an outage - but none of that protects it from someone pressing delete. Every server admin or web designer has an embarrassing story of when they typed `rm -rf` (a recursive, forced delete with no confirmation prompt) in the wrong directory, wiping out data. Plus we’ve all done something we can’t reverse and wished we could roll back a few moments, or had just taken that one backup.

Whether the trigger is accidental deletion, accidental modification or pure incompetence, our disaster recovery plan should also account for rectifying this.
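One inexpensive guard against the wrong-directory delete is to archive a directory before removing it. The following is a minimal sketch under stated assumptions, not a substitute for proper backups: the function name `archive_then_delete` and the `trash_dir` location are hypothetical and would need adapting to your own setup.

```python
import shutil
import time
from pathlib import Path


def archive_then_delete(target, trash_dir):
    """Archive `target` into `trash_dir` as a .tar.gz, then delete it.

    Returns the path of the archive so the deletion can be undone.
    Both names are illustrative assumptions, not part of any real tool.
    """
    target = Path(target).resolve()
    trash_dir = Path(trash_dir).resolve()
    if not target.is_dir():
        raise NotADirectoryError(f"{target} is not a directory")
    # Refuse to write the archive inside the directory being deleted.
    if trash_dir == target or target in trash_dir.parents:
        raise ValueError("trash_dir must live outside the target directory")

    trash_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = trash_dir / f"{target.name}-{stamp}"
    # make_archive appends ".tar.gz" and returns the full archive path.
    archive = shutil.make_archive(
        str(base_name), "gztar", root_dir=target.parent, base_dir=target.name
    )
    # Only remove the original once the archive has been written.
    shutil.rmtree(target)
    return Path(archive)
```

The archive is written before anything is removed, so a failure while archiving leaves the original directory untouched.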

Service Issues

Service issues are disruptions caused by your service provider: for example your web host, network provider or domain registrar. These range from server faults or DDoS attacks, which may slow down the site or cause timeouts, all the way through to catastrophic data loss.

What is a disaster recovery plan?

Now that the three key vulnerabilities have been identified, we can plan the solutions our disaster recovery plan needs to offer. These are the steps we are going to take if any of the above happens. We will go through the steps below and then provide an example template plan you can adapt for your own needs.

The three main features of every recovery plan are:

  • Preventative measures that can be taken to minimise the risk of a disaster.
  • Detection, so that you are aware of issues as and when they occur.
  • Steps to take to correct and recover from the potential disasters.

Preventative Measures

Preventative measures are a series of good practices, which may differ for every site. Here are a few key examples:

  • Ensure website software and plugins are up to date. Set a regular interval at which you check.
  • Ensure that access to your website and hosting is locked down and only accessible when needed. All software should be multi-user, and each user should use only their own account.
  • Ensure that local machines and devices have up-to-date antivirus software.
  • Use a reliable hosting company. They should be able to provide you with their security policy.
  • Back up your data to a third party (such as fixed.net), to keep it separate from your web host.
  • Have easy access to manage your website DNS, for example with your domain provider or a third party such as Cloudflare.

You should document exactly where all of these settings are and who has access.

At fixed.net we can do regular security audits for all subscribers.

Detecting Issues

Every site should have active monitoring. This notifies you when a website is down or not responding as expected. Monitoring services check a website at regular intervals and track the response. The best monitoring also tracks response time: how long it takes to download the page.
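To make the idea concrete, here is a minimal sketch of a single monitoring check that records both the HTTP status and the response time. It is an illustration only, not how any particular monitoring service works: the function name `check_site`, the 10 second timeout and the shape of the returned dictionary are all assumptions.

```python
import time
import urllib.error
import urllib.request


def check_site(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch `url` once and report its status and response time.

    `opener` is injectable so the check can be tested without a network.
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()  # download the body so timing covers the full page
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, but with an error such as 500
    except (urllib.error.URLError, OSError):
        status = None  # no answer at all: DNS failure, refused connection, timeout
    elapsed = time.monotonic() - started
    return {
        "url": url,
        "up": status is not None and 200 <= status < 400,
        "status": status,
        "seconds": round(elapsed, 3),
    }
```

A check like this would typically be run on a schedule from a machine outside your web host, raising an alert when `up` is false or `seconds` exceeds a threshold you choose.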

In addition, a regular malware scan can alert you to any changes to files or suspicious code. You can either do this yourself using software like maldet or a simple grep, ask a technician to do it for you, or use a third party solution to scan your website.
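As a rough illustration of the "simple grep" approach, the sketch below walks a site directory and flags lines in PHP files that match a couple of patterns often seen in injected code. The patterns, the function name `scan_for_suspicious_code` and the default `.php` suffix are assumptions chosen for the example; real malware is far more varied, so treat any result as a prompt for manual review rather than a verdict.

```python
import re
from pathlib import Path

# Illustrative patterns only (an assumption, not a complete signature list).
SUSPICIOUS_PATTERNS = [
    re.compile(rb"eval\s*\(\s*base64_decode\s*\("),
    re.compile(rb"eval\s*\(\s*gzinflate\s*\("),
]


def scan_for_suspicious_code(root, suffixes=(".php",)):
    """Return (path, line_number) pairs for lines matching any pattern."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Read as bytes so files with unusual encodings do not raise errors.
        with path.open("rb") as handle:
            for line_number, line in enumerate(handle, start=1):
                if any(p.search(line) for p in SUSPICIOUS_PATTERNS):
                    hits.append((path, line_number))
    return hits
```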

It’s not good if you are alerted to an outage by a customer.

Fixed.net can offer both website monitoring and malware scanning on our free subscription.

Disaster Recovery

The most important insurance you can have for a website is backups of every part of it: both website files and databases. On larger developments this may be as simple as using version control with staging and production environments, but for the vast majority of sites, including most WordPress, Joomla or Magento websites, standalone backups should be taken.
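A standalone backup can be as simple as one dated archive containing both parts. The sketch below is a minimal example under stated assumptions: `create_backup` is a hypothetical name, and it assumes the database has already been exported to an SQL dump file by your database's own tooling. Copying the resulting archive to a third party, away from the web host, is not shown.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def create_backup(site_dir, db_dump, dest_dir, now=None):
    """Bundle website files and a database dump into one dated archive.

    `db_dump` is assumed to be an SQL dump file produced beforehand.
    """
    site_dir, db_dump, dest_dir = Path(site_dir), Path(db_dump), Path(dest_dir)
    now = now or datetime.now(timezone.utc)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # A timestamp in the name keeps each revision separate and sortable.
    archive = dest_dir / f"backup-{now:%Y%m%dT%H%M%SZ}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(site_dir, arcname="files")
        tar.add(db_dump, arcname="database.sql")
    return archive
```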

In the worst-case emergency scenario, then, the steps to take would be:

  1. Get a new hosting package or server online from any provider. You should identify some potential options in advance.
  2. Upload files and database from backups and configure them.
  3. Repoint your website DNS to the new hosting location.

This should keep the total outage to no more than a few hours.
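After repointing DNS in step 3, it helps to confirm that the change has taken effect. The sketch below checks whether a hostname currently resolves to the new server's address. The name `dns_points_to` is hypothetical, the check covers IPv4 only, and it reflects only what the local resolver sees, so other visitors may still be served a cached record for a while.

```python
import socket


def dns_points_to(hostname, expected_ip, resolve=socket.gethostbyname_ex):
    """Return True if `hostname` currently resolves to `expected_ip`.

    `resolve` is injectable so the check can be tested without a network.
    """
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError:
        return False  # the lookup failed, so the record is not usable yet
    return expected_ip in addresses
```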

Backups also protect against malware attacks and human error, as you can roll back to previous iterations of a website. Even if you delete a photo by accident, or a file gets corrupted by a plugin install, a backup can be used.
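Rolling back only helps if you pick a revision from before the problem was introduced. A minimal sketch of that choice follows; `pick_rollback_backup` is a hypothetical helper, and it assumes you know roughly when the incident began, which may be earlier than when it was noticed.

```python
from datetime import datetime


def pick_rollback_backup(backup_times, incident_time):
    """Return the most recent backup taken strictly before the incident.

    Returns None when every backup was taken at or after the incident.
    """
    earlier = [taken for taken in backup_times if taken < incident_time]
    return max(earlier) if earlier else None
```

With daily backups, an incident that began in the afternoon would roll back to that morning's backup, losing at most a day of changes.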

Once recovered, the next step should be to identify the cause of the underlying issue.

An Example Disaster Recovery Plan

The website [websitename] is hosted on [hosting provider]. The domain is registered with [domain registrar] and DNS is managed by [dns provider] with the following nameservers:

Ns1: [Nameserver1]
Ns2: [Nameserver2]

The website A record points to the IP address [ ], with the database running [locally/externally at [ ]].

Login credentials to the hosting provider are stored securely by [people, contactable at]

Website files are backed up to fixed.net daily and revisions are stored [for 7 days/for 30 days/forever]. Backups can also be taken on demand.

The website database is backed up to fixed.net daily and revisions are stored [for 7 days/for 30 days/forever]. Backups can also be taken on demand.

The website runs the following software [software] [which is maintained and monitored by fixed.net under a website subscription].

Prevention and Detection

[Company Name] has an active subscription with fixed.net who provide proactive monitoring, updates to plugins, daily backups and security audits.

On the client side, website users and access are strictly monitored. Every machine with access has an up-to-date antivirus subscription.

Resolutions

The following steps can be taken in the event of business disruption:

Website Defacement

Detection:

  • Proactive website monitoring
  • Human notice
  • Malware warning
  • Customer notification
  • Browser warning
  • Website down as a result of broken site

Resolution:

  1. Roll back to a clean backup.
  2. Take the website temporarily offline for public users to prevent further defacement
  3. Find and close any software vulnerabilities
  4. Bring the website back online

Estimated 1-4 hours disruption

Human Error

Detection:

  • Human notice
  • Proactive website monitoring

Resolution:

  1. Roll back to a backup

Estimated under 30 minutes disruption

Minor Service Provider Issue

Detection:

  • Proactive website monitoring
  • Service provider notification

Resolution:

  1. Await resolution from service provider, or see below

Major Service Provider Issue

Detection:

  • Proactive website monitoring
  • Service provider notification

Resolution:

  1. Set up new hosting package with [ ]
  2. Upload files and databases from backup
  3. Repoint DNS with [ ] to new IP address

Estimated 2-6 hours disruption. DNS caching may cause some visitors to still see the old website.

Categories: Hosting, Backups, Monitoring