Welcome to AppFail

Posted on 2009-12-05

Infrastructure Post

In recent days, the WyldRyde IRC network lost its services database when one of its accounts was suspended and deleted by the shell provider. Apparently, the backups of the services database were stored on that same shell account.

A disaster recovery plan is an important part of any infrastructure design. Business continuity is too important to be left to chance or to be executed by inadequately trained personnel. A proper disaster recovery plan has several distinct elements that must be considered and implemented.

The first target is the Recovery Point Objective (RPO): the time difference (delta-t) between the most recent backup and the data loss event. This factor represents how much data it is acceptable, or inevitable, to lose. For example, if you take backups once per day and the data loss event takes place at 5pm, you have lost all of the work for the entire working day. If such a loss is not acceptable, then backups must be taken more frequently. The obstacle is that a backup window is not always available that often. It is not feasible to take a backup every hour if the backup takes more than an hour to complete, or if the backup degrades performance (of the system or the network) and can only be run when the system is not in use. It may be possible to take advantage of a backup window during the lunch hour while everyone is away from their desks, but backing up more frequently than that may not be possible.
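
The RPO arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names, the midnight backup time and the two-hour backup duration are assumptions made for the example, and the worst-case figure assumes a backup is only usable once it has finished.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Work lost: the time between the last good backup and the failure."""
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    return failure - last_backup


def schedule_is_feasible(interval: timedelta, backup_duration: timedelta) -> bool:
    """A backup cannot run every `interval` if one run takes longer than that."""
    return backup_duration <= interval


def worst_case_loss(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Assumed model: a failure just before the next backup completes loses
    everything since the previous backup started."""
    return interval + backup_duration


# Hypothetical numbers: nightly backup at midnight, failure at 5pm.
lost = data_loss_window(datetime(2009, 12, 4, 0, 0), datetime(2009, 12, 4, 17, 0))
print("Data lost:", lost)
print("Hourly backups feasible with a 90 minute backup?",
      schedule_is_feasible(timedelta(hours=1), timedelta(minutes=90)))
print("Worst case for daily backups taking 2 hours:",
      worst_case_loss(timedelta(hours=24), timedelta(hours=2)))
```

Comparing the worst-case figure against the amount of work you can afford to redo is a quick way to decide whether a given schedule meets your RPO.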

The next target in the disaster recovery plan is the Recovery Time Objective (RTO): the amount of time that elapses between the data loss event and the full restoration of services or data. A backup system that can restore all of your data but takes many hours to do it may actually be less useful than a system that misses the latest few hours of data but can complete the restoration of service in only minutes. This is a specific instance where it is important to weigh the advantages and disadvantages of each backup modality. Image-based systems offer faster bare-metal restores, whereas incremental systems allow partial restores and deduplication. In the event of a hardware failure, fire, theft, natural disaster, or loss of utility power or internet connectivity, the ability to restore to a different system can be especially valuable. This is where virtualization comes into play in a big way: being able to restore your data quickly to another system, and/or another physical location, can make all the difference.
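
As a rough way to reason about RTO, restore time can be estimated from the amount of data and the sustained restore throughput. The sketch below is a back-of-the-envelope model, not a measurement: the function names, the data sizes, the throughput and the one-hour RTO are all hypothetical values chosen for illustration.

```python
from datetime import timedelta


def estimated_restore_time(data_bytes: int,
                           throughput_bytes_per_s: float,
                           fixed_overhead: timedelta = timedelta(0)) -> timedelta:
    """Transfer time plus any fixed setup cost (provisioning, booting, etc.)."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return fixed_overhead + timedelta(seconds=data_bytes / throughput_bytes_per_s)


def meets_rto(estimate: timedelta, rto: timedelta) -> bool:
    """True when the estimated restore fits inside the recovery time objective."""
    return estimate <= rto


rto = timedelta(hours=1)  # hypothetical objective

# Hypothetical: full restore of 2 TB versus a partial restore of 50 GB,
# both at a sustained 100 MB/s.
full = estimated_restore_time(2_000_000_000_000, 100_000_000)
partial = estimated_restore_time(50_000_000_000, 100_000_000)

print("Full restore:", full, "meets RTO:", meets_rto(full, rto))
print("Partial restore:", partial, "meets RTO:", meets_rto(partial, rto))
```

Real restores rarely sustain their peak throughput, so an estimate like this is best treated as a lower bound and confirmed with an actual test restore.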

The final target in a disaster recovery plan is Data Security. Although it is extremely important to ensure business continuity with a disaster recovery plan, the only thing worse than losing the data would be the data falling into the wrong hands. Backups must be properly secured, and there are a number of vectors to consider. The first is Transport Security: how the data is secured as it crosses the network, especially if your backup crosses public networks such as the internet. Standards-based encryption is always best; TLS (the successor to SSL) not only encrypts the data while it is in transit, but can also verify the identity of both endpoints. The second is Storage Security: how the data is protected while it is stored until it is needed. Again, standards-based is the way to go. Encrypting with AES, a symmetric-key cipher, on the origin system means the data is already encrypted when it leaves, and the storage site does not need to have the key, which further restricts access and improves confidentiality. This can be especially important if you are using a third-party solution such as cloud storage. The final consideration is Key Security: if the key is disclosed, the backup is no longer secure, and if the key is lost, the backup cannot be restored. Key management can become quite a task if you have a large number of machines with a separate key for each. There have been a number of incidents where backups were lost or stolen, resulting in expensive data breaches and disclosures; such breaches are documented in the Privacy Rights Clearinghouse's Chronology of Data Breaches.
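
For the transport side, here is a minimal sketch of a TLS client configuration using Python's standard `ssl` module. The function name and the optional certificate paths are hypothetical; the point is only that the default context already verifies the server's certificate and hostname, and that loading a client certificate lets the server verify the client as well.

```python
import ssl
from typing import Optional


def make_backup_client_context(client_cert: Optional[str] = None,
                               client_key: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS context for a (hypothetical) backup client."""
    # The default context requires a valid server certificate and checks
    # that it matches the hostname being connected to.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert is not None:
        # Optional: present a client certificate so the server can
        # authenticate this endpoint too (mutual TLS).
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = make_backup_client_context()
print("Server certificate required:", ctx.verify_mode == ssl.CERT_REQUIRED)
print("Hostname checked:", ctx.check_hostname)
```

Storage encryption is a separate layer: the data should already be encrypted on the origin system before it is handed to a connection like this one.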

There are still other considerations when it comes to developing a complete disaster recovery plan. One of the primary considerations is open files. Files that are being written to, especially databases and other large files with internal references, need to be backed up in a consistent state. If a file is too large to be backed up between updates, a backup taken while the file was being written to would contain the first part of the file in the old state and the last part of the file in the new state; once restored, the file would be corrupt and unusable. Depending on the operating system and the underlying file system, you can lock the files so they cannot be written to while they are being read by the backup, or use a system such as Microsoft's Volume Shadow Copy, which allows you to back up a consistent snapshot of the file. This type of solution by itself can be insufficient when the database spans multiple files, because they must be backed up as a set in order to remain consistent. The problem with a locking mechanism, especially on a larger database, is that it becomes impossible to continue normal operations while the database is being backed up, and with the 24/7 nature of e-commerce, it is not feasible to shut down the site for some hours each day in order to back it up. This is where replication comes into play. If you have a read-only slave of the database, you can lock the slave and take a consistent backup of the database; once the backup is complete, the slave is unlocked and catches up on the transactions that have occurred in the meantime from the replication log.

Another primary consideration is validation: making sure not only that the backups are actually taking place, but that they are valid, complete and can be restored. It's all well and good to have daily backups of your system, but if the data loss event happens and you attempt to restore only to find that the backups are corrupt, missing key files, missing critical metadata, or that you don't have the proper decryption keys, the backups can turn out to be of absolutely no value.
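
To make the snapshot and validation ideas concrete, here is a small sketch using SQLite from Python's standard library. SQLite is only a stand-in chosen because it needs no server; the function name, file names and the `nicks` table are hypothetical. `Connection.backup()` produces a consistent copy of a live database, and the copy is then reopened and checked the way a restore would use it.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot_and_verify(live_path: Path, backup_path: Path) -> bool:
    """Copy a live SQLite database to backup_path, then validate the copy."""
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(backup_path)
    try:
        # The backup API copies the database as one consistent snapshot,
        # rather than reading a file that may change halfway through.
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    # Validation: reopen the copy and ask SQLite to check its structure.
    check = sqlite3.connect(backup_path)
    try:
        (result,) = check.execute("PRAGMA integrity_check").fetchone()
    finally:
        check.close()
    return result == "ok"


with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live.db"
    copy = Path(tmp) / "copy.db"

    con = sqlite3.connect(live)
    con.execute("CREATE TABLE nicks (name TEXT)")
    con.execute("INSERT INTO nicks VALUES ('example')")
    con.commit()
    con.close()

    print("Backup valid:", snapshot_and_verify(live, copy))
```

An integrity check only proves the copy is structurally sound. A fuller validation would also restore to a scratch system on a schedule and confirm that the expected data and decryption keys are actually there.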

My company, Near Source IT, offers an online remote backup solution under its Backup.gd brand. Our solution is cross-platform, available for Windows, Mac, FreeBSD and Linux. It provides TLS encryption to protect your data while it transits the internet, and client-side AES encryption, so that even we do not have access to the contents of your backups. (Note: restoration requires a copy of your encryption key. Be sure to store it in a secure place, or your backups will not be usable.) Data is warehoused in our secure facilities in Hamilton, Ontario, Canada, protected by three layers of physical security. The facility is staffed 24/7 and is protected by a monitored alarm system and infrared surveillance cameras. To validate your backups, you have access to our convenient web interface, which allows you to browse the recreated directory structure of your systems and download individual files for convenience and verification. Note that individual file downloads via the web interface are not possible if you opt to use AES storage encryption; your files can only be restored if the proper key is available to decrypt them.


Cuiusvis hominis est errare; nullius nisi insipientis in errore perseverare - Any man can make a mistake; only a fool keeps making the same one.