A recent Handler's Log on the SANS Internet Storm Center described the demise of an early blog site called "Journalspace.com." Evidently their disaster recovery strategy consisted solely of maintaining a mirrored RAID system.
I've written quite a bit about how mirrored RAID is a fundamental part of my disaster recovery strategy. However, the Journalspace people apparently skipped an essential step: they relied solely on their on-line data and didn't keep an off-line (preferably off-site) backup.
Someone from Journalspace posted a brief explanation of what happened: they relied exclusively on mirrored RAID as the recoverable backup of their site. The posting suggested ways in which the victims of the crash (former Journalspace bloggers) might be able to scavenge bits of their now-vanished postings by searching Google caches.
Yes, I myself rely heavily on mirrored RAID, but I don't use it alone. I also keep off-site backups, though those, too, are produced using RAID mirroring.
When I look at security issues and make security-related decisions, I like to start by identifying what I'm trying to protect, and the sorts of risks I want to address. In this case, here are the things that worry me:
- Hard drive crash. In the global scheme of things, hard drives are incredibly reliable. Yet if we take the long view, they crash pretty often. You don't want to trust your last copy of that photo of your great grandparents to a single hard drive in your office.
- Loss of the whole computer. I could lose my computer through theft, fire, or other disasters.
- "Slip of the pen" sorts of errors, where I lose a file or a folder of files due to some dumb mistake, like deleting the wrong thing. This risk may also include random software errors that inexplicably damage or delete files unexpectedly.
I assume that Journalspace and most other folks want to address the same sorts of risks. Here's what I do:
- Use the Macintosh Time Machine to keep an incremental backup of my files. This is automatic and involves no overt actions on my part. The only thing I have to do is be sure to leave the machine running long enough for the automatic backups to take place.
  - This deals with 'slip of the pen' errors. It also means that I can use a single, fast hard drive as my system drive and use slower drives for the backup, since the backing up isn't something I wait around for.
- Store the Time Machine on a mirrored RAID set, also known as RAID 1. This takes two identical hard drives and keeps an identical copy of everything on both drives. In particular, I have two 750 GB drives that act like a single 750 GB drive.
  - This deals with the 'disk crash' risk. If one of the drives dies, the system keeps running on the other, working drive. Meanwhile I go get myself a replacement drive, install it, and the Disk Utility will copy the working drive onto the new one. I can keep running while I await the replacement drive, though I run the risk of losing recent work if the second drive fails.
- Swap out one of the two mirrored drives every few weeks and keep a copy off-site.
  - This deals with the 'loss of whole computer' risk. If the whole computer disappears, I still have my off-site backup.
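The pairing of risks and measures above can be written down as a small checklist. This is only an illustrative sketch in Python: the names `RISKS`, `MEASURES`, and `uncovered` are made up for the example, and the mapping simply restates which risk each measure is said to address.

```python
# Hypothetical labels for the three risks listed earlier.
RISKS = {"drive crash", "loss of whole computer", "slip of the pen"}

# Which risk each measure addresses, as described in the list above.
MEASURES = {
    "Time Machine incremental backup": {"slip of the pen"},
    "mirrored RAID (RAID 1)": {"drive crash"},
    "off-site swapped mirror drive": {"loss of whole computer"},
}

def uncovered(measures_in_use):
    """Return the risks that none of the chosen measures address."""
    covered = set()
    for name in measures_in_use:
        covered |= MEASURES[name]
    return RISKS - covered

# All three measures together leave no risk unaddressed.
print(sorted(uncovered(MEASURES)))

# Mirrored RAID on its own leaves two of the three risks open.
print(sorted(uncovered(["mirrored RAID (RAID 1)"])))
```

Run with only the RAID entry, the check reports that 'slip of the pen' errors and loss of the whole computer remain unaddressed, which is roughly the position a RAID-only strategy leaves you in.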
While the Macintosh software RAID isn't a pain-free implementation, it works well enough. I've had enough experience to know that I can rebuild my system drive from a RAID copy of my Time Machine files.
An interesting bit from the Journalspace story: the actual damage is attributed to a disgruntled IT guy. This guy set up the RAID system and asserted that it was enough of a backup to keep the system going. On his way out the door after being fired, the guy apparently wrote garbage over the blog's SQL database. Because the overwrite went through the server software, it was mirrored to both drives, wiping out everything.
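To see why mirroring couldn't help here, consider a toy model in Python. It is a sketch under obvious simplifications: the `MirroredPair` class and the dictionary "drives" are invented for illustration and have nothing to do with how Journalspace's servers or any real RAID controller work. The point is only that a mirror faithfully repeats every write, good or bad, while a copy taken off-line beforehand is untouched.

```python
import copy

class MirroredPair:
    """Toy model of RAID 1: every write lands on both drives."""

    def __init__(self):
        self.drive_a = {}
        self.drive_b = {}

    def write(self, key, value):
        # The mirror cannot tell a good write from a destructive one.
        self.drive_a[key] = value
        self.drive_b[key] = value

mirror = MirroredPair()
mirror.write("post-1", "hello world")

# An off-line copy taken before the damage (think: a swapped-out drive).
offline_copy = copy.deepcopy(mirror.drive_b)

# A destructive write issued through the server software.
mirror.write("post-1", "garbage")

print(mirror.drive_a["post-1"], mirror.drive_b["post-1"])  # garbage garbage
print(offline_copy["post-1"])  # hello world
```

Both mirrored drives end up holding the garbage, while the off-line copy still has the original post. That is the step Journalspace skipped.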