Backing up a Mac to Amazon S3 with Arq: the easiest, safest and most accurate solution

The Mac is among the most difficult systems to back up and restore correctly. It has peculiarities of its own: resource forks and packages, for example, are unique to the Mac, and not every service handles them well.

Few people pay much attention to this until they experience data loss through hardware damage, material error, theft or some other act of God the consequences of which can prove impossible to repair.

The characteristic reaction to this, by many users, is simply to ignore the issue altogether and bank on the fact that data loss is unlikely to occur. This is the digital equivalent of driving uninsured or opening an undeclared bank account at a Latvian bank: it can seem a clever idea at the time, but time seldom stands still when risk-taking is involved.

So which backup service should one choose? To help with this, there is a great tool called Backup Bouncer that can be used to verify how well Mac backup software backs up and restores metadata.

Using Time Machine or Apple's Time Capsule exposes one to unacceptable risks

Of course you can always use Apple's Time Machine to create a copy of your data. Time Machine was somewhat buggy and slow in Mac OS X Leopard 10.5. For example, it would say “preparing backup” forever. In Mac OS X 10.6 Snow Leopard, Time Machine is admittedly much faster and more reliable too. Nevertheless, a Time Machine backup drive must remain attached to the computer, or relatively near it. Therefore, it’s exposed to the same risks as the computer: theft, fire, flood. Not something I would want to trust.

Apple offers its own in-house network-attached storage backup device, Time Capsule, to use in conjunction with Time Machine: one of its key features is the ability to back up a system and files wirelessly and automatically, which eliminates the need for a separate external drive to be attached. In October 2009, several news sites reported that many Time Capsules were failing after eighteen months. Users have alleged that this is due to a design failure in the power supplies; Apple has confirmed that certain Time Capsules sold between February 2008 and June 2008 do not power on or may unexpectedly turn off, and has offered free repair or replacement for affected units. The risk of using Time Capsule remains, to my mind, unacceptable, as witnessed by the continued existence of the Apple Time Capsule Memorial Register, which commemorates all the defunct devices that perished in active service. The same, in reality, is true of any external drive: I've owned several, and every single one of them has failed me eventually, occasioning data loss. In fact, horror stories about Time Capsule are so numerous that I haven't been even remotely tempted to try one out, ever. Here's one from TUAW, taken as an example:

Working with Time Machine in Leopard or Snow Leopard, the Time Capsule updates its backups every hour. This makes perfect sense if you're just dealing with one Mac wired into the Time Capsule, since it really doesn't slow anything down. But if you are using it to wirelessly back up multiple Macs, hourly backups slow everything down to a crawl.

TimeMachineEditor (a free utility that I highly recommend), allows you to set Time Machine to back up as frequently or infrequently as you like. I created a setup where, with staggered backups starting between 2am and 4am, each Mac gets backed up once a day. Outside of some errant sparse image problems that required a reformat, all was well. I had long beaten the 18 month Time Capsule funerals that were recently reported... but then things turned ugly.

About two weeks ago, my Time Capsule died, taking my home network down with it. The next day I went to the Apple Store and bought a replacement. I was quite unhappy that many months of backups were gone, but machines do break. I brought the new unit home and configured it. In less than 10 minutes, I was up and running ... for a week. Then my unit died again. The Time Capsule showed a solid amber light that could not be corrected with a factory reset. After replacing the cable modem, and being able to successfully plug it into my iMac, the problem persisted. I went through another two Time Capsules that would not complete the start-up sequence.

Although strange, the problem seemed to be the power feeding the Time Capsule. All other devices on the same circuit worked fine, but it took a dedicated power line devoted to the Time Capsule for everything to work properly. This may be related to power issues that impacted many users, or maybe not. It seems that power fluctuations that any other devices would take in stride can wreak havoc with a Time Capsule.

While external drives may often be the only choice for those with large photo libraries (in which case I'd recommend Apple's Mac Mini with Snow Leopard Server), the only realistic solution, to my mind, is off-site backup. The two drawbacks of this, of course, are (1) that it can take an awfully long time to send one's initial backup to an offsite server, and (2) that it isn't altogether reassuring to know that one's data is stored, even in encrypted form, in the care of a third party.

Unfortunately, most commercial off-site backup services are slow, and corrupt the data they store on your behalf

Just to make things worse, it turns out that a number of commercial backup services corrupt the data they store on your behalf. The Mac is among the most difficult systems to back up and restore correctly: on modern filesystems, there is a lot more than just file data. A good deal of metadata goes along with your files, and in many situations it is important to back it up and restore it too. One example is security information: most desktop users don’t really know about file ownership, permissions and ACLs, but on servers that type of information is important, and losing it could lead to a security leak or permission issues. Fortunately, Backup Bouncer, which I mentioned earlier, can be used to verify how well Mac backup software backs up and restores metadata. The results, for most of the commercial services available, are actually pretty disastrous:

Product              Results
Mozy                 Failed 16 of 20 tests
Backblaze            Failed 19 of 20 tests
Carbonite            Failed 20 of 20 tests
Dropbox              Failed 19 of 20 tests
CrashPlan Central    Failed 12 of 20 tests
Jungle Disk          Passes all tests

(Source: Arq website)
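To give a rough idea of what "verifying metadata" means in practice, here is a minimal sketch in Python. It is not how Backup Bouncer itself works, and the function names are my own, purely for illustration. It compares only the basic attributes that the standard library exposes (permissions, owner, group, modification time, symlink status); resource forks, ACLs and extended attributes would need platform-specific tools on top of this.

```python
import os
import stat


def metadata_snapshot(path):
    """Return the metadata fields compared in this sketch, without following symlinks."""
    info = os.lstat(path)
    return {
        "mode": stat.S_IMODE(info.st_mode),        # permission bits, e.g. 0o644
        "uid": info.st_uid,                        # owning user
        "gid": info.st_gid,                        # owning group
        "mtime": int(info.st_mtime),               # modification time, whole seconds
        "is_symlink": stat.S_ISLNK(info.st_mode),  # True if the path is a symlink
    }


def metadata_differences(original, restored):
    """List the names of the fields whose values differ between two paths."""
    before = metadata_snapshot(original)
    after = metadata_snapshot(restored)
    return sorted(name for name in before if before[name] != after[name])
```

Calling metadata_differences on an original file and its restored copy returns an empty list when these particular fields survived the round trip, and the names of the fields that did not otherwise.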

Nor do the Backup Bouncer results take account of the time it takes to carry out one's initial backup. When I tried to back up a small MacBook with 150GB of data on Backblaze, it took three weeks of continuous use, and considerable CPU slowdown, for my initial upload of files to complete. CrashPlan actually offers a service whereby you can seed your initial backup: they send you a drive by post, onto which you transfer your data before sending it back to them. They limit this to 1TB, however, and charge $124.99 for the service. I still think they are the best of the standard commercial backup services, if only because of this useful feature, but even they are hardly perfect.
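To put the Backblaze figure in perspective, the arithmetic can be sketched as follows. The assumptions are mine: "three weeks" is taken as 21 full days of uninterrupted uploading, and 1GB is counted as 10^9 bytes. The function names are illustrative only.

```python
def average_upload_mbit_per_s(gigabytes, days):
    """Average throughput in megabits per second (decimal units, 1 GB = 10**9 bytes)."""
    bits = gigabytes * 10**9 * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 10**6


def days_to_upload(gigabytes, mbit_per_s):
    """Days needed to upload the given amount of data at a steady rate."""
    seconds = gigabytes * 10**9 * 8 / (mbit_per_s * 10**6)
    return seconds / (24 * 60 * 60)


# 150 GB in 21 days, the figures quoted above
rate = average_upload_mbit_per_s(150, 21)
print(f"{rate:.2f} Mbit/s")  # prints "0.66 Mbit/s"
```

Under those assumptions the upload averaged well under 1 Mbit/s. The second function turns the question around: at a steady 1 Mbit/s, the same 150GB would still need roughly two weeks.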

Amazon S3 is the cheapest, quickest, most reliable and secure backup medium available

Nonetheless, the best—and by far the cheapest—way of ensuring quick, secure, accurate and truly private backups remains a timeless favourite, Amazon S3. As Amazon themselves put it:

The Amazon S3 solution offers a highly durable, scalable, and secure solution for backing up and archiving your critical data. You can use Amazon S3’s Versioning capability to provide even further protection for your stored data. If you have data sets of significant size, you can use AWS Import/Export to move large amounts of data into and out of AWS with physical storage devices. This is ideal for moving large quantities of data for periodic backups, or quickly retrieving data for disaster recovery scenarios.

It's no coincidence that Jungle Disk, which stores its backups on Amazon S3, is the only one of the services surveyed above that passes all of Backup Bouncer's tests: Amazon's storage service is the only one that provides absolutely accurate copies of Mac OS X files. It also has an enormous advantage over commercial services that store data on their own or on third-party servers: it protects you from the risk of your backup company going under or suffering from some major unforeseen disaster [1].

I have actually been conducting backups of my various Macs using Amazon S3 ever since Paul Stamatiou wrote about it in 2007. But in recent months I've become increasingly dissatisfied with Jungle Disk, the client I've been relying on to back up my data to my server: it tends to occupy a very prominent place in my dock, with its extremely ugly icon, uses its own proprietary file format and causes occasional CPU bottlenecks. After giving up on Backblaze because of its atrocious slowness and poor customer service, I finally removed Jungle Disk, Backblaze and CrashPlan and instead opted for the little-known, very lightweight Arq from Haystack Software. Its advantages can be briefly summarised as follows:

  • Arq stores your backup data in your own Amazon S3 account, encrypted with your own password: neither Amazon nor Haystack Software has access;
  • Arq encrypts your data before it leaves your computer using AES-256, a government and industry standard;
  • your backups are stored in an open, documented format; Haystack Software has also released an open-source command-line utility called arq_restore, hosted on GitHub, so you can read your data at any time without depending on Haystack Software in the event that they suddenly do a vanishing act: in other words, whatever happens, your data remains yours.
Arq application interface showing the folders chosen for backup to Amazon S3: usually, backing up your User folder, Applications and Application Support files will suffice to restore everything you need.

Arq's accuracy is on a par with Jungle Disk's, without the hassle: its Backup Bouncer results show it passes all tests, and it resides in a small, inconspicuous menu icon, from which you can launch the main application if you need to change a setting. It also has the advantage, over Jungle Disk, of being designed from the ground up as a backup solution for the Mac, whereas Jungle Disk's backup capacity is just one of a number of services, including mirroring files between client and server, that it offers on both Mac and Windows platforms, with the Mac often giving the impression of being the poor relation.

Enquiries to the developer behind Haystack Software, Stefan Reitshamer, yielded extremely prompt and relevant responses, and the site has a lively forum and detailed, well thought-out documentation. Above all, my entire backup for the MacBook Air that currently serves as my only computer was complete in less than a day (and that was on a rather poor, French Internet connection).

Used in combination with Dropbox, Google Apps and, optionally, MobileMe, Amazon S3 can get you back working exactly where you were if something unexpected happens to your Mac

I continue to use Dropbox syncs for any data that I need to access simultaneously from several devices, whether computers, iPhones or iPads: this means that my Dropbox copies of this data (essentially my 1Password data, documents, photographs and music) can be regarded as securely backed up. Better still, Dropbox keeps a thirty-day history of every change you make so that you can undo any mistakes and even undelete files if required.

A fully-fledged backup system with Arq, Dropbox, Google Apps and MobileMe: a combination of all these services will back up all the data needed to bring a Mac, once the original system is restored, back to exactly where it was before data loss.

The combination of these two simple tools means that if I lost access to my computer, its replacement could be rapidly running again with an effectively identical file system, using the actual backups created via Arq and the files stored in Dropbox, while emails and calendar items would be safe in Google Apps and Contacts in MobileMe (these choices are explained in an earlier post).

An Arq licence for one computer costs $29, with a five-computer option available at $59. Amazon's very reasonable monthly charge comes on top of that. You can cap it within Arq's preferences at whatever suits you, and Arq will keep you within budget by deleting your oldest backups as the new ones come in.
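The idea of staying within budget by discarding the oldest backups first can be sketched in a few lines of Python. This is only an illustration of the principle, not Arq's actual algorithm: the function name is hypothetical, the cap is expressed as a storage size in GB, and each backup is treated as having an independent size, which a real backup tool that shares data between snapshots would not do.

```python
def prune_to_budget(backups, budget_gb):
    """Drop the oldest backups until the total size fits the budget.

    `backups` is a list of (date, size_gb) pairs; ISO dates sort chronologically.
    Returns (kept, deleted), both oldest first. The newest backup is always kept.
    """
    kept = sorted(backups)
    deleted = []
    while len(kept) > 1 and sum(size for _, size in kept) > budget_gb:
        deleted.append(kept.pop(0))  # remove the oldest remaining backup
    return kept, deleted


# Hypothetical figures: three backups totalling 130 GB against an 80 GB cap
backups = [("2010-03-01", 40.0), ("2010-01-01", 60.0), ("2010-02-01", 30.0)]
kept, deleted = prune_to_budget(backups, 80.0)
print(kept)     # [('2010-02-01', 30.0), ('2010-03-01', 40.0)]
print(deleted)  # [('2010-01-01', 60.0)]
```

In this made-up example, dropping the January backup alone brings the total down to 70GB, so the two more recent backups survive.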

_______________
  1. This is something that has happened before, as testified by the examples of Upline, XDrive, Omnidrive and Digital Railroad.