Automatically backup your Mac to Amazon S3

February 1, 2008 – 11:46 pm

With the new version of OS X (Leopard), Apple has included some great functionality in Time Machine. Your Mac will automatically back up to an external drive every hour, and you can recover deleted files by browsing a timeline. The one downside to the Time Machine approach is that the data isn’t stored remotely. A couple of years ago my wife and I had a house fire in which most of our things were destroyed. Fortunately the fire was extinguished before it spread to where our computers were, so we didn’t lose any data. Had it started elsewhere in the house, we could have lost all of our digital files.

After the fire I followed a manual process of backing up our files to an external drive that I store in our fire safe. The problem is that it requires me to actually do the work, which I often put off. When Amazon S3 was introduced I immediately saw the potential to use it as an automatic remote backup destination. I hadn’t invested much time in it until now, but I just got a new computer (MacBook Air!!) and while setting it up I thought it would be a good opportunity to get my backup situation in order.

There are some great tools already in existence that can do most of the heavy lifting for you. The primary tool for doing remote directory syncs is s3sync, a script written in Ruby. Luckily for us, OS X comes with Ruby pre-installed, so there isn’t much work to get it running.

Here is my step-by-step guide to getting your machine set up to do automatic daily backups to Amazon. I developed these steps on my MacBook Air running Leopard; however, they should work for previous versions of OS X as well.

Step 1) First off, you’re going to need an Amazon Web Services account. Head over to http://aws.amazon.com/ and sign up for an account to use S3. The prices are very cheap ($0.15/GB/Month). Once you have your account set up you will need two things to use Amazon S3: your Amazon access key and your secret key. These are what s3sync will use to authenticate you to Amazon.

Step 2) I’ve packaged together a zip file with all the files you are going to need to get this set up, including what is needed for SSL. Download the file at http://vallery.net/s3backup.zip. You can go to http://s3sync.net/ to see if there is a newer version if you like, but you’ll need to figure some of this out on your own.

Step 3) You need to create a “bucket” in Amazon S3 to store your files. A bucket is similar to a folder, except that its name must be globally unique across all Amazon S3 users. In order to create the bucket you are going to need one of the S3 GUI applications that exist. I have included in the zip file the one I have used, called “S3 Browser”. You can find the latest version at http://people.no-distance.net/ol/software/s3/. Once you launch S3 Browser, click on “connection” then “new connection”. You’ll need to provide the access details you got from Amazon in step 1. Once you have connected, click the “Add” button, which will allow you to create a new bucket. Because the name has to be globally unique I used “vallery-macbookair-backup”, where vallery is my last name. Keep track of this bucket name because you need it in the next step.

s3browser.png

Step 4) Once you have downloaded the zip file I created, it should automatically extract itself into your Downloads folder, creating a new folder called “s3backup”. Within the s3backup folder are all the files and scripts you will need in order to get this working. There is one key file that needs to be edited in order to make this all work, which is called “backup.sh”. Open “backup.sh” and replace the placeholder access key, secret key and bucket name with the keys you obtained from Amazon in step 1 and the bucket name you chose in step 3.
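The contents of backup.sh are not reproduced in this post, so what follows is only a rough sketch of what such a wrapper might look like, not the actual file from the zip. It assumes that s3sync.rb picks up its credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables and accepts the -r, --ssl and --delete options. The names S3SYNC_DIR, SOURCE, BUCKET, PREFIX, RUN_BACKUP and maybe_run are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of a backup wrapper; the real backup.sh may differ.

# Placeholders: replace with your own values from steps 1 and 3.
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
BUCKET="your-bucket-name"
PREFIX="Users"                  # assumed key prefix inside the bucket

# Assumed locations; adjust if the folder is stored somewhere else.
S3SYNC_DIR="/Library/s3backup"
SOURCE="/Users/"

# Run the given command only when RUN_BACKUP=1; otherwise just print it.
maybe_run() {
  if [ "${RUN_BACKUP:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Work from the s3sync folder when it exists, since the script loads
# its helper files from there.
if [ -d "$S3SYNC_DIR" ]; then
  cd "$S3SYNC_DIR"
fi

maybe_run ruby "$S3SYNC_DIR/s3sync.rb" -r --ssl --delete "$SOURCE" "${BUCKET}:${PREFIX}"
```

Run as-is, the sketch only prints the command it would execute, which is a cheap way to check the paths and bucket name before the first long upload. Setting RUN_BACKUP=1 performs the actual sync.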

backupsh.png

Step 5) Now that you have all the files ready to go you need to select a place to store them. The application will run as root at the system level in order to prevent file access issues, so I recommend storing the entire s3backup folder in your /Library folder. Copy the entire folder to /Library using Finder. There are a few other paths in “backup.sh” that will need to be updated if you choose to store the folder elsewhere.

Step 6) You need to set up your Mac to automatically run the backup shell script at a regular interval. There are a couple of ways to do this. Since I am a Unix guy I immediately started looking at cron. I discovered, however, that Apple recommends you use launchd for scheduled tasks. It is fairly complex to set up a scheduled task using launchd, but thankfully someone has already created a simple GUI that will let you do it: the application Lingon. I’ve included the latest version at the time of writing in the s3backup directory, but you can always obtain the latest version from http://lingon.sourceforge.net/. Once you have launched Lingon you need to provide some information. Click the “New” button to start a new agent. Choose “Users Daemons” so that the script will run as root and have access to the files of all of the users on your Mac. Once you have created your new daemon you need to give it a name. I recommend something like com.vallery.s3backup, where vallery is your name. Next you need to give the command line to execute. Again, this assumes that you have stored the s3backup folder in /Library. Enter: “/bin/bash /Library/s3backup/backup.sh > /dev/null”. Lastly you need to give it a schedule for when to run. I have mine set to “At a specific date” with “Every day” selected and the time set to 4:00am. This is great if you leave your Mac on all the time. You might select a different option so that you can make sure your Mac isn’t in use when it is doing the backup. Click the “Save” button. It will require you to type in your admin password and then restart your computer.
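For anyone curious what a tool like Lingon produces, a launchd job is described by a property list file. The sketch below writes a job definition roughly equivalent to the settings above into a temporary location, using the standard launchd keys Label, ProgramArguments, StartCalendarInterval and StandardOutPath. The label com.example.s3backup and the output path are illustrative assumptions, and the exact file Lingon saves may differ.

```shell
#!/bin/bash
# Sketch: generate a launchd job definition similar to the one described above.
# The label and output path are illustrative assumptions.
LABEL="com.example.s3backup"
PLIST_OUT="${PLIST_OUT:-${TMPDIR:-/tmp}/$LABEL.plist}"

# Write the property list to the path given as the first argument.
write_plist() {
  cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>$LABEL</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/Library/s3backup/backup.sh</string>
  </array>
  <key>StandardOutPath</key>
  <string>/dev/null</string>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>4</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>
EOF
}

write_plist "$PLIST_OUT"
echo "Wrote $PLIST_OUT"
```

The sketch deliberately does not install anything. To use such a file as a system daemon it would need to be copied into /Library/LaunchDaemons as root and loaded with launchctl, which is the part Lingon handles when it asks for your admin password.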

lingon.png

That is it. Your system should run the first backup as scheduled. It will take a long time initially, as the upload speed is limited by your internet connection. After the initial upload, only files that are new or have changed will be uploaded. The script is set up to back up everything in the /Users folder. If you would like to limit what is being backed up, you can change this path in “backup.sh” to something else.
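Narrowing the backup comes down to changing the source path that is handed to s3sync. The sketch below assumes the path lives in a shell variable (called SOURCE here, a made-up name) and uses a hypothetical Pictures folder as the example. It also adds a small safety check, has_files, that refuses to proceed when the folder is missing or empty. That matters if the sync deletes remote files that no longer exist locally, because an empty source would then empty the backup too.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the folder exists and contains something.
has_files() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Example source: one user's Pictures folder instead of all of /Users.
# "yourname" is a placeholder for your own account name.
SOURCE="${SOURCE:-/Users/yourname/Pictures/}"

if has_files "$SOURCE"; then
  echo "OK to back up: $SOURCE"
else
  echo "Skipping backup, folder missing or empty: $SOURCE" >&2
fi
```

In a real script the sync command would go inside the first branch, so that an unmounted drive or a mistyped path results in a skipped run rather than a wiped bucket.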

In the unfortunate event that you actually need to get data out of the S3 store, there are a number of applications you can use. Initially I used Panic’s Transmit; however, it seems to have problems with the way s3sync stores the data. The free “S3 Browser” app has worked well for me. You can also use the Firefox plugin S3 Fox.
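Another option is s3sync itself, which can copy in the other direction when the bucket is given as the source and a local folder as the destination. The sketch below only prints such a command so it can be reviewed first. BUCKET, PREFIX and RESTORE_DIR are placeholder names, the credentials are assumed to be in the environment already, and the exact trailing-slash behaviour should be checked against the s3sync README before relying on it.

```shell
#!/bin/bash
# Sketch of a restore: same tool, source and destination swapped.
# BUCKET, PREFIX and RESTORE_DIR are placeholder values.
BUCKET="your-bucket-name"
PREFIX="Users"
RESTORE_DIR="${RESTORE_DIR:-$HOME/s3restore}"

# Print the command line rather than running it, so it can be reviewed first.
restore_command() {
  echo "ruby /Library/s3backup/s3sync.rb -r --ssl ${BUCKET}:${PREFIX} ${RESTORE_DIR}"
}

restore_command
```

Restoring into a fresh, empty folder rather than on top of your live files makes it easy to inspect what came back before moving anything into place.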

  1. 14 Responses to “Automatically backup your Mac to Amazon S3”

  2. Great guide. I’ve setup something similar to what you’ve described above but with a few differences:

    1. Instead of s3sync’ing directly to s3, I sync an rdiff-backup location. Prior to running s3sync --delete --ssl /foo/bar mybucket:/ I run: rdiff-backup /Volumes/somewhere /foo/bar
    This allows me to store incremental backups remotely on s3. The downside is that I have 2 local copies of the same data to be backed up.
    2. I use Automator and iCal to schedule my backups instead of launchd. The reason that I did this was because I wanted an easy way of managing the backup schedule, and I wanted to have a visible indication as to whether my backup had succeeded or failed. I used Automator to add a Growl notification on success. Ideally, I would have wanted success or failure to appear in Console.app but I wasn’t able to get this to work.

    The other thing I wanted to point out was that the --delete flag can be dangerous. If the location that you are backing up becomes empty for whatever reason, then the --delete flag will replicate that remotely.

    By Paul Grave on Feb 11, 2008

  3. There was a 3rd thing :) I set S3SYNC_NATIVE_CHARSET to UTF-8. I have several files whose names contain non-ascii characters, and the result of not setting S3SYNC_NATIVE_CHARSET means that I have mangled filenames.

    By Paul Grave on Feb 11, 2008

  4. I use Jungledisk (www.jungledisk.com) to access my Amazon S3 store. Scheduled incremental backups, archiving of overwritten or deleted files, integration into Finder, the features are awesome.

    By Edmund Blackadder on Feb 12, 2008

  5. Thanks for a great guide. I’m a relative Mac newbie and frankly like the idea of .mac integration, but don’t like its pricing. I was hoping to duplicate the remote storage functionality some way or other, and with your guide and the other resources posted here, I now can. Thanks so very much.

    By Steven M. Sawczyn on Feb 14, 2008

  6. Nice posting! Thanks :)

    By Aleks on Feb 18, 2008

  7. I use JungleDisk too. I access my S3 drive from Mac, Windows & Unix. In Windows it is mapped as J:\ and on the Mac it shows up in Finder. Haven’t tried the auto backup yet.

    By Valid Character on Feb 18, 2008

  8. JungleDisk is a much simpler solution and is $25 (one time) for as many computers as you have. Each can access the S3 back-up and it’s as simple as a mounted disk. Automatic backup saves the latest version of all 83GB of my data and amazon charges me just about $12/month for all that storage.

    Plus, all the content from Jungledisk can be encrypted, so in addition to the standard SSL transfer, my data on the server is encrypted so not even Amazon would be able to see it. It’s great!

    By Jay on Mar 6, 2008

  9. but the data is locally cached at jungledisk’s servers. even with the encryption, which can be unlocked with a backdoor key that is easy to program in.

    I like the idea of online storage but I cannot stand the idea of someone else storing it.

    By rick on Mar 28, 2008

  10. So I’ve tried this script a few times tonight, but every time I try to run backup.sh from the terminal, I get “Null Stream Error:” (or two) and then the product stops after a minute or so - and nothing’s syncing.

    Eh?

    By Scott R on Apr 16, 2008

  11. “but the data is locally cached at jungledisk’s servers. even with the encryption, which can be unlocked with a backdoor key that is easy to program in.”

    sorry there are a few very big misstatements/misunderstandings of fact in that statement

    1. “locally cached” means *locally* i.e. on your pc. . .there is no caching on jungledisk’s servers, it caches on your local computer (or not, depending on how you set it up).

    2. “backdoor key” - I, the NSA, and the rest of the world’s internet users would love the insider secret on the backdoor key to 256-bit AES encryption. . .i eagerly await your follow up

    3. “easy to program in” - jungle disk’s encryption mechanism is published under GPL. Anyone familiar with security knows that (with few exceptions) the world’s strongest encryption routines are published under GPL or a similar public domain/open source license (e.g. truecrypt, to name one), for the express reason that making them public allows them to be beaten on and strengthened better and faster than any closed source product could ever manage (e.g. linux by virtue of its open source nature “fixes” itself ahead of the curve vs. windows patching itself only after users break it “in the wild”).

    Go to http://www.grc.com and look up the episode(s) where Steve Gibson reviews Jungle Disk if you want to hear one of (if not *the*) pre-eminent internet security experts dissect and analyze it.

    disclaimers nil - no affiliation with jungledisk, grc, or anything else. . .

    -KK

    By kozmo on Apr 17, 2008

  12. Information storage and network product solutions for us the end-user customers.

    Storage devices retain data even when the computer is turned off.

    These are the main types of mass storage:

    Floppy disks : Relatively slow and have a small capacity, but they are portable, inexpensive, and universal.
    Hard disks : Very fast and with more capacity than floppy disks, but also more expensive. Some hard disk systems are portable (removable cartridges), but most are not.
    Optical disks : Unlike floppy and hard disks, which use electromagnetism to encode data, optical disk systems use a laser to read and write data. Optical disks have very large storage capacity, but they are not as fast as hard disks. In addition, the inexpensive optical disk drives are read-only. Read/write varieties are expensive.
    Tapes : Relatively inexpensive and can have very large storage capacities, but they do not permit random access of data.

    By Data storage solutions on May 7, 2008

  13. What if I want to back up a particular folder and not the whole /Users folder? For example, I just want to back up my photos. Which script do I need to modify? Thanks

    By brian on May 28, 2008

  14. Also,

    How can I use iCal and Automator with s3sync, as mentioned in comment #2?

    By brian on May 28, 2008

  15. I get an error message, viz:

    /Library/s3backup idoneus-computer-583>sudo ./backup.sh
    /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:590:in `connect': certificate verify failed (OpenSSL::SSL::SSLError)
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:590:in `connect'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:557:in `do_start'
    from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:552:in `start'
    from ./S3_s3sync_mod.rb:55:in `make_http'
    from ./s3try.rb:62:in `S3tryConnect'
    from ./s3try.rb:69:in `S3try'
    from /Library/s3backup/s3sync.rb:284:in `s3TreeRecurse'
    from /Library/s3backup/s3sync.rb:345:in `main'
    from ./thread_generator.rb:79:in `call'
    from ./thread_generator.rb:79:in `initialize'
    from ./thread_generator.rb:76:in `new'
    from ./thread_generator.rb:76:in `initialize'
    from /Library/s3backup/s3sync.rb:266:in `new'
    from /Library/s3backup/s3sync.rb:266:in `main'
    from /Library/s3backup/s3sync.rb:724
    /Library/s3backup idoneus-computer-584>

    Any suggestions?

    By Rolf Marvin Bøe Lindgren on Jun 1, 2008
