With the new version of OS X (Leopard), Apple has included some great functionality in Time Machine. Your Mac will automatically back up to an external drive every hour, and you can recover deleted files by browsing a timeline. The one downside to the Time Machine approach is that the data isn’t stored remotely. A couple of years ago my wife and I had a house fire in which most of our things were destroyed. Fortunately the fire was extinguished before it spread to where our computers were, so we didn’t lose any data. Had it started elsewhere in the house, we could have lost all of our digital files.
After the fire I followed a manual process of backing up our files to an external drive that I store in our fire safe. The problem is that it requires me to actually do the work, which I often put off. When Amazon S3 was introduced I immediately saw the potential to use it as an automatic remote backup destination. I hadn’t invested much time in it until now, but I just got a new computer (MacBook Air!!) and while setting it up I thought it would be a good opportunity to get my backup situation in order.
There are some great tools already in existence that can do most of the heavy lifting for you. The primary tool for doing remote directory syncs is s3sync, a script written in Ruby. Luckily for us, OS X comes with Ruby pre-installed, so there isn’t much work to get it running.
Here is my step-by-step guide to getting your machine set up to do automatic daily backups to Amazon. I developed these steps on my MacBook Air running Leopard; however, they should work on previous versions of OS X as well.
Step 1) First off, you’re going to need an Amazon Web Services account. Head over to http://aws.amazon.com/ and sign up for an account to use S3. The prices are very cheap ($0.15/GB/month). Once your account is set up you will need two things to use Amazon S3: your Amazon access key and your secret key. These are what s3sync will use to authenticate you to Amazon.
Step 2) I’ve packaged a zip file with all the files you are going to need to get this set up, including what is needed for SSL. Download the file at http://images.vallery.net/s3backup.zip. You can go to http://s3sync.net/ to see if there is a newer version if you like, but you’ll need to figure some of this out on your own.
Step 3) You need to create a “bucket” in Amazon to store your files. A bucket is similar to a folder; however, its name must be globally unique across all Amazon S3 users. In order to create the bucket you are going to need one of the S3 GUI applications that exist. I have included in the zip file the one I have used, called “S3 Browser”. You can find the latest version at http://people.no-distance.net/ol/software/s3/. Once you launch S3 Browser click on “connection” then “new connection”. You’ll need to provide the access details you got from Amazon in step 1. Once you have connected click the “Add” button, which will allow you to create a new bucket. Because the name has to be globally unique I used “vallery-macbookair-backup” where vallery is my last name. Keep track of this bucket name because you need it in the next step.

Step 4) Once you have downloaded the zip file I created, it should automatically extract itself into your Downloads folder, creating a new folder called “s3backup”. Within the s3backup folder are all the files and scripts you will need in order to get this working. There is one key file that needs to be edited in order to make this all work, which is called “backup.sh”. Open “backup.sh” and replace the placeholder access key, secret key, and bucket name with the keys you obtained from Amazon in step 1 and the bucket name you chose in step 3.
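To give a feel for what you are editing, here is a minimal sketch of what a script like “backup.sh” could look like. It is not the actual file from the zip, which may differ. The environment variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and SSL_CERT_DIR are the ones s3sync reads as far as I know; the “certs” folder location, the run_or_print helper and the DRY_RUN switch are my own assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a backup.sh; the real file in the zip may differ.

# Credentials from step 1 (placeholders: substitute your own values).
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"

BUCKET="vallery-macbookair-backup"   # bucket name from step 3
S3SYNC_DIR="/Library/s3backup"       # where the s3backup folder is stored
SOURCE_DIR="/Users/"                 # folder tree to back up

# Assumed location of the CA certificates needed for --ssl.
export SSL_CERT_DIR="$S3SYNC_DIR/certs"

# Print the command when DRY_RUN=1 (the default in this sketch);
# otherwise run it from inside the s3sync folder.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    (cd "$S3SYNC_DIR" && "$@")
  fi
}

# -r recurses into subfolders; --ssl encrypts the transfer.
run_or_print ruby "$S3SYNC_DIR/s3sync.rb" -r --ssl "$SOURCE_DIR" "$BUCKET:/"
```

Run as-is, the sketch only prints the s3sync command so you can check the paths and bucket name. Setting DRY_RUN=0 would actually start the upload.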

Step 5) Now that you have all the files ready to go you need to select a place to store them. The application will run as root at the system level in order to prevent file access issues, so I recommend storing the entire s3backup folder in your /Library folder. Copy the entire folder to /Library using Finder. There are a few other paths in “backup.sh” that will need to be updated if you choose to store the folder elsewhere.
Step 6) You need to set up your Mac to automatically run the backup shell script at a regular interval. There are a couple of ways to do this. Since I am a Unix guy I immediately started looking at cron. I discovered, however, that Apple recommends you use launchd for scheduled tasks. It is fairly complex to set up a scheduled task using launchd, but thankfully someone has already created a simple GUI that will let you do it: Lingon. I’ve included the latest version at the time of writing in the s3backup directory, but you can always obtain the latest version from http://lingon.sourceforge.net/. Once you have launched Lingon you need to provide some information. Click the “New” button to start a new agent. Choose “Users Daemons” so that the script will run as root and have access to the files of all users on your Mac. Once you have created your new daemon you need to give it a name. I recommend something like com.vallery.s3backup, where vallery is your name. Next, give the command line to execute. Again, this assumes that you have stored the s3backup folder in /Library. Enter: “/bin/bash /Library/s3backup/backup.sh > /dev/null”. Lastly, give it a schedule for when to run. I have mine set to “At a specific date” with “Every day” selected and the time set to 4:00am. This is great if you leave your Mac on all the time. You might select a different option so that your Mac isn’t in use while the backup is running. Click the “Save” button. It will require you to type in your admin password and then restart your computer.
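If you are curious what a launchd job for this schedule looks like, here is a rough sketch that prints a job definition by hand. It is not what Lingon itself writes, which I have not compared line by line. The label and script path are the examples from this guide, the print_plist function name is my own, and I have used launchd’s StandardOutPath key in place of the “> /dev/null” redirect.

```shell
#!/bin/sh
# Sketch: print a launchd job definition for a daily 4:00am backup.
# The label and script path are this guide's examples; adjust to your own.

LABEL="com.vallery.s3backup"
SCRIPT="/Library/s3backup/backup.sh"

print_plist() {
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>$LABEL</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>$SCRIPT</string>
  </array>
  <key>StandardOutPath</key>
  <string>/dev/null</string>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>4</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>
EOF
}

print_plist
```

Saved as /Library/LaunchDaemons/com.vallery.s3backup.plist, a definition like this can be loaded with “sudo launchctl load” followed by that path.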

That is it; your system should run the first backup as scheduled. It will take a long time initially, as the upload speed is limited by your internet connection. Once the initial upload has taken place it will only upload files that are new or have changed. The script is set up to back up everything in the /Users folder. If you would like to limit what is being backed up you can change this to something else.
In the unfortunate event that you actually need to get data out of the S3 store, there are a number of applications you can use. Initially I used Panic’s Transmit; however, it seems to have problems with the way s3sync stores the data. I found another great free app called “S3 Browser” which has worked well for me. You can also use the Firefox plugin S3 Fox.
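As a rough illustration of limiting the backup to one folder, the sketch below builds an s3sync command for a single directory and stores it under a matching prefix in the bucket. The folder “/Users/yourname/Pictures” and the sync_command_for helper are hypothetical, and I am assuming s3sync treats a trailing slash on the source the way rsync does (copy the contents of the folder). It only prints the command, and it does not handle paths containing spaces.

```shell
#!/bin/sh
# Sketch: back up one folder instead of all of /Users.
# "yourname" is a placeholder; the bucket is this guide's example.

BUCKET="vallery-macbookair-backup"
S3SYNC_DIR="/Library/s3backup"

# Print the sync command for one folder; the folder's name is reused
# as the prefix inside the bucket.
sync_command_for() {
  folder="$1"                     # e.g. /Users/yourname/Pictures
  prefix=$(basename "$folder")    # e.g. Pictures
  echo "ruby $S3SYNC_DIR/s3sync.rb -r --ssl $folder/ $BUCKET:$prefix"
}

sync_command_for "/Users/yourname/Pictures"
```

Calling the function once per folder gives one command per folder, which also makes it easy to leave out folders you don’t want uploaded.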






Great guide. I’ve setup something similar to what you’ve described above but with a few differences:
1. Instead of s3sync’ing directly to s3, I sync an rdiff-backup location. Prior to running s3sync --delete --ssl /foo/bar mybucket:/ I run: rdiff-backup /Volumes/somewhere /foo/bar
This allows me to store incremental backups remotely on s3. The downside is that I have 2 local copies of the same data to be backed up.
2. I use Automator and iCal to schedule my backups instead of launchd. The reason that I did this was because I wanted an easy way of managing the backup schedule, and I wanted to have a visible indication as to whether my backup had succeeded or failed. I used Automator to add a Growl notification on success. Ideally, I would have wanted success or failure to appear in Console.app but I wasn’t able to get this to work.
The other thing I wanted to point out was that the --delete flag can be dangerous. If the location that you are backing up becomes empty for whatever reason, then the --delete flag will replicate that remotely.
There was a 3rd thing
I set S3SYNC_NATIVE_CHARSET to UTF-8. I have several files whose names contain non-ascii characters, and the result of not setting S3SYNC_NATIVE_CHARSET means that I have mangled filenames.
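A minimal sketch of one way to guard against the --delete problem described above: check that the source folder exists and is not empty before syncing. The source_has_files function name is made up for illustration, and the sketch only reports what it would do rather than calling s3sync.

```shell
#!/bin/sh
# Sketch: refuse to sync with --delete when the source looks empty or missing.

# Succeeds only if the directory exists and contains at least one entry.
source_has_files() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

SOURCE_DIR="${SOURCE_DIR:-/Users/}"

if source_has_files "$SOURCE_DIR"; then
  echo "source looks populated; OK to run s3sync with --delete"
else
  echo "source is empty or missing; skipping sync" >&2
fi
```

This does not catch a partly emptied folder, so it is a safety net for the unmounted-volume case rather than a full protection.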
I use Jungledisk (www.jungledisk.com) to access my Amazon S3 store. Scheduled incremental backups, archiving of overwritten or deleted files, integration into Finder, the features are awesome.
Thanks for a great guide. I’m a relative Mac newbie and frankly like the idea of .mac integration, but don’t like its pricing. I was hoping to duplicate the remote storage functionality some way or other, and with your guide and the other resources posted here, I now can. Thanks so very much.
Nice posting! Thanks
I use JungleDisk too. I access my S3 drive from Mac, Windows & Unix. In Windows it is mapped as J:\ and on the Mac it shows up in Finder. Haven’t tried the auto backup yet
JungleDisk is a much simpler solution and is $25 (one time) for as many computers as you have. Each can access the S3 back-up and it’s as simple as a mounted disk. Automatic backup saves the latest version of all 83GB of my data and amazon charges me just about $12/month for all that storage.
Plus, all the content from Jungledisk can be encrypted, so in addition to the standard SSL transfer, my data on the server is encrypted so not even Amazon would be able to see it. It’s great!
but the data is locally cached at jungledisk’s servers. even with the encryption, which can be unlocked with a backdoor key that is easy to program in.
I like the idea of online storage but I cannot stand the idea of someone else storing it.
So I’ve tried this script a few times tonight, but every time I try to run backup.sh from the terminal, I get “Null Stream Error:” (or two) and then the product stops after a minute or so – and nothing’s syncing.
Eh?
“but the data is locally cached at jungledisk’s servers. even with the encryption, which can be unlocked with a backdoor key that is easy to program in.”
sorry there are a few very big misstatements/misunderstandings of fact in that statement
1. “locally cached” means *locally* i.e. on your pc. . .there is no caching on jungledisk’s servers, it caches on your local computer (or not, depending on how you set it up).
2. “backdoor key” – I, the NSA, and the rest of the world’s internet users would love the insider secret on the backdoor key to 256-bit AES encryption. . .i eagerly await your follow up
3. “easy to program in” – jungle disk’s encryption mechanism is published under GPL. Anyone familiar with security knows that (with few exceptions) the world’s strongest encryption routines are published under GPL or similar public domain/open source licenses (e.g. truecrypt, to name one) for the express reason that making them public allows them to be beaten on and strengthened better and faster than any closed source product could ever try to replicate (e.g. linux by virtue of its open source nature “fixes” itself ahead of the curve vs. windows patching itself only after users break it “in the wild”).
Go to http://www.grc.com and look up the episode(s) where Steve Gibson reviews Jungle Disk if you want to hear one of (if not *the*) pre-eminent internet security experts dissect and analyze it.
disclaimers nil – no affiliation with jungledisk, grc, or anything else. . .
-KK
If I want to back up a particular folder and not the whole /Users folder (for example, just my photos), which script do I need to modify? Thanks
Also,
How can I use iCal and Automator with s3sync, as mentioned in comment #2?
I get an error message, viz:
/Library/s3backup idoneus-computer-583>sudo ./backup.sh
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:590:in `connect': certificate verify failed (OpenSSL::SSL::SSLError)
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:590:in `connect'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:557:in `do_start'
from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:552:in `start'
from ./S3_s3sync_mod.rb:55:in `make_http'
from ./s3try.rb:62:in `S3tryConnect'
from ./s3try.rb:69:in `S3try'
from /Library/s3backup/s3sync.rb:284:in `s3TreeRecurse'
from /Library/s3backup/s3sync.rb:345:in `main'
from ./thread_generator.rb:79:in `call'
from ./thread_generator.rb:79:in `initialize'
from ./thread_generator.rb:76:in `new'
from ./thread_generator.rb:76:in `initialize'
from /Library/s3backup/s3sync.rb:266:in `new'
from /Library/s3backup/s3sync.rb:266:in `main'
from /Library/s3backup/s3sync.rb:724
/Library/s3backup idoneus-computer-584>
Any suggestions?
Great little tutorial. Very useful.
I managed to research and solve the problem where neither S3Fox nor Transmit would show the files once they’d been uploaded to S3. The problem is in the prefixed slash for the bucket. Remove it (i.e. the last character in the line where you execute s3sync.rb) and you’ll be able to see your files.
Any way to exclude a folder or two? I don’t want to continuously back up my “Movie downloads” folder to S3 – but man, what a great script and tutorial these are! Thanks!
Great tutorial – will this allow the backup to happen when the computer is at the login screen, since it’s a launch daemon?
Thanks!
Thanks for this great tutorial! I wanted to make one note about how you said you had been storing backup hard drives in a fire safe – NOBODY SHOULD DO THIS! Fire safes are rated for PAPER and nothing else. Any photos, film, or electronics will be ruined. A sheet of paper can survive much higher temperatures than a hard drive. Back up “in the cloud” (still not sure what the difference between the cloud and the internet is) or put your hard drive in a safety deposit box.
This is great. Thanks. How do you know if it is working? My S3 usage report shows activity but how can I be sure I can recover from there?